N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position.
Research on identifying m6A methylation sites has recently extended to mammalian
transcriptomes. Many traditional identification methods based on sequence features are
limited by data scale. With mammalian m6A site datasets at the million scale and larger
sequence windows, data-driven deep learning is expected to perform better. For analogous
sequence-type data, research in Natural Language Processing (NLP) has suggested learning
a latent representation of words using word embedding algorithms. Inspired by this,
we report Gene2vec, an RNA N6-adenosine methylation predictor based on gene-subsequence-based
neural embedding algorithms.
In this paper, we built four prediction schemes that combine different
RNA sequence representations with optimized convolutional neural networks and compared their
prediction performance. All of these neural-network-based predictors achieve more effective
prediction than traditional methods, and the gene-subsequence-based neural embedding
(Gene2vec) method stands out, benefiting from the combination of word embedding and a
deep network. To our knowledge, this is the first time that word embedding and deep neural
networks have been used to predict mammalian N6-methyladenosine sites. We evaluated these
predictors on a rigorous independent test dataset and showed that our proposed method outperforms the
Fig 1. Workflow of the multiple predictors.
Four prediction schemes were built: one-hot encoding of sequence flanking windows, with a network of four cell structures; neighboring methylation state encoding, with two cell structures; and RNA word embedding and Gene2vec, both derived from pseudo RNA sequence words, each with two cell structures.
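As an illustration of the first scheme, one-hot encoding of a sequence flanking window might look like the following. This is a minimal sketch rather than the authors' implementation: the function names `flanking_window` and `one_hot_encode`, the A/C/G/U column order, the use of N for padding, and the all-zero row for unknown bases are assumptions made for the example.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the paper.
NUCLEOTIDES = "ACGU"  # assumed column order of the one-hot vectors


def flanking_window(sequence, center, flank):
    """Return `flank` bases on each side of position `center` (0-based),
    padding with 'N' where the window runs past either end of the sequence."""
    left = max(0, center - flank)
    right = min(len(sequence), center + flank + 1)
    pad_left = "N" * (flank - (center - left))
    pad_right = "N" * (flank - (right - center - 1))
    return pad_left + sequence[left:right] + pad_right


def one_hot_encode(window):
    """Encode each base as a 4-element 0/1 row; bases outside A/C/G/U
    (for example the 'N' padding) become all-zero rows."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in window.upper().replace("T", "U"):  # accept DNA-style input
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Window of 2 bases on each side of the adenosine at position 2.
window = flanking_window("GGACU", center=2, flank=2)
matrix = one_hot_encode(window)
```

The resulting matrix has one row per position in the window and one column per nucleotide, which is the shape a convolutional network over the sequence axis would typically consume.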
2D plot: vector space correlation of 3-length RNA words generated by Gene2vec
3D plot: vector space correlation of 3-length RNA words generated by Gene2vec
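The 3-length RNA words referred to in these plots can be pictured as short subsequences cut from an RNA sequence. The following is a minimal sketch under stated assumptions: the function name `rna_words`, the overlapping sliding window, and the default stride of 1 are illustrative choices and are not confirmed by the text above.

```python
# Illustrative sketch only; the tokenization details are assumptions.
def rna_words(sequence, k=3, stride=1):
    """Split an RNA sequence into k-length words using a sliding window.

    With stride=1 the words overlap; with stride=k they do not.
    Sequences shorter than k yield an empty list.
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


words = rna_words("GGACU")            # overlapping 3-length words
blocks = rna_words("GGACUA", stride=3)  # non-overlapping 3-length words
```

Each list of words plays the role of a sentence, so a word embedding algorithm can assign every distinct word a vector, and those vectors can then be projected to two or three dimensions for plotting.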