CN115662508A - RNA modification site prediction method based on multi-scale cross attention model - Google Patents

RNA modification site prediction method based on multi-scale cross attention model

Info

Publication number
CN115662508A
CN115662508A (application number CN202211260393.0A)
Authority
CN
China
Prior art keywords
sequence
attention
vector
value
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211260393.0A
Other languages
Chinese (zh)
Other versions
CN115662508B (en)
Inventor
王鸿磊
张�林
刘辉
张雪松
王栋
曾文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuzhou College of Industrial Technology
Original Assignee
Xuzhou College of Industrial Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuzhou College of Industrial Technology filed Critical Xuzhou College of Industrial Technology
Priority to CN202211260393.0A priority Critical patent/CN115662508B/en
Publication of CN115662508A publication Critical patent/CN115662508A/en
Application granted granted Critical
Publication of CN115662508B publication Critical patent/CN115662508B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an RNA modification site prediction method based on a multi-scale cross-attention model, and relates to the field of post-transcriptional RNA modification site prediction in bioinformatics. The method comprises the following steps: RNA base sequences containing an N1-methyladenosine modification site are taken as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and 3 groups of RNA base sequences of different scales are taken from each sample as input sequences; word-embedding coding and position coding are applied to the 3 groups of input sequences; the 3 encoded groups of sequences are input into an encoding module comprising a multi-head cross-attention layer and a feed-forward fully connected layer, the output results are averaged, and a fully connected neural network layer and a two-class classifier then predict whether a human RNA base sequence contains an N1-methyladenosine modification site. The method can describe the context of complex words and enhance the influence of important words in the text on methylation site prediction, so that methylation sites can be accurately predicted.

Description

RNA modification site prediction method based on multi-scale cross attention model
Technical Field
The invention relates to the field of prediction of post-transcriptional RNA modification sites in bioinformatics, and in particular to a method for predicting N1-methyladenosine modification sites in RNA based on a multi-scale cross-attention model.
Background
Studies have shown that epitranscriptomic regulation through post-transcriptional RNA modification is essential for all kinds of RNA; accurate recognition of RNA modifications is therefore crucial for understanding their functions and regulatory mechanisms.
Traditional experimental methods for identifying RNA modification sites are relatively complex, time-consuming and labor-intensive. Machine learning methods have already been applied to the computational extraction and classification of RNA sequence features and can effectively complement experimental methods. In recent years, convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) have achieved significant success in modification site prediction thanks to their power in representation learning.
However, convolutional neural networks can learn local responses from spatial data but cannot learn sequential correlations, while long short-term memory networks specialize in sequence modeling and can access contextual representations, but lack the spatial feature extraction of CNNs. For these reasons, there is strong motivation to construct a prediction framework using natural language processing (NLP) and other deep learning (DL) techniques.
In the prior art, although an attention mechanism is used when constructing a prediction framework and can attend to important contextual features of a sentence, a single attention sequence lacks information interaction and has difficulty describing the context of complex terms; the prior art also does not adequately relate to context or enhance the influence of important words in the text on methylation site prediction.
Disclosure of Invention
Based on this, it is necessary to provide a method for predicting RNA modification sites based on a multi-scale cross-attention model in order to solve the above technical problems.
The embodiment of the invention provides a multi-scale cross attention model-based RNA modification site prediction method, which comprises the following steps:
taking RNA base sequences containing an N1-methyladenosine modification site as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and taking 3 groups of RNA base sequences of different scales from each sample as input sequences;
sequentially carrying out word2vec word-embedding coding and position coding on the 3 groups of input sequences;
inputting the 3 encoded groups of sequences into an encoding module to obtain feature sequences; wherein the encoding module comprises a plurality of coding blocks connected in series, each coding block comprising a multi-head cross-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and a normalization layer;
averaging the output results of the encoding module, and predicting through a fully connected neural network layer and a two-class classifier whether the human RNA base sequence contains an N1-methyladenosine modification site.
Further, a data set is constructed; the data set includes positive-sample RNA base sequences, negative-sample RNA base sequences and class labels, with a sample length of 41 bp; the input sequences are set as sequence a, sequence b and sequence c, which are sets of sequences of the different scales x bp, y bp and z bp.
The training set and the test set of the data set are represented as:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N,
where y_n ∈ {0, 1} and x_n^(a), x_n^(b), x_n^(c) respectively represent auxiliary sequences of the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale.
Further, each sample takes 3 sets of RNA base sequences of different sizes as input sequences, including:
each sample sequence in the data set is centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of the sample sequence, the missing bases are filled with the '-' character; let x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41.
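For illustration only, the following Python sketch shows one way such multi-scale windows centered on the candidate A could be extracted; it is not part of the claimed method, and the function name, the assumption that the raw sample is already a 41 bp string with the candidate adenosine at its center, and the '-' padding behavior are illustrative assumptions.

```python
def extract_multiscale_windows(sample_41bp, half_widths=(10, 15, 20), pad_char="-"):
    """Cut windows of 2*w+1 bases centered on the middle base (the candidate A).

    `sample_41bp` is assumed to be a 41-character RNA string whose center base
    is the adenosine being classified; positions without a base are padded
    with '-' so every window has its nominal length.
    """
    center = len(sample_41bp) // 2          # index 20 for a 41 bp sample
    windows = []
    for w in half_widths:                   # 10, 15, 20 -> 21, 31, 41 bp windows
        left, right = center - w, center + w + 1
        seq = sample_41bp[max(0, left):right]
        # pad if the window runs past either end of the available sequence
        seq = pad_char * max(0, -left) + seq + pad_char * max(0, right - len(sample_41bp))
        windows.append(seq)
    return windows  # [sequence a (21 bp), sequence b (31 bp), sequence c (41 bp)]
```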
Further, the word2vec word embedded code specifically includes:
a window of 3 bases slides over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, yielding a dictionary of 105 distinct subsequences, each mapped to a unique integer;
the RNA sequences of the different scales are each encoded with the CBOW model of word2vec; for a 41-base sample, a 3-base window slides 1 base at a time over the sample sequence until it reaches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec so that each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension.
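As a hedged illustration of this encoding step, the sketch below uses the gensim library as one possible word2vec/CBOW implementation; the training corpus, the CBOW window size and the helper names are assumptions not specified by the patent.

```python
from gensim.models import Word2Vec
import numpy as np

def to_3mers(seq):
    """Slide a 3-base window one base at a time: a 41 bp sequence gives 39 3-mers."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

def train_cbow(corpus_seqs, dim=100):
    """Train a CBOW word2vec model on 3-mer 'sentences' built from the sample sequences."""
    sentences = [to_3mers(s) for s in corpus_seqs]
    # sg=0 selects CBOW; vector_size=100 gives 100-dimensional word vectors
    return Word2Vec(sentences, vector_size=dim, sg=0, window=5, min_count=1)

def encode(seq, w2v):
    """Return an (L-2) x 100 matrix of 3-mer word vectors (39 x 100 for a 41 bp sample)."""
    return np.stack([w2v.wv[k] for k in to_3mers(seq)])
```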
Further, the encoding module comprises 3 coding blocks connected in series.
Further, in the encoding module:
the model output dimension is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and the dropout probability (the probability of temporarily dropping units from the network during training) is 0.1.
Further, the multi-scale cross-attention layer comprises:
while sequence a performs self-attention, sequence a also performs cross-attention with sequence b and with sequence c, where cross-attention means that the first sequence provides the query input and the other sequence provides the key and value inputs for the attention computation; the outputs of the 3 attention computations are summed as the output of the cross-attention layer, realizing the multi-scale cross-attention layer.
Further, the cross-attention mechanism algorithm in the multi-scale cross-attention layer comprises:
several independent sequences of the same dimension but different scales are used; the first sequence provides the query input, and each of the remaining sequences performs an attention computation with the first sequence, i.e. the remaining sequences provide the key and value inputs during the attention computation. Specifically:
one sequence is sequence a and the other is sequence b; sequence a provides the query input, and each key in sequence b corresponds to a value. Matrix multiplication is performed between the query of sequence a and the key of sequence b, followed by scaling, to produce an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix is multiplied by the values of sequence b to obtain the interactive attention output. The corresponding equation is:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
Here softmax normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity. Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of the key vector of sequence b, and K_b^T is the transpose of the key vector of sequence b. When the input sequence is X, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, as given by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
In the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself.
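A minimal PyTorch sketch of this scaled dot-product cross-attention (the shapes and the helper name are illustrative assumptions, not part of the claimed method):

```python
import math
import torch

def cross_attention(Qa, Kb, Vb):
    """Scaled dot-product attention in which sequence a supplies the queries
    and sequence b supplies the keys and values.
    Qa: (len_a, d_k), Kb: (len_b, d_k), Vb: (len_b, d_v) -> (len_a, d_v)
    """
    d_k = Qa.size(-1)
    scores = Qa @ Kb.transpose(-2, -1) / math.sqrt(d_k)   # MatMul + Scale
    weights = torch.softmax(scores, dim=-1)               # softmax over b's keys
    return weights @ Vb                                    # weighted sum of b's values

# Q, K and V themselves come from learned linear projections of the inputs,
# e.g. Qa = Xa @ W_Q, Kb = Xb @ W_K, Vb = Xb @ W_V.
```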
Further, the multi-head cross attention layer algorithm comprises:
linearly projecting the queries, keys and values of the different sequences in the multi-scale cross-attention mechanism h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V, and executing the cross-attention mechanism in parallel on each projected version of the queries, keys and values to produce d_v-dimensional output values; concatenating the output values of the h integrated cross-attentions and projecting them again through a linear network to produce the final value. That is, the mathematical formulas corresponding to the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
where Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c.
When one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed; W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself. Moreover,
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, containing all rational and irrational numbers; here d_k = 8, d_v is the dimension of the value vector V with d_v = 8, and d_model is the output dimension with d_model = 64.
The above formulas use h = 8 parallel attention layers, or heads; for each of these, d_k = d_v = d_model/h = 8.
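The following PyTorch sketch gives one possible reading of the multi-head multi-scale cross-attention layer described above (d_model = 64, h = 8, d_k = d_v = 8); the class name, the use of nn.Linear for the projections, and the choice of separate key/value projections per scale are assumptions rather than the authoritative implementation.

```python
import math
import torch
from torch import nn

class MultiScaleCrossAttention(nn.Module):
    """Multi-head attention in which sequence a queries itself and the two
    auxiliary sequences b and c; the three attention outputs are summed per head."""

    def __init__(self, d_model=64, h=8):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h              # d_k = d_v = 8
        self.w_q = nn.Linear(d_model, d_model)
        # separate key/value projections for each of the three scales
        self.w_k = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(3))
        self.w_v = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(3))
        self.w_o = nn.Linear(d_model, d_model)

    def _split(self, x):                                # (B, L, d_model) -> (B, h, L, d_k)
        B, L, _ = x.shape
        return x.view(B, L, self.h, self.d_k).transpose(1, 2)

    def _attend(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, xa, xb, xc):
        q = self._split(self.w_q(xa))                   # queries always come from sequence a
        out = 0
        for i, x in enumerate((xa, xb, xc)):            # self-attention + two cross-attentions
            k, v = self._split(self.w_k[i](x)), self._split(self.w_v[i](x))
            out = out + self._attend(q, k, v)           # sum the 3 attention outputs
        B, _, L, _ = out.shape
        out = out.transpose(1, 2).reshape(B, L, self.h * self.d_k)
        return self.w_o(out)                            # final linear projection W^O
```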
Further, the feedforward fully-connected layer includes:
two linear transformations with a ReLU activation function between them; the mathematical formula of the feed-forward fully connected layer is:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer, and max(0, ·) denotes the ReLU activation function.
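A corresponding position-wise feed-forward sketch, assuming d_model = 64 and d_ff = 256 as above; placing dropout inside this sub-layer is an assumption.

```python
from torch import nn

class FeedForward(nn.Module):
    """Two linear transformations with a ReLU in between: FFN(x) = max(0, xW1 + b1)W2 + b2."""
    def __init__(self, d_model=64, d_ff=256, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)
```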
Compared with the prior art, the RNA modification site prediction method based on the multi-scale cross attention model has the following beneficial effects:
the invention determines the sequence to be predicted as 3 groups of input sequences with different scales, then carries out word embedding coding and position coding in sequence, then sends the processed 3 sequences into a coding module respectively, namely 3 coding blocks connected in series, finally accumulates the values after coding processing, averages the values, and predicts whether the RNA base sequence of the human species contains N or not through a full-connection neural network layer and two classifiers 1 -a methyl adenosine modification site. Wherein, the multiscale cross attention layer includes: while the sequence a carries out self-attention calculation, the sequence a carries out cross attention calculation with the sequence b and the sequence c respectively, and the cross attention means that the first sequence is used as query (query) input and the cross attention means that the first sequence is used as query inputOne sequence is used as key (key) input and value (value) input, attention calculation is carried out, and then output results of 3 kinds of attention are added up to be used as output of a cross attention layer, so that the multi-scale cross attention layer is realized.
Drawings
FIG. 1 is a schematic diagram of a multi-scale cross-attention model-based approach provided in one embodiment;
FIG. 2 is a schematic diagram of a cross-attention mechanism provided in one embodiment;
FIG. 3 is a block diagram of a three-way cross-attention mechanism provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention provides a method for predicting RNA modification sites based on a multi-scale cross-attention model, which specifically comprises the following steps:
1) Collecting positive and negative sample data sets: an N1-methyladenosine (m1A) modification site data set of human RNA is obtained, comprising the RNA sample sequences of the positive and negative data sets and the corresponding class labels, with a sample length of 41 bp (base pairs). The model requires 3 groups of RNA base sequences of different scales to be taken from each sample as input sequences; sequence a, sequence b and sequence c are sets of sequences of the different scales x bp, y bp and z bp, so the training set and the test set of the model can be expressed in the following form:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N, with y_n ∈ {0, 1},
where x_n^(a), x_n^(b), x_n^(c) represent the auxiliary (side) sequences of the n-th sample at the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale to its left and right;
1-1) In the training set and the test set, RNA containing an N1-methyladenosine modification site is taken as a positive sample, and RNA containing no N1-methyladenosine modification site is taken as a negative sample.
The positive and negative samples are specified as follows: a positive sample sequence (an m1A methylation site sequence) is a sequence of a common length (a common number of bases) around the N1-methyladenosine modification site (base A). The base at the center of a negative sample sequence is also A, but it is not an N1-methyladenosine modification site, and the negative sample sequence is likewise a sequence of the common length around that base A. Positive and negative samples share the same common length of 41 bp (base pairs).
1-2) Each sample sequence of the data set is centered on the common motif A with an upstream/downstream window of 20 bp, and when no base exists at certain positions the missing bases are filled with the '-' character; each m1A positive/negative sample is thus 41 bp long. The training set includes 593 positive samples and 5930 negative samples, and the test set includes 114 positive samples and 1140 negative samples, as summarized in Table 1.
Table 1. Statistics of the two RNA modification data sets (positive and negative sample counts of the training and test sets, as listed above).
2) Processing samples: the positive and negative sample sequences are centered on the common motif A and upstream/downstream windows of different sizes are taken, here integer multiples of 5, namely 10 bp, 15 bp and 20 bp, giving sequences of 21 bp, 31 bp and 41 bp; when a base does not exist at some positions of a sample sequence, the missing base is filled with the '-' character.
3) Feature encoding: a 3-base window slides over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, yielding a dictionary of 105 distinct subsequences, each mapped to a unique integer.
For the sample sequences of the different scales, the RNA sequences are encoded with the CBOW model of word2vec. Taking a sample 41 bases long as an example: a 3-base window slides 1 base at a time over the 41 bases and stops when it touches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec, so each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension. The purpose of the word2vec model is to capture relationships between words in a high-dimensional space.
4) Introducing position coding: position information is introduced using position embedding because of the correlation between base positions in the sequence.
5) Designing the multi-view classification learning model: several word-vector sequences of different scales are learned through the multi-head cross-attention layer and then passed into a feed-forward network (FFN). Finally, the values output by the encoding module are averaged, and a fully connected neural network layer and a two-class classifier predict whether the human RNA base sequence contains an N1-methyladenosine modification site.
It should be noted that the encoding module input = word-embedding input + position encoding.
The embedding input maps the vector of each word from the word-vector dimension to d_model through a conventional embedding layer; since the relationship is additive, the position code here is also a d_model-dimensional vector.
The position code is not a single value but a d_model-dimensional vector (much like a word vector) containing information about a specific position in the sentence; the code is not integrated into the model, but the vector gives each word information about its position in the sentence. In other words, the model input is enhanced by injecting the order information of the words. Given an input sequence of length m, let s denote the position of a word in the sequence, p_s ∈ R^(d_model) the vector corresponding to position s, and p_s^(i) the i-th element of the position-s vector, where d_model is the input/output dimension of the encoding module and also the dimension of the position code. The function that generates the position vector p_s is defined as follows:
p_s^(2k) = sin(ω_k · s), p_s^(2k+1) = cos(ω_k · s), with ω_k = 1 / 10000^(2k / d_model),
where the d_model-dimensional vector is grouped in pairs, each group consisting of one sin and one cos sharing the same frequency ω_k; there are d_model/2 groups in total, and since numbering starts from 0, the last group is numbered d_model/2 − 1. The wavelength of the sin and cos functions (determined by ω_k) increases from 2π to 2π · 10000.
The specific description of the above steps is as follows:
Each data-set sample sequence is centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of a sample sequence the missing bases are filled with the '-' character. This patent assumes x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41. The base sequence first undergoes word2vec word-embedding coding and then position coding, and then passes through the encoding module, which consists of 3 coding blocks; each coding block comprises a multi-head cross-attention layer and a feed-forward network, and each sub-layer is followed by a residual connection (Residual Connection) and a normalization layer (Layer Normalization). The residual connection prevents network degradation and avoids the vanishing-gradient problem; the normalization layer normalizes the activation values of each layer. This is shown in fig. 1.
The input and output dimension of the encoding module is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and dropout = 0.1, where dropout means that during training of the deep network, neural network units are temporarily dropped from the network with a certain probability.
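Putting these pieces together, one possible sketch of a single coding block is given below; it reuses the MultiScaleCrossAttention and FeedForward sketches shown earlier, applies post-layer normalization, and only updates the representation of sequence a, all of which are assumptions where the patent text does not fix the details.

```python
from torch import nn

class CrossAttentionEncoderBlock(nn.Module):
    """One coding block: multi-head cross-attention and a feed-forward network,
    each wrapped in a residual connection plus layer normalization (post-norm)."""
    def __init__(self, d_model=64, h=8, d_ff=256, dropout=0.1):
        super().__init__()
        self.attn = MultiScaleCrossAttention(d_model, h)    # sketched earlier (assumption)
        self.ffn = FeedForward(d_model, d_ff, dropout)       # sketched earlier (assumption)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, xa, xb, xc):
        # residual connection around the multi-head cross-attention sub-layer
        xa = self.norm1(xa + self.drop(self.attn(xa, xb, xc)))
        # residual connection around the feed-forward sub-layer
        return self.norm2(xa + self.drop(self.ffn(xa)))
```

Three such blocks in series would form the encoding module described above.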
The input is the labelled base sequence; a sentence entering the encoder first passes through a multi-head cross-attention layer, where "multi-head" means the attention computation is executed several times in parallel. Sequence a, sequence b and sequence c are sets of sequences of the different scales x bp, y bp and z bp, so cross-attention within a coding block means that the first sequence is used as the query input while another sequence is used as the key and value inputs for the attention computation, as shown in fig. 2. Finally, the output results of the 3 attentions are added as the output of the cross-attention layer, as shown in fig. 3.
The cross-attention input consists of two different sequences: one is used as the query input and the other as the key and value input.
The specific algorithm of the cross-attention mechanism is shown in fig. 2: one sequence is sequence a and the other is sequence b, and sequence a is used as the query input; to capture every value in sequence b, each key in sequence b corresponds to a value.
Matrix multiplication (MatMul) is first performed between the query of sequence a and the key of sequence b, followed by scaling (Scale), giving an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix multiplied by the values of sequence b gives the interactive attention output, the corresponding equation being:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
The softmax in the formula normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity. Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of K_b, and K_b^T is the transpose of the key vector of sequence b. Taking an input sequence X as an example, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, as given by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
In the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself. Multiplying the input sequence X by W^Q gives Q_x, and K_x and V_x are obtained in the same way.
As shown in fig. 3, the output results of the 3 attention computations are added to realize the cross-attention layer. The queries, keys and values of the different sequences in the above multi-scale cross-attention mechanism are linearly projected h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V; the cross-attention mechanism is then executed in parallel on each projected version of the queries, keys and values, producing d_v-dimensional output values. The output values of the h integrated cross-attentions are concatenated and projected again through a linear network to produce the final value. The corresponding mathematical formulas of the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
where Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c.
When one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed; W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself. Moreover,
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, the set containing all rational and irrational numbers; d_k is the dimension of the key vector K, here d_k = 8; d_v is the dimension of the value vector V, here d_v = 8; d_model is the output dimension, here d_model = 64.
The above formulas use h = 8 parallel attention layers, or heads; for each of these we use d_k = d_v = d_model/h = 8.
A residual connection (Residual Connection) and a normalization layer (Layer Normalization) then follow; the residual connection prevents network degradation and avoids the vanishing-gradient problem, and the normalization layer normalizes the activation values of each layer.
The data then enters the feed-forward fully connected layer, comprising:
two linear transformations with a ReLU activation function between them; that is, the mathematical formula corresponding to the feed-forward fully connected layer is as follows, where max(0, ·) denotes the ReLU activation function:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer.
It should be noted that the purpose of the feed-forward fully connected layer is as follows: the multi-head attention mechanism alone is not sufficient to extract ideal features, so the fully connected layer is added to increase the network capacity.
This is again followed by a residual connection (Residual Connection) and a normalization layer (Layer Normalization); the residual connection prevents network degradation and avoids the vanishing-gradient problem, and the normalization layer normalizes the activation values of each layer.
Further, the values output by the encoding module are averaged, and a fully connected neural network layer and a two-class classifier then predict whether the human RNA base sequence contains an N1-methyladenosine modification site.
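As a rough sketch of this final step: the encoder output is mean-pooled and passed through a fully connected layer with a two-class softmax. Whether the average is taken over sequence positions, over the three scale branches, or both is not fully specified here, so pooling over positions is an assumption, as are the class and module names.

```python
import torch
from torch import nn

class ClassificationHead(nn.Module):
    """Average the encoder output over positions, then predict m1A vs. non-m1A."""
    def __init__(self, d_model=64, n_classes=2):
        super().__init__()
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, encoded):              # encoded: (batch, seq_len, d_model)
        pooled = encoded.mean(dim=1)         # average the values output by the encoder
        return torch.softmax(self.fc(pooled), dim=-1)   # probabilities of the two classes
```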
In the embodiment of the invention, the validity of the model is verified with 5-fold cross-validation on the training set:
Table 2. Training set 5-fold prediction results
Classifiers        AUROC   ACC    Sen    Precision  MCC    Spe    F-1    AUPRC
BiLSTM             0.9242  94.11  55.14  73.48      60.61  98.01  63.00  0.701
CNN                0.8982  93.16  62.39  62.39      58.63  96.24  62.39  0.6707
BiLSTM+Attlayer    0.9270  94.28  57.84  73.61      62.25  97.93  64.78  0.7069
CNN+Attlayer       0.9026  93.34  60.54  64.22      58.71  96.63  62.33  0.6644
MSCA(21-31-41)     0.9465  94.10  61.6   72.64      63.71  97.54  66.67  0.75
MSCA (21-31-41) denotes the model using the multi-scale cross-attention model (MSCA) with samples of sequence lengths 21 bp, 31 bp and 41 bp as input.
Considering that the ratio of positive to negative samples in the test set is 1:10, i.e. an imbalanced sample set, performance is compared by the area under the precision-recall curve (AUPRC). As shown in Table 2, the AUPRC of the multi-scale cross-attention model (MSCA) is much higher than those of the BiLSTM classification model (Bi-directional Long Short-Term Memory), CNN (Convolutional Neural Network), BiLSTM+Attlayer (BiLSTM layer + Bahdanau attention layer) and CNN+Attlayer (convolutional neural network layer + Bahdanau attention layer) models.
In addition, when comparing key indicators such as accuracy (ACC), MSCA is also higher than the other known strong classifiers.
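For reference, the following is a hedged sketch of how such a 5-fold evaluation with AUROC and AUPRC could be computed using scikit-learn; the train_and_predict function is a hypothetical placeholder for training any of the compared models and is not defined in this document.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score

def five_fold_eval(X, y, train_and_predict, seed=42):
    """X: array of encoded samples, y: 0/1 labels; returns mean AUROC and AUPRC."""
    aurocs, auprcs = [], []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        scores = train_and_predict(X[train_idx], y[train_idx], X[val_idx])
        aurocs.append(roc_auc_score(y[val_idx], scores))
        auprcs.append(average_precision_score(y[val_idx], scores))  # area under PR curve
    return float(np.mean(aurocs)), float(np.mean(auprcs))
```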
In the embodiment of the invention, the validity of the model is verified by using the test set:
Table 3. Independent data set evaluation
Classifiers        AUROC   ACC    Sen    Precision  MCC    Spe    F-1    AUPRC
BiLSTM             0.9038  94.25  55.26  75.0       61.43  93.30  63.63  0.7253
CNN                0.9106  94.73  58.77  77.90      64.95  93.14  67.00  0.7066
BiLSTM+Attlayer    0.9276  94.97  54.38  84.93      65.58  94.17  66.31  0.7642
CNN+Attlayer       0.9207  94.89  63.15  76.59      66.84  92.50  69.22  0.7538
MSCA(21-31-41)     0.9405  95.21  62.28  80.68      68.41  98.51  70.30  0.7751
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for predicting RNA modification sites based on a multi-scale cross attention model is characterized by comprising the following steps:
taking RNA base sequences containing an N1-methyladenosine modification site as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and taking 3 groups of RNA base sequences of different scales from each sample as input sequences;
sequentially carrying out word2vec word embedded coding and position coding on the 3 groups of input sequences;
inputting the 3 groups of encoded sequences into an encoding module to obtain a feature matrix; wherein the encoding module comprises a plurality of coding blocks connected in series, each coding block comprising a multi-head cross-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and a normalization layer;
averaging the output results of the encoding module, and predicting through a fully connected neural network layer and a two-class classifier whether the human RNA base sequence contains an N1-methyladenosine modification site.
2. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, further comprising: constructing a data set;
the data set includes positive-sample RNA base sequences, negative-sample RNA base sequences and class labels, with a sample length of 41 bp; the input sequences are set as sequence a, sequence b and sequence c, which are sets of sequences of the different scales x bp, y bp and z bp;
the training set and the test set of the data set are represented as:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N,
where y_n ∈ {0, 1} and x_n^(a), x_n^(b), x_n^(c) respectively represent auxiliary sequences of the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale.
3. The method for predicting RNA modification sites based on the multi-scale cross attention model of claim 2, wherein each sample takes 3 sets of RNA base sequences with different scales as input sequences, and comprises the following steps:
the sample sequences in the data set are centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of the sample sequence, the missing bases are filled with the '-' character; let x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41.
4. The method for predicting the RNA modification sites based on the multi-scale cross attention model as claimed in claim 1, wherein the word2vec word embedded code specifically comprises:
sliding a window of 3 bases over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, thereby obtaining a dictionary of 105 distinct subsequences, each mapped to a unique integer;
encoding the RNA sequences of the different scales with the CBOW model of word2vec; for a 41-base sample, a 3-base window slides 1 base at a time over the sample sequence until it reaches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec so that each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension.
5. The RNA modification site prediction method based on the multi-scale cross-attention model of claim 1, wherein the encoding module comprises 3 coding blocks connected in series.
6. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, wherein in the encoding module:
the output dimension is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and the probability of temporarily dropping units from the network is dropout = 0.1.
7. The method for predicting RNA modification sites based on the multi-scale cross attention model of claim 1, wherein the multi-scale cross attention layer comprises:
while sequence a performs self-attention, sequence a also performs cross-attention with sequence b and with sequence c, where cross-attention means that the first sequence is used as the query input and the other sequence is used as the key and value inputs for the attention computation; the outputs of the 3 attention computations are summed as the output of the cross-attention layer, realizing the multi-scale cross-attention layer.
8. The RNA modification site prediction method based on the multi-scale cross-attention model of claim 7, wherein the cross-attention mechanism algorithm in the multi-scale cross-attention layer comprises: several independent sequences of the same dimension but different scales, where the first sequence provides the query input and each of the remaining sequences performs an attention computation with the first sequence, i.e. the remaining sequences provide the key and value inputs during the attention computation; specifically:
one sequence is sequence a and the other sequence is sequence b; sequence a provides the query input, and each key in sequence b corresponds to a value; matrix multiplication is performed between the query of sequence a and the key of sequence b, followed by scaling, to produce an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix is multiplied by the values of sequence b to obtain the interactive attention output, the corresponding equation being:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
where softmax normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity; Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of the key vector of sequence b, and K_b^T is the transpose of the key vector of sequence b; when the input sequence is X, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, represented by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
in the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself.
9. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 8, wherein the multi-head cross-attention layer algorithm comprises:
linearly projecting the queries, keys and values of the different sequences in the multi-scale cross-attention mechanism h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V, and executing the cross-attention mechanism in parallel on each projected version of the queries, keys and values to produce d_v-dimensional output values; concatenating the output values of the h integrated cross-attentions and projecting them again through a linear network to produce the final value; that is, the mathematical formulas corresponding to the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
wherein Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c;
when one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed, where W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself; when one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed, where W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself; when one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed, where W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself; and
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, the set containing all rational and irrational numbers; d_k = 8; d_v is the dimension of the value vector V, here d_v = 8; d_model is the output dimension, here d_model = 64;
the above formulas use h = 8 parallel attention layers, or heads, and for each of these d_k = d_v = d_model/h = 8.
10. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, wherein the feed-forward fully connected layer comprises:
two linear transformations with a ReLU activation function between them; the mathematical formula corresponding to the feed-forward fully connected layer is:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer, and max(0, ·) denotes the ReLU activation function.
CN202211260393.0A 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model Active CN115662508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211260393.0A CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211260393.0A CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Publications (2)

Publication Number Publication Date
CN115662508A true CN115662508A (en) 2023-01-31
CN115662508B CN115662508B (en) 2024-03-12

Family

ID=84986550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211260393.0A Active CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Country Status (1)

Country Link
CN (1) CN115662508B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210279576A1 (en) * 2020-03-03 2021-09-09 Google Llc Attention neural networks with talking heads attention
US20220310070A1 (en) * 2021-03-26 2022-09-29 Mitsubishi Electric Research Laboratories, Inc. Artificial Intelligence System for Capturing Context by Dilated Self-Attention
CN114023376A (en) * 2021-11-02 2022-02-08 四川大学 RNA-protein binding site prediction method and system based on self-attention mechanism
KR102405030B1 (en) * 2021-11-23 2022-06-07 주식회사 쓰리빌리언 System and method for predicting pathogenicity of genetic variant using explainable ai
CN114464249A (en) * 2021-12-31 2022-05-10 北京工业大学 Ribonucleic acid-protein site recognition method based on self-attention convolution
CN115116543A (en) * 2022-04-18 2022-09-27 腾讯科技(深圳)有限公司 Antigen-antibody binding site determination method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGLEI WANG et al.: "EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction", BMC BIOINFORMATICS, pages 8-19
李国斌; 杜秀全; 李新路; 吴志泽: "Gene splice site prediction based on convolutional neural networks" (基于卷积神经网络的基因剪接位点预测), Journal of Yancheng Institute of Technology (Natural Science Edition), no. 02, pages 24-28
李正光 et al.: "A text semantic similarity evaluation method fusing cross self-attention and pre-trained models" (融合交叉自注意力和预训练模型的文本语义相似性评估方法), Mathematics in Practice and Theory, vol. 52, no. 7, pages 166-167

Also Published As

Publication number Publication date
CN115662508B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN110196906B (en) Deep learning text similarity detection method oriented to financial industry
CN111680494B (en) Similar text generation method and device
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111475655B (en) Power distribution network knowledge graph-based power scheduling text entity linking method
CN112667818A (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN113407660B (en) Unstructured text event extraction method
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN113641819B (en) Argumentation mining system and method based on multitasking sparse sharing learning
CN112232087A (en) Transformer-based specific aspect emotion analysis method of multi-granularity attention model
CN112101009A (en) Knowledge graph-based method for judging similarity of people relationship frame of dream of Red mansions
CN113704437A (en) Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN112148997A (en) Multi-modal confrontation model training method and device for disaster event detection
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN111651993A (en) Chinese named entity recognition method fusing local-global character level association features
CN114299512A (en) Zero-sample small seal character recognition method based on Chinese character etymon structure
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN111680529A (en) Machine translation algorithm and device based on layer aggregation
CN115424663B (en) RNA modification site prediction method based on attention bidirectional expression model
CN113806543A (en) Residual jump connection-based text classification method for gated cyclic unit
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN112559741A (en) Nuclear power equipment defect recording text classification method, system, medium and electronic equipment
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant