CN115662508A - RNA modification site prediction method based on multi-scale cross attention model - Google Patents

RNA modification site prediction method based on multi-scale cross attention model

Info

Publication number
CN115662508A
CN115662508A (application number CN202211260393.0A)
Authority
CN
China
Prior art keywords
sequence
attention
vector
value
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211260393.0A
Other languages
Chinese (zh)
Other versions
CN115662508B (en)
Inventor
王鸿磊
张�林
刘辉
张雪松
王栋
曾文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuzhou College of Industrial Technology
Original Assignee
Xuzhou College of Industrial Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuzhou College of Industrial Technology filed Critical Xuzhou College of Industrial Technology
Priority to CN202211260393.0A priority Critical patent/CN115662508B/en
Publication of CN115662508A publication Critical patent/CN115662508A/en
Application granted granted Critical
Publication of CN115662508B publication Critical patent/CN115662508B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an RNA modification site prediction method based on a multi-scale cross-attention model, and relates to the field of post-transcriptional RNA modification site prediction in bioinformatics. The method comprises the following steps: RNA base sequences containing an N1-methyladenosine modification site are taken as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and 3 groups of RNA base sequences of different scales are taken from each sample as input sequences; word-embedding coding and position coding are applied to the 3 groups of input sequences; the 3 encoded groups of sequences are input into an encoding module comprising a multi-head cross-attention layer and a feed-forward fully connected layer, the output results are averaged, and a fully connected neural network layer and a two-class classifier then predict whether a human RNA base sequence contains an N1-methyladenosine modification site. The method can describe the context of complex words and enhance the influence of important words in the text on methylation site prediction, so that methylation sites can be accurately predicted.

Description

RNA modification site prediction method based on multi-scale cross attention model
Technical Field
The invention relates to the field of prediction of post-transcriptional RNA modification sites in bioinformatics, and in particular to a method for predicting N1-methyladenosine modification sites in RNA based on a multi-scale cross-attention model.
Background
Studies have shown that epitranscriptomic regulation through post-transcriptional RNA modification is essential for all kinds of RNA; accurate recognition of RNA modifications is therefore crucial for understanding their functions and regulatory mechanisms.
Traditional experimental methods for identifying RNA modification sites are relatively complex, time-consuming and labor-intensive. Machine learning methods have already been applied to the computational extraction and classification of RNA sequence features and can effectively complement experimental methods. In recent years, convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) have achieved significant success in modification site prediction thanks to their power in representation learning.
However, convolutional neural networks can learn local responses from spatial data but cannot learn sequential correlations, while long short-term memory networks specialize in sequence modeling and can access contextual representations, but lack the spatial feature extraction of CNNs. For these reasons, there is strong motivation to construct a prediction framework using natural language processing (NLP) and other deep learning (DL) techniques.
In the prior art, although an attention mechanism is used when constructing a prediction framework and can attend to important contextual features of a sentence, a single attention sequence lacks information interaction and has difficulty describing the context of complex terms; the prior art also does not adequately relate to context or enhance the influence of important words in the text on methylation site prediction.
Disclosure of Invention
Based on this, it is necessary to provide a method for predicting RNA modification sites based on a multi-scale cross-attention model in order to solve the above technical problems.
The embodiment of the invention provides a multi-scale cross attention model-based RNA modification site prediction method, which comprises the following steps:
taking RNA base sequences containing an N1-methyladenosine modification site as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and taking 3 groups of RNA base sequences of different scales from each sample as input sequences;
sequentially carrying out word2vec word-embedding coding and position coding on the 3 groups of input sequences;
inputting the 3 encoded groups of sequences into an encoding module to obtain feature sequences; wherein the encoding module comprises a plurality of coding blocks connected in series, each coding block comprising a multi-head cross-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and a normalization layer;
averaging the output results of the encoding module, and predicting through a fully connected neural network layer and a two-class classifier whether the human RNA base sequence contains an N1-methyladenosine modification site.
Further, a data set is constructed; the data set includes positive-sample RNA base sequences, negative-sample RNA base sequences and class labels, with a sample length of 41 bp; the input sequences are set as sequence a, sequence b and sequence c, which are sets of sequences of the different scales x bp, y bp and z bp.
The training set and the test set of the data set are represented as:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N,
where y_n ∈ {0, 1} and x_n^(a), x_n^(b), x_n^(c) respectively represent auxiliary sequences of the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale.
Further, each sample takes 3 sets of RNA base sequences of different sizes as input sequences, including:
each sample sequence in the data set is centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of the sample sequence, the missing bases are filled with the '-' character; let x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41.
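For illustration only, the following Python sketch shows one way such multi-scale windows centered on the candidate A could be extracted; it is not part of the claimed method, and the function name, the assumption that the raw sample is already a 41 bp string with the candidate adenosine at its center, and the '-' padding behavior are illustrative assumptions.

```python
def extract_multiscale_windows(sample_41bp, half_widths=(10, 15, 20), pad_char="-"):
    """Cut windows of 2*w+1 bases centered on the middle base (the candidate A).

    `sample_41bp` is assumed to be a 41-character RNA string whose center base
    is the adenosine being classified; positions without a base are padded
    with '-' so every window has its nominal length.
    """
    center = len(sample_41bp) // 2          # index 20 for a 41 bp sample
    windows = []
    for w in half_widths:                   # 10, 15, 20 -> 21, 31, 41 bp windows
        left, right = center - w, center + w + 1
        seq = sample_41bp[max(0, left):right]
        # pad if the window runs past either end of the available sequence
        seq = pad_char * max(0, -left) + seq + pad_char * max(0, right - len(sample_41bp))
        windows.append(seq)
    return windows  # [sequence a (21 bp), sequence b (31 bp), sequence c (41 bp)]
```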
Further, the word2vec word embedded code specifically includes:
a window of 3 bases slides over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, yielding a dictionary of 105 distinct subsequences, each mapped to a unique integer;
the RNA sequences of the different scales are each encoded with the CBOW model of word2vec; for a 41-base sample, a 3-base window slides 1 base at a time over the sample sequence until it reaches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec so that each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension.
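As a hedged illustration of this encoding step, the sketch below uses the gensim library as one possible word2vec/CBOW implementation; the training corpus, the CBOW window size and the helper names are assumptions not specified by the patent.

```python
from gensim.models import Word2Vec
import numpy as np

def to_3mers(seq):
    """Slide a 3-base window one base at a time: a 41 bp sequence gives 39 3-mers."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

def train_cbow(corpus_seqs, dim=100):
    """Train a CBOW word2vec model on 3-mer 'sentences' built from the sample sequences."""
    sentences = [to_3mers(s) for s in corpus_seqs]
    # sg=0 selects CBOW; vector_size=100 gives 100-dimensional word vectors
    return Word2Vec(sentences, vector_size=dim, sg=0, window=5, min_count=1)

def encode(seq, w2v):
    """Return an (L-2) x 100 matrix of 3-mer word vectors (39 x 100 for a 41 bp sample)."""
    return np.stack([w2v.wv[k] for k in to_3mers(seq)])
```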
Further, the encoding module comprises 3 coding blocks connected in series.
Further, in the encoding module:
the model output dimension is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and the dropout probability (the probability of temporarily dropping units from the network during training) is 0.1.
Further, the multi-scale cross-attention layer comprises:
while sequence a performs self-attention, sequence a also performs cross-attention with sequence b and with sequence c, where cross-attention means that the first sequence provides the query input and the other sequence provides the key and value inputs for the attention computation; the outputs of the 3 attention computations are summed as the output of the cross-attention layer, realizing the multi-scale cross-attention layer.
Further, the cross-attention mechanism algorithm in the multi-scale cross-attention layer comprises:
several independent sequences of the same dimension but different scales are used; the first sequence provides the query input, and each of the remaining sequences performs an attention computation with the first sequence, i.e. the remaining sequences provide the key and value inputs during the attention computation. Specifically:
one sequence is sequence a and the other is sequence b; sequence a provides the query input, and each key in sequence b corresponds to a value. Matrix multiplication is performed between the query of sequence a and the key of sequence b, followed by scaling, to produce an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix is multiplied by the values of sequence b to obtain the interactive attention output. The corresponding equation is:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
Here softmax normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity. Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of the key vector of sequence b, and K_b^T is the transpose of the key vector of sequence b. When the input sequence is X, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, as given by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
In the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself.
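A minimal PyTorch sketch of this scaled dot-product cross-attention (the shapes and the helper name are illustrative assumptions, not part of the claimed method):

```python
import math
import torch

def cross_attention(Qa, Kb, Vb):
    """Scaled dot-product attention in which sequence a supplies the queries
    and sequence b supplies the keys and values.
    Qa: (len_a, d_k), Kb: (len_b, d_k), Vb: (len_b, d_v) -> (len_a, d_v)
    """
    d_k = Qa.size(-1)
    scores = Qa @ Kb.transpose(-2, -1) / math.sqrt(d_k)   # MatMul + Scale
    weights = torch.softmax(scores, dim=-1)               # softmax over b's keys
    return weights @ Vb                                    # weighted sum of b's values

# Q, K and V themselves come from learned linear projections of the inputs,
# e.g. Qa = Xa @ W_Q, Kb = Xb @ W_K, Vb = Xb @ W_V.
```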
Further, the multi-head cross attention layer algorithm comprises:
linearly projecting the queries, keys and values of the different sequences in the multi-scale cross-attention mechanism h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V, and executing the cross-attention mechanism in parallel on each projected version of the queries, keys and values to produce d_v-dimensional output values; concatenating the output values of the h integrated cross-attentions and projecting them again through a linear network to produce the final value. That is, the mathematical formulas corresponding to the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
where Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c.
When one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed; W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself. Moreover,
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, containing all rational and irrational numbers; here d_k = 8, d_v is the dimension of the value vector V with d_v = 8, and d_model is the output dimension with d_model = 64.
The above formulas use h = 8 parallel attention layers, or heads; for each of these, d_k = d_v = d_model/h = 8.
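The following PyTorch sketch gives one possible reading of the multi-head multi-scale cross-attention layer described above (d_model = 64, h = 8, d_k = d_v = 8); the class name, the use of nn.Linear for the projections, and the choice of separate key/value projections per scale are assumptions rather than the authoritative implementation.

```python
import math
import torch
from torch import nn

class MultiScaleCrossAttention(nn.Module):
    """Multi-head attention in which sequence a queries itself and the two
    auxiliary sequences b and c; the three attention outputs are summed per head."""

    def __init__(self, d_model=64, h=8):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h              # d_k = d_v = 8
        self.w_q = nn.Linear(d_model, d_model)
        # separate key/value projections for each of the three scales
        self.w_k = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(3))
        self.w_v = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(3))
        self.w_o = nn.Linear(d_model, d_model)

    def _split(self, x):                                # (B, L, d_model) -> (B, h, L, d_k)
        B, L, _ = x.shape
        return x.view(B, L, self.h, self.d_k).transpose(1, 2)

    def _attend(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, xa, xb, xc):
        q = self._split(self.w_q(xa))                   # queries always come from sequence a
        out = 0
        for i, x in enumerate((xa, xb, xc)):            # self-attention + two cross-attentions
            k, v = self._split(self.w_k[i](x)), self._split(self.w_v[i](x))
            out = out + self._attend(q, k, v)           # sum the 3 attention outputs
        B, _, L, _ = out.shape
        out = out.transpose(1, 2).reshape(B, L, self.h * self.d_k)
        return self.w_o(out)                            # final linear projection W^O
```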
Further, the feedforward fully-connected layer includes:
two linear transformations with a ReLU activation function between them; the mathematical formula of the feed-forward fully connected layer is:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer, and max(0, ·) denotes the ReLU activation function.
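A corresponding position-wise feed-forward sketch, assuming d_model = 64 and d_ff = 256 as above; placing dropout inside this sub-layer is an assumption.

```python
from torch import nn

class FeedForward(nn.Module):
    """Two linear transformations with a ReLU in between: FFN(x) = max(0, xW1 + b1)W2 + b2."""
    def __init__(self, d_model=64, d_ff=256, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)
```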
Compared with the prior art, the RNA modification site prediction method based on the multi-scale cross attention model has the following beneficial effects:
the invention determines the sequence to be predicted as 3 groups of input sequences with different scales, then carries out word embedding coding and position coding in sequence, then sends the processed 3 sequences into a coding module respectively, namely 3 coding blocks connected in series, finally accumulates the values after coding processing, averages the values, and predicts whether the RNA base sequence of the human species contains N or not through a full-connection neural network layer and two classifiers 1 -a methyl adenosine modification site. Wherein, the multiscale cross attention layer includes: while the sequence a carries out self-attention calculation, the sequence a carries out cross attention calculation with the sequence b and the sequence c respectively, and the cross attention means that the first sequence is used as query (query) input and the cross attention means that the first sequence is used as query inputOne sequence is used as key (key) input and value (value) input, attention calculation is carried out, and then output results of 3 kinds of attention are added up to be used as output of a cross attention layer, so that the multi-scale cross attention layer is realized.
Drawings
FIG. 1 is a schematic diagram of a multi-scale cross-attention model-based approach provided in one embodiment;
FIG. 2 is a schematic diagram of a cross-attention mechanism provided in one embodiment;
FIG. 3 is a block diagram of a three-way cross-attention mechanism provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention provides a method for predicting RNA modification sites based on a multi-scale cross-attention model, which specifically comprises the following steps:
1) Collecting positive and negative sample data sets: an N1-methyladenosine (m1A) modification site data set of human RNA is obtained, comprising the RNA sample sequences of the positive and negative data sets and the corresponding class labels, with a sample length of 41 bp (base pairs). The model requires 3 groups of RNA base sequences of different scales to be taken from each sample as input sequences; sequence a, sequence b and sequence c are sets of sequences of the different scales x bp, y bp and z bp, so the training set and the test set of the model can be expressed in the following form:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N, with y_n ∈ {0, 1},
where x_n^(a), x_n^(b), x_n^(c) represent the auxiliary (side) sequences of the n-th sample at the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale to its left and right;
1-1) In the training set and the test set, RNA containing an N1-methyladenosine modification site is taken as a positive sample, and RNA containing no N1-methyladenosine modification site is taken as a negative sample.
The positive and negative samples are specified as follows: a positive sample sequence (an m1A methylation site sequence) is a sequence of a common length (a common number of bases) around the N1-methyladenosine modification site (base A). The base at the center of a negative sample sequence is also A, but it is not an N1-methyladenosine modification site, and the negative sample sequence is likewise a sequence of the common length around that base A. Positive and negative samples share the same common length of 41 bp (base pairs).
1-2) Each sample sequence of the data set is centered on the common motif A with an upstream/downstream window of 20 bp, and when no base exists at certain positions the missing bases are filled with the '-' character; each m1A positive/negative sample is thus 41 bp long. The training set includes 593 positive samples and 5930 negative samples, and the test set includes 114 positive samples and 1140 negative samples, as summarized in Table 1.
Table 1. Statistics of the two RNA modification data sets (positive and negative sample counts of the training and test sets, as listed above).
2) Processing samples: the positive and negative sample sequences are centered on the common motif A and upstream/downstream windows of different sizes are taken, here integer multiples of 5, namely 10 bp, 15 bp and 20 bp, giving sequences of 21 bp, 31 bp and 41 bp; when a base does not exist at some positions of a sample sequence, the missing base is filled with the '-' character.
3) Feature encoding: a 3-base window slides over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, yielding a dictionary of 105 distinct subsequences, each mapped to a unique integer.
For the sample sequences of the different scales, the RNA sequences are encoded with the CBOW model of word2vec. Taking a sample 41 bases long as an example: a 3-base window slides 1 base at a time over the 41 bases and stops when it touches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec, so each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension. The purpose of the word2vec model is to capture relationships between words in a high-dimensional space.
4) Introducing position coding: position information is introduced using position embedding because of the correlation between base positions in the sequence.
5) Designing the multi-view classification learning model: several word-vector sequences of different scales are learned through the multi-head cross-attention layer and then passed into a feed-forward network (FFN). Finally, the values output by the encoding module are averaged, and a fully connected neural network layer and a two-class classifier predict whether the human RNA base sequence contains an N1-methyladenosine modification site.
It should be noted that the encoding module input = word-embedding input + position encoding.
The embedding input maps the vector of each word from the word-vector dimension to d_model through a conventional embedding layer; since the relationship is additive, the position code here is also a d_model-dimensional vector.
The position code is not a single value but a d_model-dimensional vector (much like a word vector) containing information about a specific position in the sentence; the code is not integrated into the model, but the vector gives each word information about its position in the sentence. In other words, the model input is enhanced by injecting the order information of the words. Given an input sequence of length m, let s denote the position of a word in the sequence, p_s ∈ R^(d_model) the vector corresponding to position s, and p_s^(i) the i-th element of the position-s vector, where d_model is the input/output dimension of the encoding module and also the dimension of the position code. The function that generates the position vector p_s is defined as follows:
p_s^(2k) = sin(ω_k · s), p_s^(2k+1) = cos(ω_k · s), with ω_k = 1 / 10000^(2k / d_model),
where the d_model-dimensional vector is grouped in pairs, each group consisting of one sin and one cos sharing the same frequency ω_k; there are d_model/2 groups in total, and since numbering starts from 0, the last group is numbered d_model/2 − 1. The wavelength of the sin and cos functions (determined by ω_k) increases from 2π to 2π · 10000.
The specific description of the above steps is as follows:
Each data-set sample sequence is centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of a sample sequence the missing bases are filled with the '-' character. This patent assumes x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41. The base sequence first undergoes word2vec word-embedding coding and then position coding, and then passes through the encoding module, which consists of 3 coding blocks; each coding block comprises a multi-head cross-attention layer and a feed-forward network, and each sub-layer is followed by a residual connection (Residual Connection) and a normalization layer (Layer Normalization). The residual connection prevents network degradation and avoids the vanishing-gradient problem; the normalization layer normalizes the activation values of each layer. This is shown in fig. 1.
The input and output dimension of the encoding module is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and dropout = 0.1, where dropout means that during training of the deep network, neural network units are temporarily dropped from the network with a certain probability.
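Putting these pieces together, one possible sketch of a single coding block is given below; it reuses the MultiScaleCrossAttention and FeedForward sketches shown earlier, applies post-layer normalization, and only updates the representation of sequence a, all of which are assumptions where the patent text does not fix the details.

```python
from torch import nn

class CrossAttentionEncoderBlock(nn.Module):
    """One coding block: multi-head cross-attention and a feed-forward network,
    each wrapped in a residual connection plus layer normalization (post-norm)."""
    def __init__(self, d_model=64, h=8, d_ff=256, dropout=0.1):
        super().__init__()
        self.attn = MultiScaleCrossAttention(d_model, h)    # sketched earlier (assumption)
        self.ffn = FeedForward(d_model, d_ff, dropout)       # sketched earlier (assumption)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, xa, xb, xc):
        # residual connection around the multi-head cross-attention sub-layer
        xa = self.norm1(xa + self.drop(self.attn(xa, xb, xc)))
        # residual connection around the feed-forward sub-layer
        return self.norm2(xa + self.drop(self.ffn(xa)))
```

Three such blocks in series would form the encoding module described above.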
The input is the labelled base sequence; a sentence entering the encoder first passes through a multi-head cross-attention layer, where "multi-head" means the attention computation is executed several times in parallel. Sequence a, sequence b and sequence c are sets of sequences of the different scales x bp, y bp and z bp, so cross-attention within a coding block means that the first sequence is used as the query input while another sequence is used as the key and value inputs for the attention computation, as shown in fig. 2. Finally, the output results of the 3 attentions are added as the output of the cross-attention layer, as shown in fig. 3.
The cross-attention input consists of two different sequences: one is used as the query input and the other as the key and value input.
The specific algorithm of the cross-attention mechanism is shown in fig. 2: one sequence is sequence a and the other is sequence b, and sequence a is used as the query input; to capture every value in sequence b, each key in sequence b corresponds to a value.
Matrix multiplication (MatMul) is first performed between the query of sequence a and the key of sequence b, followed by scaling (Scale), giving an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix multiplied by the values of sequence b gives the interactive attention output, the corresponding equation being:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
The softmax in the formula normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity. Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of K_b, and K_b^T is the transpose of the key vector of sequence b. Taking an input sequence X as an example, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, as given by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
In the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself. Multiplying the input sequence X by W^Q gives Q_x, and K_x and V_x are obtained in the same way.
As shown in fig. 3, the output results of the 3 attention computations are added to realize the cross-attention layer. The queries, keys and values of the different sequences in the above multi-scale cross-attention mechanism are linearly projected h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V; the cross-attention mechanism is then executed in parallel on each projected version of the queries, keys and values, producing d_v-dimensional output values. The output values of the h integrated cross-attentions are concatenated and projected again through a linear network to produce the final value. The corresponding mathematical formulas of the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
where Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c.
When one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed; W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself. When one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed; W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself. Moreover,
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, the set containing all rational and irrational numbers; d_k is the dimension of the key vector K, here d_k = 8; d_v is the dimension of the value vector V, here d_v = 8; d_model is the output dimension, here d_model = 64.
The above formulas use h = 8 parallel attention layers, or heads; for each of these we use d_k = d_v = d_model/h = 8.
A residual connection (Residual Connection) and a normalization layer (Layer Normalization) then follow; the residual connection prevents network degradation and avoids the vanishing-gradient problem, and the normalization layer normalizes the activation values of each layer.
The data then enters the feed-forward fully connected layer, comprising:
two linear transformations with a ReLU activation function between them; that is, the mathematical formula corresponding to the feed-forward fully connected layer is as follows, where max(0, ·) denotes the ReLU activation function:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer.
It should be noted that the purpose of the feed-forward fully connected layer is as follows: the multi-head attention mechanism alone is not sufficient to extract ideal features, so the fully connected layer is added to increase the network capacity.
This is again followed by a residual connection (Residual Connection) and a normalization layer (Layer Normalization); the residual connection prevents network degradation and avoids the vanishing-gradient problem, and the normalization layer normalizes the activation values of each layer.
Further, the values output by the encoding module are averaged, and a fully connected neural network layer and a two-class classifier then predict whether the human RNA base sequence contains an N1-methyladenosine modification site.
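As a rough sketch of this final step: the encoder output is mean-pooled and passed through a fully connected layer with a two-class softmax. Whether the average is taken over sequence positions, over the three scale branches, or both is not fully specified here, so pooling over positions is an assumption, as are the class and module names.

```python
import torch
from torch import nn

class ClassificationHead(nn.Module):
    """Average the encoder output over positions, then predict m1A vs. non-m1A."""
    def __init__(self, d_model=64, n_classes=2):
        super().__init__()
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, encoded):              # encoded: (batch, seq_len, d_model)
        pooled = encoded.mean(dim=1)         # average the values output by the encoder
        return torch.softmax(self.fc(pooled), dim=-1)   # probabilities of the two classes
```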
In the embodiment of the invention, the validity of the model is verified with 5-fold cross-validation on the training set:
Table 2. Training set 5-fold prediction results
Classifiers        AUROC   ACC    Sen    Precision  MCC    Spe    F-1    AUPRC
BiLSTM             0.9242  94.11  55.14  73.48      60.61  98.01  63.00  0.701
CNN                0.8982  93.16  62.39  62.39      58.63  96.24  62.39  0.6707
BiLSTM+Attlayer    0.9270  94.28  57.84  73.61      62.25  97.93  64.78  0.7069
CNN+Attlayer       0.9026  93.34  60.54  64.22      58.71  96.63  62.33  0.6644
MSCA(21-31-41)     0.9465  94.10  61.6   72.64      63.71  97.54  66.67  0.75
MSCA (21-31-41) denotes the model using the multi-scale cross-attention model (MSCA) with samples of sequence lengths 21 bp, 31 bp and 41 bp as input.
Considering that the ratio of positive to negative samples in the test set is 1:10, i.e. an imbalanced sample set, performance is compared by the area under the precision-recall curve (AUPRC). As shown in Table 2, the AUPRC of the multi-scale cross-attention model (MSCA) is much higher than those of the BiLSTM classification model (Bi-directional Long Short-Term Memory), CNN (Convolutional Neural Network), BiLSTM+Attlayer (BiLSTM layer + Bahdanau attention layer) and CNN+Attlayer (convolutional neural network layer + Bahdanau attention layer) models.
In addition, when comparing key indicators such as accuracy (ACC), MSCA is also higher than the other known strong classifiers.
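For reference, the following is a hedged sketch of how such a 5-fold evaluation with AUROC and AUPRC could be computed using scikit-learn; the train_and_predict function is a hypothetical placeholder for training any of the compared models and is not defined in this document.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score

def five_fold_eval(X, y, train_and_predict, seed=42):
    """X: array of encoded samples, y: 0/1 labels; returns mean AUROC and AUPRC."""
    aurocs, auprcs = [], []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        scores = train_and_predict(X[train_idx], y[train_idx], X[val_idx])
        aurocs.append(roc_auc_score(y[val_idx], scores))
        auprcs.append(average_precision_score(y[val_idx], scores))  # area under PR curve
    return float(np.mean(aurocs)), float(np.mean(auprcs))
```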
In the embodiment of the invention, the validity of the model is verified by using the test set:
Table 3. Independent data set evaluation
Classifiers        AUROC   ACC    Sen    Precision  MCC    Spe    F-1    AUPRC
BiLSTM             0.9038  94.25  55.26  75.0       61.43  93.30  63.63  0.7253
CNN                0.9106  94.73  58.77  77.90      64.95  93.14  67.00  0.7066
BiLSTM+Attlayer    0.9276  94.97  54.38  84.93      65.58  94.17  66.31  0.7642
CNN+Attlayer       0.9207  94.89  63.15  76.59      66.84  92.50  69.22  0.7538
MSCA(21-31-41)     0.9405  95.21  62.28  80.68      68.41  98.51  70.30  0.7751
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for predicting RNA modification sites based on a multi-scale cross attention model is characterized by comprising the following steps:
taking RNA base sequences containing an N1-methyladenosine modification site as positive samples and RNA base sequences containing no N1-methyladenosine modification site as negative samples, and taking 3 groups of RNA base sequences of different scales from each sample as input sequences;
sequentially carrying out word2vec word embedded coding and position coding on the 3 groups of input sequences;
inputting the 3 groups of encoded sequences into an encoding module to obtain a feature matrix; wherein the encoding module comprises a plurality of coding blocks connected in series, each coding block comprising a multi-head cross-attention layer and a feed-forward fully connected layer, and each sub-layer is followed by a residual connection and a normalization layer;
averaging the output results of the encoding module, and predicting through a fully connected neural network layer and a two-class classifier whether the human RNA base sequence contains an N1-methyladenosine modification site.
2. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, further comprising: constructing a data set;
the data set includes positive-sample RNA base sequences, negative-sample RNA base sequences and class labels, with a sample length of 41 bp; the input sequences are set as sequence a, sequence b and sequence c, which are sets of sequences of the different scales x bp, y bp and z bp;
the training set and the test set of the data set are represented as:
D = {(x_n^(a), x_n^(b), x_n^(c), y_n)}, n = 1, ..., N,
where y_n ∈ {0, 1} and x_n^(a), x_n^(b), x_n^(c) respectively represent auxiliary sequences of the different scales x bp, y bp and z bp; each auxiliary sequence takes the sequence center as the central point and intercepts a sequence of the corresponding scale.
3. The method for predicting RNA modification sites based on the multi-scale cross attention model of claim 2, wherein each sample takes 3 sets of RNA base sequences with different scales as input sequences, and comprises the following steps:
the sample sequences in the data set are centered on the common motif A, and the upstream/downstream windows take different sizes, for example the 3 sizes x1 bp, y1 bp and z1 bp; that is, each m1A positive/negative sample yields sequences of x bp, y bp and z bp, and when no base exists at certain positions of the sample sequence, the missing bases are filled with the '-' character; let x1 = 10, y1 = 15 and z1 = 20, so that x = 21, y = 31 and z = 41.
4. The method for predicting the RNA modification sites based on the multi-scale cross attention model as claimed in claim 1, wherein the word2vec word embedded code specifically comprises:
sliding a window of 3 bases over each sample sequence, 1 base at a time, until the window reaches the end of the sequence, thereby obtaining a dictionary of 105 distinct subsequences, each mapped to a unique integer;
encoding the RNA sequences of the different scales with the CBOW model of word2vec; for a 41-base sample, a 3-base window slides 1 base at a time over the sample sequence until it reaches the end of the sequence, giving 39 subsequences of 3 bases; the RNA sequence is encoded with the CBOW model of word2vec so that each subsequence is converted into a word vector representing its semantics, and the resulting word vectors convert the 41 bp RNA base sequence into a 39 × 100 matrix, where 39 is the number of words after preprocessing and 100 is the word-vector dimension.
5. The RNA modification site prediction method based on the multi-scale cross-attention model of claim 1, wherein the encoding module comprises 3 coding blocks connected in series.
6. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, wherein in the encoding module:
the output dimension is d_model = 64, the number of heads is h = 8, the feed-forward network dimension is d_ff = 256, and the probability of temporarily dropping units from the network is dropout = 0.1.
7. The method for predicting RNA modification sites based on the multi-scale cross attention model of claim 1, wherein the multi-scale cross attention layer comprises:
while sequence a performs self-attention, sequence a also performs cross-attention with sequence b and with sequence c, where cross-attention means that the first sequence is used as the query input and the other sequence is used as the key and value inputs for the attention computation; the outputs of the 3 attention computations are summed as the output of the cross-attention layer, realizing the multi-scale cross-attention layer.
8. The RNA modification site prediction method based on the multi-scale cross-attention model of claim 7, wherein the cross-attention mechanism algorithm in the multi-scale cross-attention layer comprises: several independent sequences of the same dimension but different scales, where the first sequence provides the query input and each of the remaining sequences performs an attention computation with the first sequence, i.e. the remaining sequences provide the key and value inputs during the attention computation; specifically:
one sequence is sequence a and the other sequence is sequence b; sequence a provides the query input, and each key in sequence b corresponds to a value; matrix multiplication is performed between the query of sequence a and the key of sequence b, followed by scaling, to produce an attention score; the attention score is normalized with a softmax function to obtain the weight of each key, and the weight matrix is multiplied by the values of sequence b to obtain the interactive attention output, the corresponding equation being:
Attention(Q_a, K_b, V_b) = softmax(Q_a · K_b^T / √d_k) · V_b
where softmax normalizes the vectors, i.e. normalizes the similarities, giving a normalized weight matrix; the larger the weight of a value in the matrix, the higher the similarity; Q_a is the query vector of sequence a, K_b is the key vector of sequence b, V_b is the value vector of sequence b, d_k is the dimension of the key vector of sequence b, and K_b^T is the transpose of the key vector of sequence b; when the input sequence is X, the sequence X is first converted with linear projections into Q_x, K_x and V_x, all obtained by linear transformation of the same input sequence X, represented by the following equations:
Q_x = X · W^Q
K_x = X · W^K
V_x = X · W^V
in the above formulas, W^Q, W^K and W^V are the corresponding projection matrices; their values are initialized randomly and the final values are learned by the network itself.
9. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 8, wherein the multi-head cross-attention layer algorithm comprises:
linearly projecting the queries, keys and values of the different sequences in the multi-scale cross-attention mechanism h times, to dimensions d_k, d_k and d_v respectively, where d_v is the dimension of the value vector V, and executing the cross-attention mechanism in parallel on each projected version of the queries, keys and values to produce d_v-dimensional output values; concatenating the output values of the h integrated cross-attentions and projecting them again through a linear network to produce the final value; that is, the mathematical formulas corresponding to the multi-head multi-scale cross-attention layer are:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W^O
head_i = Attention(Q_a W_i^Qa, K_a W_i^Ka, V_a W_i^Va) + Attention(Q_a W_i^Qa, K_b W_i^Kb, V_b W_i^Vb) + Attention(Q_a W_i^Qa, K_c W_i^Kc, V_c W_i^Vc)
wherein Concat splices the outputs head_i of the multiple multi-scale cross-attentions, i is a positive integer denoting the i-th head, W^O is the weight of the multi-head multi-scale cross-attention concatenation, Q_a is the query vector of sequence a, K_a, K_b and K_c are the key vectors of sequences a, b and c, and V_a, V_b and V_c are the value vectors of sequences a, b and c;
when one sequence is sequence a and the other sequence is the same sequence a, with sequence a providing the query input and each key in sequence a corresponding to a value, a self-attention computation is performed, where W_i^Qa denotes the weight of the query vector Q_a, W_i^Ka the weight of the key vector K_a, and W_i^Va the weight of the value vector V_a, all three initialized randomly and learned by the network itself; when one sequence is sequence a and the other sequence is sequence b, with sequence a providing the query input and each key in sequence b corresponding to a value, a cross-attention computation is performed, where W_i^Qa denotes the weight of Q_a, W_i^Kb the weight of K_b, and W_i^Vb the weight of V_b, all three initialized randomly and learned by the network itself; when one sequence is sequence a and the other sequence is sequence c, with sequence a providing the query input and each key in sequence c corresponding to a value, a cross-attention computation is performed, where W_i^Qa denotes the weight of Q_a, W_i^Kc the weight of K_c, and W_i^Vc the weight of V_c, all three initialized randomly and learned by the network itself; and
W_i^Qa, W_i^Ka, W_i^Kb, W_i^Kc ∈ R^(d_model × d_k), W_i^Va, W_i^Vb, W_i^Vc ∈ R^(d_model × d_v), W^O ∈ R^(h·d_v × d_model),
where R denotes the set of real numbers, the set containing all rational and irrational numbers; d_k = 8; d_v is the dimension of the value vector V, here d_v = 8; d_model is the output dimension, here d_model = 64;
the above formulas use h = 8 parallel attention layers, or heads, and for each of these d_k = d_v = d_model/h = 8.
10. The method for predicting the RNA modification site based on the multi-scale cross-attention model of claim 1, wherein the feed-forward fully connected layer comprises:
two linear transformations with a ReLU activation function between them; the mathematical formula corresponding to the feed-forward fully connected layer is:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2
where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward fully connected layer, and max(0, ·) denotes the ReLU activation function.
CN202211260393.0A 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model Active CN115662508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211260393.0A CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211260393.0A CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Publications (2)

Publication Number Publication Date
CN115662508A true CN115662508A (en) 2023-01-31
CN115662508B CN115662508B (en) 2024-03-12

Family

ID=84986550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211260393.0A Active CN115662508B (en) 2022-10-14 2022-10-14 RNA modification site prediction method based on multiscale cross attention model

Country Status (1)

Country Link
CN (1) CN115662508B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210279576A1 (en) * 2020-03-03 2021-09-09 Google Llc Attention neural networks with talking heads attention
US20220310070A1 (en) * 2021-03-26 2022-09-29 Mitsubishi Electric Research Laboratories, Inc. Artificial Intelligence System for Capturing Context by Dilated Self-Attention
CN114023376A (en) * 2021-11-02 2022-02-08 四川大学 RNA-protein binding site prediction method and system based on self-attention mechanism
KR102405030B1 (en) * 2021-11-23 2022-06-07 주식회사 쓰리빌리언 System and method for predicting pathogenicity of genetic variant using explainable ai
CN114464249A (en) * 2021-12-31 2022-05-10 北京工业大学 Ribonucleic acid-protein site recognition method based on self-attention convolution
CN115116543A (en) * 2022-04-18 2022-09-27 腾讯科技(深圳)有限公司 Antigen-antibody binding site determination method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGLEI WANG et al.: "EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction", BMC BIOINFORMATICS, pages 8-19
李国斌; 杜秀全; 李新路; 吴志泽: "Gene splice site prediction based on convolutional neural networks" (基于卷积神经网络的基因剪接位点预测), Journal of Yancheng Institute of Technology (Natural Science Edition), no. 02, pages 24-28
李正光 et al.: "A text semantic similarity evaluation method fusing cross self-attention and pre-trained models" (融合交叉自注意力和预训练模型的文本语义相似性评估方法), Mathematics in Practice and Theory, vol. 52, no. 7, pages 166-167

Also Published As

Publication number Publication date
CN115662508B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN110196906B (en) Deep learning text similarity detection method oriented to financial industry
CN111680494B (en) Similar text generation method and device
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111475655B (en) Power distribution network knowledge graph-based power scheduling text entity linking method
CN112667818A (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN113407660B (en) Unstructured text event extraction method
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN113641819B (en) Argumentation mining system and method based on multitasking sparse sharing learning
CN112232087A (en) Transformer-based specific aspect emotion analysis method of multi-granularity attention model
CN112101009A (en) Knowledge graph-based method for judging similarity of people relationship frame of dream of Red mansions
CN113704437A (en) Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN112148997A (en) Multi-modal confrontation model training method and device for disaster event detection
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN111651993A (en) Chinese named entity recognition method fusing local-global character level association features
CN114299512A (en) Zero-sample small seal character recognition method based on Chinese character etymon structure
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN111680529A (en) Machine translation algorithm and device based on layer aggregation
CN115424663B (en) RNA modification site prediction method based on attention bidirectional expression model
CN113806543A (en) Residual jump connection-based text classification method for gated cyclic unit
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN112559741A (en) Nuclear power equipment defect recording text classification method, system, medium and electronic equipment
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant