CN114743591A - Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment - Google Patents


Publication number
CN114743591A
Authority
CN
China
Prior art keywords: peptide chain, MHC, information, sequence information, bindable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210246129.5A
Other languages
Chinese (zh)
Inventor
郭菲
江丽敏
郗文辉
唐继军
Current Assignee
Shenzhen Technology University
Original Assignee
Shenzhen Technology University
Priority date
Filing date
Publication date
Application filed by Shenzhen Technology University
Priority to CN202210246129.5A
Publication of CN114743591A
Legal status: Pending

Links

Images

Classifications

    • G — PHYSICS
    • G16B — BIOINFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 15/30 — Drug targeting using structural data; docking or binding prediction
    • G16B 40/20 — Supervised data analysis (ICT specially adapted for bioinformatics-related machine learning or data mining)
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/048 — Activation functions
    • G06N 3/086 — Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

The application is applicable to the technical field of data processing and provides a method, a device and terminal equipment for recognizing MHC (major histocompatibility complex) bindable peptide chains. The method comprises: acquiring sequence information of a peptide chain to be identified; converting the sequence information into evolution information of the peptide chain; and determining the probability that the peptide chain binds to MHC based on the evolution information and a convolutional recurrent neural network. Compared with the prior art, which judges whether a peptide chain of a specific length can bind to MHC directly from its sequence information, the method introduces evolution information alongside the sequence information, and the recurrent neural network handles peptide chains of different lengths, so the probability that the peptide chain binds to MHC is determined accurately.

Description

Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a method, a device and terminal equipment for recognizing MHC (major histocompatibility complex) bindable peptide chains.
Background
The major histocompatibility complex (MHC) encodes a group of proteins that are important components of immune surveillance. According to differences in function, molecular structure and distribution, MHC can be classified into MHC class I, MHC class II and MHC class III. Different classes of MHC bind different peptide chains.
Currently, whether a peptide chain can bind to MHC is generally judged from sequence information of a specific, fixed length. Such judgments are often erroneous, making peptide chain recognition inaccurate.
Disclosure of Invention
The embodiments of the application provide a method, a device and terminal equipment for recognizing MHC (major histocompatibility complex) bindable peptide chains, which can improve the accuracy of peptide chain recognition and prediction.
In a first aspect, the present embodiments provide a method for identifying MHC-bindable peptide chains, comprising:
acquiring sequence information of a peptide chain to be identified;
converting the sequence information into evolutionary information of the peptide chain;
and determining the probability value of the peptide chain binding to MHC according to the evolution information.
In a second aspect, the present embodiments provide a recognition device for MHC-bindable peptide chains, comprising:
the information acquisition module is used for acquiring sequence information of the peptide chain to be identified;
the information conversion module is used for converting the sequence information into evolution information of the peptide chain;
and the probability calculation module is used for determining the probability value of the combination of the peptide chain and the MHC according to the evolution information.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method for recognizing MHC-bindable peptide chain according to any one of the first aspect.
In a fourth aspect, the present embodiments provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for recognizing MHC-bindable peptide chain according to any one of the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method for recognizing MHC-bindable peptide chains according to any one of the above first aspects.
Compared with the prior art, the embodiments of the first aspect of the application have the following beneficial effects: sequence information of a peptide chain to be identified is obtained, the sequence information is converted into evolution information of the peptide chain, and finally the probability that the peptide chain binds to MHC (major histocompatibility complex) is determined according to the evolution information. Compared with the prior art, which judges whether a peptide chain can bind to MHC directly from sequence information of a specific length, the method introduces evolution information. Since the evolution information is derived from the sequence information, the method in effect uses both the sequence information and the evolution information of the peptide chain to determine the probability that the peptide chain can bind to MHC.
It is to be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, and details are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of the recognition method for MHC-bindable peptide chain according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for identifying MHC-bindable peptide chains according to an embodiment of the present application;
fig. 3 is a flowchart illustrating a method for determining a probability value according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a bidirectional long short-term memory network according to an embodiment of the present application;
FIG. 5 is a graphical representation comparing ROC curves for the present application and other methods provided by an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a relationship between performance and sequence information of the method according to the present application;
FIG. 7 is a schematic structural diagram of a recognition device for MHC-bindable peptide chains according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Different classes of MHC can bind peptide chains of different lengths. Peptide chain recognition models are used to identify whether a peptide chain can bind to MHC, but they are limited by peptide chain length: each model can only recognize peptide chains of one fixed length, so different models must be established for peptide chains of different lengths.
The recognition method for MHC-bindable peptide chains of the present application can process peptide chains of different lengths, achieving a universal peptide chain recognition model.
Fig. 1 is a schematic view of an application scenario of the recognition method of MHC-bindable peptide chain provided in the embodiment of the present application, which can be used to determine whether a peptide chain can bind to MHC. The storage device 10 is used for storing sequence information of a peptide chain, and the processor 20 is used for acquiring the sequence information of the peptide chain from the storage device 10, obtaining a probability value that the peptide chain can be combined with MHC through processing the sequence information, and determining whether the peptide chain can be combined with MHC through the probability value.
Fig. 2 shows a schematic flow diagram of the recognition method for MHC-bindable peptide chains provided herein, which is detailed below with reference to fig. 2:
s101, obtaining sequence information of the peptide chain to be identified.
In this example, the peptide chain is formed by connecting a plurality of amino acids by dehydration condensation to form peptide bonds (chemical bonds). The length of the different peptide chains may vary. Sequence information for the peptide chain can be obtained from a storage device.
In this embodiment, the length of the peptide chain to be identified is not fixed, and may be any length of peptide chain.
Specifically, the peptide chain to be recognized is obtained and subjected to one-hot encoding to obtain the sequence information of the peptide chain.
In this embodiment, the sequence information may be a matrix, for example a 20 × L matrix, where L is the number of amino acids in the peptide chain to be identified.
One-hot encoding the peptide chain to be identified extracts its information at the sequence level.
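As an illustrative sketch (not code from the patent), the one-hot encoding of a peptide chain into a 20 × L matrix can be written as below; the amino-acid ordering is an assumption:

```python
# Hypothetical sketch of the one-hot encoding step: each residue of the peptide
# maps to a 20-dimensional indicator column, giving a 20 x L sequence matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering assumed)

def one_hot_encode(peptide):
    """Return a 20 x L matrix whose column i marks the i-th residue of the peptide."""
    row_of = {aa: row for row, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0] * len(peptide) for _ in range(20)]
    for col, aa in enumerate(peptide):
        matrix[row_of[aa]][col] = 1
    return matrix

encoded = one_hot_encode("SIINFEKL")  # an 8-mer, so the matrix is 20 x 8
```

Each column contains exactly one 1, so the matrix carries the same information as the raw sequence, in a form the convolutional network can consume.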
S102, converting the sequence information into evolution information of the peptide chain.
In the present embodiment, the evolution information is a matrix, for example, a 20 × L matrix.
S103, determining the probability value of the combination of the peptide chain and the MHC according to the evolution information.
In this example, the probability values are used to characterize whether a peptide chain can bind to MHC. Specifically, if the probability value is greater than a preset value, determining that the peptide chain can be combined with MHC; and if the probability value is less than or equal to the preset value, determining that the peptide chain cannot be combined with the MHC.
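The decision rule above amounts to a threshold comparison; the 0.5 threshold in the sketch below is an assumed example, since the patent leaves the preset value unspecified:

```python
# Threshold decision sketch; 0.5 is an assumed preset value, not from the patent.
def can_bind(probability, threshold=0.5):
    """A peptide chain is predicted to bind MHC when its probability exceeds the threshold."""
    return probability > threshold
```

A probability exactly equal to the threshold is classified as non-binding, matching the "less than or equal to" branch above.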
In the embodiment of the application, sequence information of a peptide chain to be identified is obtained, the sequence information is converted into evolution information of the peptide chain, and finally the probability that the peptide chain binds to MHC is determined according to the evolution information. Compared with the prior art, which judges whether a peptide chain can bind to MHC directly from its sequence information, the method introduces evolution information; since the evolution information is derived from the sequence information, the method uses both the sequence information and the evolution information of the peptide chain to determine the probability that the peptide chain can bind to MHC.
In this embodiment, the sequence information is converted into evolution information through a trained peptide chain recognition model, and a probability value is obtained according to the evolution information.
In this embodiment, the peptide chain recognition model includes a convolutional neural network, a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer. The output of the convolutional neural network is the input of the bidirectional long short-term memory network.
In one possible implementation manner, the implementation process of step S102 may include:
and processing the sequence information by using the trained convolutional neural network to obtain evolution information of the peptide chain.
In this embodiment, the convolutional neural network may have 20 convolution kernels, each of size 20 × 1.
Specifically, the convolutional neural network computes:

Evo(X)_{k,i} = W_k^T · X_i

where Evo(X)_{k,i} is the value in the kth row and ith column of a first matrix, the first matrix being used to characterize the evolution information; W_k^T is the transpose of the kth convolution kernel; and X_i is the ith column vector of a second matrix, the second matrix being used to characterize the sequence information.
In this embodiment, the parameters of the convolution kernels in the convolutional neural network are determined based on the BLOSUM matrix, precisely so that evolutionary information can be extracted from the peptide chain. Processing the peptide chain to be identified with the convolutional neural network therefore extracts its information at the evolutionary level.
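Since the sequence matrix is one-hot, applying the 1-wide kernels reduces to a matrix of dot products. The sketch below shrinks everything to a made-up 3 × 3 "BLOSUM-like" matrix over a toy 3-letter alphabet; the numbers are illustrative only:

```python
# Evo(X)[k][i] = W_k^T . X_i -- the convolution equation above, in plain Python.
def evolution_information(kernels, X):
    """Apply each 1-wide kernel to every column of the sequence matrix X."""
    n_cols = len(X[0])
    return [[sum(kernels[k][j] * X[j][i] for j in range(len(X)))
             for i in range(n_cols)] for k in range(len(kernels))]

# Toy substitution matrix over the alphabet (A, B, C); values are invented.
B = [[ 4, -1,  0],
     [-1,  5, -2],
     [ 0, -2,  6]]
# One-hot matrix of the toy "peptide" ACB (columns = residues A, C, B).
X = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
evo = evolution_information(B, X)
# Because X is one-hot, column i of evo is simply the substitution-matrix
# column of the i-th residue.
```

This makes the design choice visible: with one-hot input, the BLOSUM-initialized kernels start out as a lookup of substitution scores, which training can then refine.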
As shown in fig. 3, in a possible implementation manner, the implementation process of step S103 may include:
and S1031, performing deep information characterization on the evolution information by using the trained bidirectional long-short term memory network to obtain a first vector.
Specifically, the evolution information is input into a bidirectional long-short term memory network to obtain a first vector.
In the present embodiment, a long short-term memory network (LSTM) is a recurrent neural network whose repeating modules form a chain. The bidirectional long short-term memory network comprises two sets of LSTMs: one processes the feature matrix from left to right, and the other from right to left, which meets the requirement of processing peptide chains of different lengths. Each LSTM includes an input gate, a forget gate and an output gate. The output of the bidirectional long short-term memory network is a 128-dimensional vector. FIG. 4 is a schematic diagram of the bidirectional long short-term memory network.
Specifically, the calculation process of the LSTM may include:

f_t = σ(W_f·x_t + U_f·h_{t-1} + b_f)
i_t = σ(W_i·x_t + U_i·h_{t-1} + b_i)
o_t = σ(W_o·x_t + U_o·h_{t-1} + b_o)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_c·x_t + U_c·h_{t-1} + b_c)
h_t = o_t ⊙ tanh(C_t)

where x_t is the input vector; f_t, i_t and o_t are the activation vectors of the forget gate, the input gate and the output gate; h_t is the 128-dimensional hidden state vector; C_t is the cell state vector of the LSTM; W_f, U_f are the parameter matrices and b_f the bias vector of the forget gate; W_i, U_i and b_i those of the input gate; W_o, U_o and b_o those of the output gate; and W_c, U_c and b_c those of the cell update; σ is the sigmoid function and ⊙ denotes element-wise multiplication.
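A minimal scalar (hidden size 1) version of the gate equations above can be sketched as follows; the weights are invented placeholders, not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations above (all quantities scalar)."""
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])   # forget gate
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])   # input gate
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])   # output gate
    c_t = f_t * c_prev + i_t * math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Placeholder weights (illustrative values, not trained ones).
params = {k: 0.5 for k in
          ("Wf", "Uf", "bf", "Wi", "Ui", "bi", "Wo", "Uo", "bo", "Wc", "Uc", "bc")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):   # a toy input sequence -- it may have any length
    h, c = lstm_step(x, h, c, params)
```

A bidirectional network runs a second copy of this recursion over the reversed sequence and combines the two hidden states, which is what lets the model accept peptide chains of any length.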
S1032, determining a probability value of the peptide chain binding to MHC based on the first vector.
Specifically, the first vector is subjected to first combination processing by using the first full connection layer, so that a second vector is obtained;
performing second combination processing on the second vector by using the second full connection layer to obtain a first value;
and carrying out standardization processing on the first value by utilizing the second full-connection layer to obtain the probability value.
In the present embodiment, the first value is normalized using a Sigmoid function.
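The two fully-connected layers and the sigmoid normalisation can be sketched as below; the weights, layer widths and the 3-dimensional stand-in for the 128-dimensional BiLSTM output are invented for illustration:

```python
import math

def dense(vec, weights, biases):
    """A fully-connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

first_vector = [0.2, -0.1, 0.7]   # stand-in for the 128-dim BiLSTM output
W1, b1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0.0, 0.1]   # invented weights
W2, b2 = [[0.7, -0.3]], [0.05]

second_vector = dense(first_vector, W1, b1)     # first combination processing
first_value = dense(second_vector, W2, b2)[0]   # second combination -> scalar
probability = sigmoid(first_value)              # normalised into (0, 1)
```

The sigmoid guarantees the final value lies strictly between 0 and 1, so it can be read as the binding probability compared against the preset threshold.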
In the embodiments of the application, the bidirectional long short-term memory network can recognize peptide chains of any length; compared with the prior art, which needs different recognition models for peptide chains of different lengths, it is universal. Determining the probability that a peptide chain binds to MHC is in turn beneficial to vaccine and drug research.
In one possible implementation manner, the peptide chain recognition model may be trained before being used, so as to obtain a trained peptide chain recognition model.
Specifically, before the peptide chain recognition model is trained, the parameters of the convolution kernels in the convolutional neural network need to be set. The convolutional neural network before training is denoted the initial convolutional neural network; after training, it is denoted the convolutional neural network.
Specifically, the setting process of the parameter of the convolution kernel includes:
and determining initial parameters of convolution kernels in the convolution neural initial network based on a BLOSUM matrix, wherein the parameters in each row of the BLOSUM matrix are the initial parameters of one convolution kernel, or the parameters in each column of the BLOSUM matrix are the initial parameters of one convolution kernel.
In this embodiment, the BLOSUM matrix is derived from well-conserved blocks of aligned protein families selected from the BLOCKS database, from which the relative frequencies and probabilities of amino acid substitutions are counted. The BLOSUM matrix is a 20 × 20 matrix. All convolution kernels in the initial convolutional neural network are initialized with the BLOSUM matrix: the number of kernels equals the number of rows (or columns) of the matrix, and each kernel has as many rows as the matrix. Each row of the BLOSUM matrix may serve as one convolution kernel; alternatively, each column may serve as one convolution kernel.
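Initialising one kernel per BLOSUM row can be sketched as follows; the truncated 3 × 3 matrix with invented values stands in for the full 20 × 20 BLOSUM matrix:

```python
# Kernel initialisation sketch: one convolution kernel per row of a
# BLOSUM-like matrix (values below are invented; the real matrix is 20 x 20).
blosum_like = [
    [ 4, -1, -2],
    [-1,  5,  0],
    [-2,  0,  6],
]

# Each row becomes the initial weights of one 1-wide convolution kernel.
# The copies remain trainable and are refined during model training.
kernels = [list(row) for row in blosum_like]
```

Copying rows (rather than aliasing them) keeps each kernel independently trainable while starting from biologically meaningful substitution scores.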
Specifically, after the initial parameters of the convolution kernels are set, the peptide chain recognition model is trained on preset training samples to obtain the trained model. The trained peptide chain recognition model comprises the trained convolutional neural network (obtained from the initial convolutional neural network), the trained bidirectional long short-term memory network, the trained first fully-connected layer and the trained second fully-connected layer.
Specifically, the predetermined training samples include peptide chains of different lengths.
Specifically, a first peptide chain in the training samples is input into the peptide chain recognition model to be trained to obtain a probability result; a loss function is computed from the probability result, and the parameters of the model are updated with the loss function. The updated model is then trained with a second peptide chain from the training samples and its parameters updated again from the probability result, and so on in turn until the loss function meets a preset condition or the number of training iterations reaches a preset number, yielding the trained peptide chain recognition model. The process of determining a probability result is the same as the process of determining the probability value described above and is not repeated here.
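The loop described above (update on each sample, stop when the loss meets a preset condition or a preset number of iterations is reached) can be imitated with a deliberately tiny stand-in model; every number here is illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy training set: (feature, binds-or-not) pairs standing in for peptide chains.
samples = [(1.0, 1), (-1.0, 0), (0.5, 1), (-0.5, 0)]
w, lr = 0.0, 0.5                    # single weight and learning rate (invented)
max_epochs, target_loss = 200, 0.1  # "preset times" and "preset condition"

for epoch in range(max_epochs):
    total = 0.0
    for x, y in samples:
        p = sigmoid(w * x)                                       # probability result
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))  # cross-entropy loss
        w -= lr * (p - y) * x                                    # gradient update
    mean_loss = total / len(samples)
    if mean_loss < target_loss:     # the loss function meets the preset condition
        break
```

The real model updates thousands of parameters by backpropagation rather than one scalar weight, but the control flow (per-sample updates with two stopping criteria) is the same.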
In the embodiment of the application, the convolution kernel in the convolution neural initial network is initialized by using the BLOSUM matrix, so as to obtain the evolution information of the peptide chain in the process of identifying the peptide chain. The training of the peptide chain recognition model to be trained can be carried out through presetting the training samples, and the trained peptide chain recognition model is obtained.
In this embodiment, after the trained peptide chain recognition model is obtained, it may be validated with human data or non-human data using five-fold cross-validation to determine its performance, for example by the AUC (area under the curve). AUC is a model evaluation index in the field of machine learning: the area under the receiver operating characteristic (ROC) curve.
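AUC has a direct rank interpretation (the probability that a random positive example scores above a random negative one), which gives a compact way to compute it from prediction scores; the scores below are invented:

```python
# AUC from its rank interpretation: the fraction of positive/negative pairs
# in which the positive example receives the higher score (ties count 0.5).
def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Invented predicted probabilities for binding (pos) and non-binding (neg) chains.
score = auc([0.9, 0.8, 0.6], [0.7, 0.2, 0.1])  # 8 of the 9 pairs ranked correctly
```

An AUC of 0.5 means the scores are no better than random ranking, and 1.0 means every binding chain outranks every non-binding one, which is why the tables below report values such as 0.85 as good performance.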
Specifically, the preset training samples may be recorded as a training set, and the training set may be data obtained from the Immune Epitope Database (IEDB). After the peptide chain recognition model is trained, testing can be performed with independent test set data, which may be obtained from the MHCBN and SYFPEITHI databases. For example, information on the peptide chains in the training set and the independent test set is shown in Table 1 below.
Table 1 information on peptide chains of training set and independent test set data
[Table 1 is provided as an image in the original document and is not reproduced here.]
MHC I can be designated as class I. MHC II can be described as class II.
By way of example, the performance of the peptide chain recognition model is evaluated by using human data in the training set data and a five-fold cross validation method, and the evaluation result of specific performance indexes is shown in table 2 below.
Table 2 results of evaluating the performance of peptide chain recognition models using human data
[Table 2 is provided as an image in the original document and is not reproduced here.]
In Table 2 above, F1 is the harmonic mean of precision and recall; MCC is the Matthews correlation coefficient, a binary classification metric; and AUPR is the area under the precision-recall (PR) curve, which plots precision against recall.
As can be seen from Table 2, the peptide chain recognition model of the present application performs well on peptide chains of different lengths; for example, the AUC for both MHC classes is greater than 0.85, confirming the good performance of the model.
By way of example, the performance of the peptide chain recognition model was evaluated using non-human data in the training set data, e.g., mouse, macaque, and chimpanzee, and the five-fold cross-validation method, and the evaluation results of specific performance indicators are shown in table 3 below.
Table 3 results of evaluating the performance of peptide chain recognition models using non-human data
[Table 3 is provided as an image in the original document and is not reproduced here.]
PCC in Table 3 is the Pearson correlation coefficient between the predicted values and the true values.
As can be seen from Table 3, the peptide chain recognition model of the present application performs well on peptide chains of different lengths; for example, the AUC in each case is greater than 0.8, confirming the good performance of the model.
By way of example, the performance of the method of the present application can be compared with that of existing methods.
Table 4 results of evaluating the performance of peptide chain recognition models using human data
[Table 4 is provided as an image in the original document and is not reproduced here.]
As can be seen from Table 4 above, in predicting MHC class I-bindable peptide chains, the method of the present application scored higher than the other methods on 6 of the 8 indices; in predicting MHC class II-bindable peptide chains, it scored higher on all 8 indices. The method of the present application therefore performs better than the other methods.
By way of example, the ROC curves of the method of the present application are compared with those of other methods, as shown in FIG. 5. Based on the independent test set data, ROC curves are plotted for the methods compared in Table 4; the closer a curve lies to the upper-left corner, the better the corresponding method performs.
Panel A of FIG. 5 shows the overall ROC curves for peptide chains of all lengths in the MHC class I independent test set. Curve A1 is the ROC curve of the method of the present application, with an AUC value of 0.944. Curve A2 is that of the SMM method (AUC 0.883); A3, the ANN method (AUC 0.881); A4, the NetMHCpan method (AUC 0.879); A5, the NetMHCcons method (AUC 0.876); A6, the PickPocket method (AUC 0.849); A7, the NetMHCpan EL method (AUC 0.833); and A8, the comblib_sidney2008 method (AUC 0.179).
FIG. 5B is a graph showing the results of ROC curves comparing the method of the present application with other methods when the length of the peptide chain in MHCI is 9 mers. The B1 line characterizes the ROC curve for the method of the present application, with an AUC value of 0.955 for the method of the present application. The B2 line is the ROC plot for the SMM method, which has an AUC value of 0.900. Line B3 is the ROC plot for the ANN method, which has an AUC value of 0.890. The B4 line is the ROC plot for the NetMHCpan method, which has an AUC value of 0.887. The B5 line is the ROC plot for NetMHCcons method, which has an AUC value of 0.891. Line B6 is the ROC plot for the PickPocket method, which has an AUC value of 0.861. The B7 line is the ROC plot for the NetMHCpan EL method, which has an AUC value of 0.862. The line B8 is the ROC graph for comblib _ sidney2008 method, and the AUC value for comblib _ sidney2008 method is 0.179.
FIG. 5C is a graph showing the results of ROC curves comparing the method of the present application with other methods when the length of the peptide chain in MHCI is 10 mers. The line C1 represents the ROC curve for the method of the present application, with an AUC value of 0.907 for the method of the present application. The C2 line is the ROC plot for the SMM method, which has an AUC value of 0.879. The line C3 is a ROC plot for the ANN method, which has an AUC value of 0.891. The C4 line is the ROC plot for the NetMHCpan method, which has an AUC value of 0.885. The C5 line is the ROC plot for NetMHCcons method, which has an AUC value of 0.885. The line C6 is the ROC plot for the PickPocket method, which has an AUC value of 0.860. The C7 line is the ROC plot for the NetMHCpan EL process, which has an AUC value of 0.830.
FIG. 5 is a graph showing the results of ROC curves of the method of the present application compared with other methods when the length of the peptide chain in MHCI is 11 mers. The D1 line represents the ROC curve for the methods of the present application, with AUC values of 0.967 for the methods of the present application. The D2 line is the ROC plot for the SMM method, which has an AUC value of 0.853. Line D3 is the ROC plot for the ANN method, which has an AUC value of 0.911. The D4 line is the ROC plot for the NetMHCpan method, which has an AUC value of 0.932. The D5 line is the ROC plot for NetMHCcons method, which has an AUC value of 0.927. The line D6 is the ROC plot for the PickPocket method, which has an AUC value of 0.915. Line D7 is the ROC plot for the NetMHCpan EL method, which has an AUC value of 0.888.
The results shown in FIG. 5, E, are comprehensive ROC plots based on all lengths of peptide chains in the MHC class II independent test set. Line E1 represents the ROC curve for the methods of the present application, with AUC values of 0.922 for the methods of the present application. Line E2 is the ROC plot for the NN-align method, which has an AUC value of 0.849. The line E3 is the ROC plot for NETMHCIIPan method, which has an AUC value of 0.823. The line E4 is the ROC plot for the SMM-align method, which has an AUC value of 0.798.
FIG. 5 is a graph showing the result of comparing ROC curves obtained by the method of the present invention with those obtained by other methods when the length of the MHC class II peptide chain is 13 mer. The F1 line represents the ROC curve for the method of the present application, with an AUC value of 0.929 for the method of the present application. The curve F2 is the ROC plot for the NN-align method, which has an AUC value of 0.796. The curve F3 is the ROC graph of NETMHCIIPan method, and the AUC value of NETMHCIIPan method is 0.805. The F4 line is the ROC plot for the SMM-align method, which has an AUC value of 0.806.
FIG. 5 is a graph showing the result of comparing ROC curves obtained by the method of the present invention with those obtained by other methods when G represents a 15mer peptide chain length in MHC class II. The G1 line represents the ROC curve for the methods of the present application, with AUC values of 0.936 for the methods of the present application. Line G2 is the ROC plot for the NN-align method, which has an AUC value of 0.908. The G3 line is the ROC graph of NETMHCIIPan method, which has an AUC value of 0.899. The G4 line is the ROC plot for the SMM-align method, which has an AUC value of 0.847.
FIG. 5 is a graph of the result of comparison of ROC curves obtained by the method of the present application with those obtained by other methods when H is a 20mer peptide chain length in MHC class II. Line H1 represents the ROC curve for the methods of the present application, with AUC values of 0.890 for the methods of the present application. Line H2 is the ROC plot for the NN-align method, which has an AUC value of 0.805. The line H3 is the ROC plot for NETMHCIIPan method, which has AUC values of 0.817. The H4 line is the ROC plot for the SMM-align method, which has an AUC value of 0.807.
As can be seen from FIG. 5, in every comparison the AUC value of the method of the present application is higher than that of each of the other methods, and therefore the method of the present application performs better than the other methods.
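By way of example, the AUC values reported above can be reproduced from a method's raw prediction scores on the independent test set. The following is a minimal numpy sketch of the rank-based AUC computation (equivalent to the area under the ROC curve); the labels and scores are illustrative placeholders, not data from the present application:

```python
import numpy as np

def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Illustrative example: 1 = binding peptide chain, 0 = non-binding.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which is why a curve near the diagonal of the ROC graph indicates poor performance.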
By way of example, the relationship between the performance of the method of the present application and the sequence information is illustrated in FIG. 6. Panels A and B of FIG. 6 show the results of five-fold cross validation on the training set data. Panels C, D and E of FIG. 6 show, for the data sets of three alleles, sequence conservation maps of the binding and non-binding peptide chains, i.e. the proportion with which each amino acid occurs at each position of the peptide chain sequence. The larger the difference between the maps of the binding and non-binding peptide chains, the better the performance of the model trained on the corresponding allele data, which illustrates the relationship between model performance and the sequence information of the data. Each MHC class (MHC class I and MHC class II) corresponds to a plurality of alleles, each allele corresponds to one peptide chain data set, and each data set contains a plurality of different peptide chain data.
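By way of example, the per-position amino acid ratios underlying the conservation maps of panels C to E can be computed directly from a set of equal-length peptide chains. A minimal sketch with illustrative placeholder sequences (not data from the present application):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def position_frequencies(peptides):
    """For equal-length peptides, return one dict per sequence position
    mapping each amino acid to its occurrence ratio at that position."""
    length = len(peptides[0])
    assert all(len(p) == length for p in peptides)
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        freqs.append({aa: counts.get(aa, 0) / len(peptides) for aa in AMINO_ACIDS})
    return freqs

# Illustrative 9-mers with a conserved residue at position 2.
binders = ["KLNEPVLLL", "KLDETVRML", "KLFEKVSEL", "KLSEGVWQL"]
freqs = position_frequencies(binders)
print(freqs[1]["L"])  # position 2 is always L in this toy set
```

Plotting such per-position ratios for the binding set and the non-binding set side by side yields the kind of conservation map shown in FIG. 6.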
In conclusion, the method for recognizing an MHC-bindable peptide chain described above recognizes peptide chains more accurately, handles peptide chains of different lengths, and does not require a separate recognition model to be trained for each peptide chain length, thereby achieving length-independent recognition of peptide chains.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
FIG. 7 shows a block diagram of a recognition device for MHC-bindable peptide chains according to an embodiment of the present application, corresponding to the recognition method for MHC-bindable peptide chains described in the above embodiments. For convenience of explanation, only the parts related to the present embodiment are shown.
Referring to fig. 7, the apparatus 200 may include: an information acquisition module 210, an information conversion module 220, and a probability calculation module 230.
The information obtaining module 210 is configured to obtain sequence information of a peptide chain to be identified;
an information conversion module 220, configured to convert the sequence information into evolution information of the peptide chain;
and a probability calculation module 230, configured to determine a probability value of the binding of the peptide chain to the MHC according to the evolution information.
In a possible implementation manner, the probability calculation module 230 may specifically be configured to:
performing deep information characterization on the evolution information by using the trained bidirectional long-short term memory network to obtain a first vector;
based on the first vector, a probability value of binding of the peptide chain to MHC is determined.
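By way of example, the deep information characterization performed by the bidirectional long short-term memory network can be sketched as follows: the evolution matrix is read column by column in both directions, and the final forward and backward hidden states are concatenated to form the first vector. The weights, dimensions, and the choice of final hidden states below are illustrative assumptions, not the trained network of the present application:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, params):
    """Run one LSTM direction over a list of column vectors and
    return the final hidden state."""
    W, U, b = params                      # input, recurrent and bias weights
    hidden = b.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in xs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)       # input, forget, output, candidate gates
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                 # cell state update
        h = o * np.tanh(c)                # hidden state update
    return h

def bilstm_first_vector(evo, params_fwd, params_bwd):
    """First vector: concatenation of the final forward and backward
    hidden states over the columns of the evolution matrix."""
    xs = list(evo.T)                      # one entry per peptide position
    return np.concatenate([lstm_pass(xs, params_fwd),
                           lstm_pass(xs[::-1], params_bwd)])

rng = np.random.default_rng(0)
feat, hidden, length = 20, 8, 9           # illustrative sizes

def random_params():
    return (rng.normal(size=(4 * hidden, feat)) * 0.1,
            rng.normal(size=(4 * hidden, hidden)) * 0.1,
            np.zeros(4 * hidden))

evo = rng.normal(size=(feat, length))     # stand-in for the evolution matrix
vec = bilstm_first_vector(evo, random_params(), random_params())
print(vec.shape)                          # forward + backward hidden states
```

Reading the sequence in both directions lets each position of the first vector reflect context from both ends of the peptide chain.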
In a possible implementation manner, the information conversion module 220 may specifically be configured to:
and processing the sequence information by using the trained convolutional neural network to obtain evolution information of the peptide chain.
In one possible implementation, the convolutional neural network computes:

Evo(X)_{k,i} = W_k^T · X_i

wherein Evo(X)_{k,i} is the value in the k-th row and i-th column of a first matrix, the first matrix being used to characterize the evolution information; W_k^T is the transpose of the k-th convolution kernel; and X_i is the vector in the i-th column of a second matrix, the second matrix being used to characterize the sequence information.
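By way of example, since each element Evo(X)_{k,i} is an inner product of one convolution kernel with one column of the sequence matrix, the whole conversion reduces to a single matrix product. A minimal numpy sketch (the kernel values are random placeholders, not trained parameters):

```python
import numpy as np

def evolution_info(W, X):
    """Evo(X)[k, i] = W_k^T . X_i -- each row of the output is one
    convolution kernel applied position-wise to the sequence matrix."""
    return W @ X  # (num_kernels, 20) @ (20, length) -> (num_kernels, length)

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 20))     # one kernel per row (placeholder values)
X = np.zeros((20, 9))             # one-hot 9-mer: 20 amino acids x 9 positions
X[[0, 3, 5, 0, 7, 1, 2, 9, 4], range(9)] = 1.0
Evo = evolution_info(W, X)

# With one-hot columns, column i of Evo is simply column a_i of W,
# where a_i is the amino acid index at position i.
assert np.allclose(Evo[:, 0], W[:, 0])
print(Evo.shape)
```

This is why initializing the kernels from a substitution matrix (as in the training process below) amounts to looking up substitution scores per residue before training refines them.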
In one possible implementation, the training process of the convolutional neural network includes:
determining initial parameters of the convolution kernels in an initial convolutional neural network based on a BLOSUM matrix, wherein the parameters in each row of the BLOSUM matrix serve as the initial parameters of one convolution kernel, or the parameters in each column of the BLOSUM matrix serve as the initial parameters of one convolution kernel;
and training the initial convolutional neural network on a preset training sample to obtain the trained convolutional neural network.
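By way of example, the kernel initialization described above can be sketched as copying the rows (or columns) of a substitution matrix into the kernel parameters. The 4 x 4 matrix below is an illustrative placeholder, not the real 20 x 20 BLOSUM62 matrix:

```python
import numpy as np

def init_kernels_from_blosum(blosum, by="row"):
    """Each row (or column) of the substitution matrix becomes the
    initial parameter vector of one convolution kernel."""
    blosum = np.asarray(blosum, dtype=float)
    return blosum.copy() if by == "row" else blosum.T.copy()

# Illustrative 4x4 substitution sub-block (placeholder values).
toy_blosum = [[ 4, -1, -2,  0],
              [-1,  5,  0, -2],
              [-2,  0,  6, -3],
              [ 0, -2, -3,  6]]

kernels = init_kernels_from_blosum(toy_blosum, by="row")
print(kernels[1])  # second kernel initialized from the second matrix row
```

In a framework such as PyTorch this would correspond to copying the matrix into the convolution layer's weight tensor before training begins, after which the preset training samples refine the parameters.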
In a possible implementation manner, the probability calculation module 230 may specifically be configured to:
performing first combination processing on the first vector by using the trained first full connection layer to obtain a second vector;
carrying out second combination processing on the second vector by using the trained second full-connection layer to obtain a first value;
and carrying out normalization processing on the first value by using the second full-connection layer to obtain the probability value.
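By way of example, the two combination steps and the final normalization can be sketched as two fully connected layers followed by a sigmoid. The random weights and the choice of sigmoid as the normalization function are assumptions for illustration, not the application's trained layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binding_probability(first_vector, W1, b1, W2, b2):
    """First combination -> second vector; second combination -> first
    value; normalization -> probability in (0, 1)."""
    second_vector = np.tanh(W1 @ first_vector + b1)  # first fully connected layer
    first_value = W2 @ second_vector + b2            # second fully connected layer
    return float(sigmoid(first_value))               # normalize to a probability

rng = np.random.default_rng(0)
first_vector = rng.normal(size=16)        # e.g. the BiLSTM output
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)
W2, b2 = rng.normal(size=8), 0.0
p = binding_probability(first_vector, W1, b1, W2, b2)
print(0.0 < p < 1.0)
```

The sigmoid maps the unbounded first value into (0, 1), so the output can be read directly as the probability that the peptide chain binds to the MHC.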
In a possible implementation manner, the information obtaining module 210 may specifically be configured to:
and performing one-hot encoding on the peptide chain to obtain the sequence information of the peptide chain.
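By way of example, one-hot encoding maps each residue of the peptide chain to a 20-dimensional indicator vector, yielding a 20 x L sequence matrix. A minimal sketch (the alphabetical residue ordering is an illustrative convention):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(peptide):
    """Encode a peptide as a 20 x L matrix with one indicator column
    per residue position."""
    X = np.zeros((len(AMINO_ACIDS), len(peptide)))
    for pos, aa in enumerate(peptide):
        X[AA_INDEX[aa], pos] = 1.0
    return X

X = one_hot("KLNEPVLLL")  # illustrative 9-mer
print(X.shape)            # 20 amino acids x 9 positions
print(X.sum(axis=0))      # each column sums to 1
```

The resulting matrix is the second matrix of the formula above, i.e. the input to the convolutional neural network.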
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a terminal device. Referring to FIG. 8, the terminal device 400 may include: at least one processor 410, a memory 420, and a computer program stored in the memory 420 and executable on the at least one processor 410. When executing the computer program, the processor 410 implements the steps of any of the method embodiments described above, for example, steps S101 to S103 in the embodiment shown in FIG. 2. Alternatively, when executing the computer program, the processor 410 implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 210 to 230 shown in FIG. 7.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in memory 420 and executed by processor 410 to complete the application. The one or more modules/units may be a series of computer program segments capable of performing specific functions, which are used to describe the execution of the computer program in the terminal device 400.
Those skilled in the art will appreciate that fig. 8 is merely an example of a terminal device and is not limiting and may include more or fewer components than shown, or some components may be combined, or different components such as input output devices, network access devices, buses, etc.
The Processor 410 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 420 may be an internal storage unit of the terminal device, or may be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, and the like. The memory 420 is used for storing the computer program and other programs and data required by the terminal device. The memory 420 may also be used to temporarily store data that has been output or is to be output.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The method for identifying the MHC-bindable peptide chain provided by the embodiment of the present application may be applied to a computer, a tablet computer, a notebook computer, a netbook, a Personal Digital Assistant (PDA), and other terminal devices, and the embodiment of the present application does not limit the specific type of the terminal device.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed terminal device, apparatus and method may be implemented in other ways. For example, the above-described terminal device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical function division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the method embodiments described above when the computer program is executed by one or more processors.
The present application also provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps in the above-mentioned method embodiments.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A method of recognizing an MHC-bindable peptide chain, comprising:
acquiring sequence information of a peptide chain to be identified;
converting the sequence information into evolutionary information of the peptide chain;
and determining the probability value of the peptide chain binding to MHC according to the evolution information.
2. The method for identifying MHC-bindable peptide chains according to claim 1, wherein said determining a probability value of binding of said peptide chain to said MHC from said evolution information comprises:
performing deep information characterization on the evolution information by using the trained bidirectional long-short term memory network to obtain a first vector;
based on the first vector, a probability value of binding of the peptide chain to MHC is determined.
3. The method for recognizing an MHC-bindable peptide chain according to claim 1, wherein said converting said sequence information into evolutionary information of said peptide chain comprises:
and processing the sequence information by using the trained convolutional neural network to obtain evolution information of the peptide chain.
4. The method for recognizing an MHC-bindable peptide chain according to claim 3, wherein the convolutional neural network computes:

Evo(X)_{k,i} = W_k^T · X_i

wherein Evo(X)_{k,i} is the value in the k-th row and i-th column of a first matrix, the first matrix being used to characterize the evolution information; W_k^T is the transpose of the k-th convolution kernel; and X_i is the vector in the i-th column of a second matrix, the second matrix being used to characterize the sequence information.
5. The method for recognizing an MHC-bindable peptide chain according to claim 3, wherein the training process of the convolutional neural network comprises:
determining initial parameters of the convolution kernels in an initial convolutional neural network based on a BLOSUM matrix, wherein the parameters in each row of the BLOSUM matrix serve as the initial parameters of one convolution kernel, or the parameters in each column of the BLOSUM matrix serve as the initial parameters of one convolution kernel;
and training the initial convolutional neural network on a preset training sample to obtain the trained convolutional neural network.
6. The method of claim 2, wherein determining a probability value of the binding of the peptide chain to the MHC based on the first vector comprises:
carrying out first combination processing on the first vector by utilizing the trained first full-connection layer to obtain a second vector;
carrying out second combination processing on the second vector by using the trained second full-connection layer to obtain a first value;
and carrying out normalization processing on the first value by using the second full-connection layer to obtain the probability value.
7. The method for recognizing MHC-bindable peptide chain according to any one of claims 1 to 6, wherein the obtaining of the sequence information of the peptide chain to be recognized comprises:
performing one-hot encoding on the peptide chain to obtain the sequence information of the peptide chain.
8. An MHC-bindable peptide chain recognition device comprising:
the information acquisition module is used for acquiring sequence information of the peptide chain to be identified;
the information conversion module is used for converting the sequence information into evolution information of the peptide chain;
and the probability calculation module is used for determining the probability value of the combination of the peptide chain and the MHC according to the evolution information.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for recognizing MHC-bindable peptide chains according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the method for recognizing MHC-bindable peptide chain according to any one of claims 1 to 7.
CN202210246129.5A 2022-03-14 2022-03-14 Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment Pending CN114743591A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210246129.5A CN114743591A (en) 2022-03-14 2022-03-14 Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210246129.5A CN114743591A (en) 2022-03-14 2022-03-14 Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment

Publications (1)

Publication Number Publication Date
CN114743591A true CN114743591A (en) 2022-07-12

Family

ID=82274356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210246129.5A Pending CN114743591A (en) 2022-03-14 2022-03-14 Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment

Country Status (1)

Country Link
CN (1) CN114743591A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512396A (en) * 2022-11-01 2022-12-23 山东大学 Method and system for predicting anti-cancer peptide and antibacterial peptide based on deep neural network

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549794A (en) * 2018-03-29 2018-09-18 中国林业科学研究院资源昆虫研究所 A kind of secondary protein structure prediction method
CN108830042A (en) * 2018-06-13 2018-11-16 深圳大学 A kind of feature extraction based on multi-modal protein sequence and coding method and system
US20190265955A1 (en) * 2016-07-21 2019-08-29 Ramot At Tel-Aviv University Ltd. Method and system for comparing sequences
CN111081311A (en) * 2019-12-26 2020-04-28 青岛科技大学 Protein lysine malonylation site prediction method based on deep learning
CN111105843A (en) * 2019-12-31 2020-05-05 杭州纽安津生物科技有限公司 HLA type I molecule and polypeptide affinity prediction method
CN111863121A (en) * 2020-07-06 2020-10-30 枣庄学院 Protein self-interaction prediction method based on graph convolution neural network
CN112242179A (en) * 2020-09-09 2021-01-19 天津大学 Method for identifying type of membrane protein
CN112365921A (en) * 2020-11-17 2021-02-12 浙江工业大学 Protein secondary structure prediction method based on long-time and short-time memory network
US20210098074A1 (en) * 2019-09-27 2021-04-01 International Business Machines Corporation Designing and folding structural proteins from the primary amino acid sequence
CN112912960A (en) * 2018-08-20 2021-06-04 南托米克斯有限责任公司 Methods and systems for improving Major Histocompatibility Complex (MHC) -peptide binding prediction for neoepitopes using a recurrent neural network encoder and attention weighting
CN113593632A (en) * 2021-08-09 2021-11-02 山东大学 Polypeptide anticancer function identification method, system, medium and equipment
CN113593631A (en) * 2021-08-09 2021-11-02 山东大学 Method and system for predicting protein-polypeptide binding site


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VANESSA ISABELL JURTZ ET AL.: "An introduction to deep learning on biological sequence data: examples and solutions", BIOINFORMATICS, vol. 33, no. 22, 15 November 2017 (2017-11-15), pages 3 *
包晨; 董洪伟; 钱军浩: "Protein secondary structure prediction based on multi-scale convolution and recurrent neural networks", Genomics and Applied Biology, no. 07, 25 July 2020 (2020-07-25) *
郭延哺; 李维华; 王兵益; 金宸: "Protein secondary structure prediction based on convolutional long short-term memory neural networks", Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2018 (2018-06-15) *


Similar Documents

Publication Publication Date Title
Wang et al. Predicting protein–protein interactions from protein sequences by a stacked sparse autoencoder deep neural network
CN111063393B (en) Prokaryotic acetylation site prediction method based on information fusion and deep learning
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN110459051B (en) Road section feature model training method and device, terminal equipment and storage medium
CN112015603A (en) User terminal hardware detection method, device, computer device and storage medium
CN113127552B (en) Food safety identification method and system based on big data
CN114414935A (en) Automatic positioning method and system for feeder fault area of power distribution network based on big data
CN114743591A (en) Recognition method and device for MHC (major histocompatibility complex) bindable peptide chain and terminal equipment
CN112084435A (en) Search ranking model training method and device and search ranking method and device
CN113408897A (en) Data resource sharing method applied to big data service and big data server
CN111582315A (en) Sample data processing method and device and electronic equipment
CN111429289B (en) Single disease identification method and device, computer equipment and storage medium
CN114649053A (en) Artificial intelligence-based protein ligand binding atom identification method and device
CN111797395A (en) Malicious code visualization and variety detection method, device, equipment and storage medium
CN110991774A (en) Electric quantity load prediction method and device
CN112859034B (en) Natural environment radar echo amplitude model classification method and device
CN115620317A (en) Method and system for verifying authenticity of electronic engineering document
CN115423159A (en) Photovoltaic power generation prediction method and device and terminal equipment
CN111797397B (en) Malicious code visualization and variant detection method, device and storage medium
CN114678083A (en) Training method and prediction method of chemical genetic toxicity prediction model
CN114116456A (en) Test case generation method, system and computer readable storage medium
CN116562952A (en) False transaction order detection method and device
CN113066539A (en) Prediction method and related device and equipment
CN114648656A (en) Image recognition method and device, terminal equipment and readable storage medium
CN115691669B (en) Protein structure classification system based on quantum convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination