CN111798919B - Tumor neoantigen prediction method, prediction device and storage medium - Google Patents

Tumor neoantigen prediction method, prediction device and storage medium

Info

Publication number
CN111798919B
Authority
CN
China
Prior art keywords
amino acid
codes
chromatin
conformation
peptide sequence
Prior art date
Legal status
Active
Application number
CN202010587400.2A
Other languages
Chinese (zh)
Other versions
CN111798919A (en)
Inventor
石毅
贺光
Current Assignee
Shi Yi
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010587400.2A
Publication of CN111798919A
Application granted
Publication of CN111798919B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Physiology (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention relates to a tumor neoantigen prediction method, prediction device and storage medium based on chromatin high-order conformation and deep sparse learning. The method provides a deep neural network prediction model based on group selection and trains the model on training data to obtain a predicted immunogenicity value for an object to be predicted (namely, a potential tumor neoantigen peptide), wherein each sample used in training the deep neural network prediction model includes chromatin 3D conformation information and features generated based on the polypeptide amino acid sequence. Compared with the prior art, the method offers advantages such as high prediction accuracy and convenient prediction.

Description

Tumor neoantigen prediction method, prediction device and storage medium
Technical Field
The invention relates to the field of neoantigen prediction in personalized tumor immunotherapy, and in particular to a tumor neoantigen prediction method, prediction device and storage medium based on chromatin high-order conformation and deep sparse learning.
Background
At present, conventional treatment of tumor patients relies mainly on non-individualized means such as surgical resection, chemoradiotherapy and targeted drug therapy. These conventional means have many problems, such as incomplete treatment, severe side effects, and a tendency for tumors to metastasize and develop drug resistance, so that the survival time of tumor patients is only temporarily prolonged.
In recent years, tumor immunotherapy, which targets a patient's tumor cells through the patient's own immune system, has attracted wide attention. In personalized tumor immunotherapy, the patient-specific target molecules that play the critical role are called tumor neoantigens. A tumor neoantigen is a protein produced by tumor genome mutation; because it contains a non-synonymous mutation, it differs from abnormally expressed tumor self-protein antigens. In vivo, a tumor neoantigen can be recognized as a foreign antigen by the patient's own immune system and is not subject to central tolerance, which enables the immune system to specifically target the patient's tumor cells. A vaccine or polypeptide preparation made from tumor neoantigens and used for tumor immunotherapy can therefore selectively kill tumor cells, with high safety and a marked effect. The key to this strategy is to select, accurately and efficiently and for each individual patient, the tumor neoantigens with good expected efficacy from among the many peptide fragments that distinguish tumor from normal tissue. However, existing techniques for selecting tumor neoantigens still have many technical problems, such as a heavy selection workload and low accuracy.
The vigorous development of genomics over the last twenty years has provided powerful support for tumor research. By comparing the genomes of tumor cells and normal cells, many genetic variations closely related to tumor occurrence and development have been discovered, and their molecular mechanisms have been partially revealed, providing strong technical support for new approaches to tumor diagnosis, typing, prognosis and clinical treatment guidance. Regarding somatic mutation of the tumor genome, it has been found that a single mutation on chromatin cannot cause a tumor; in the tumor cells of almost every tumor patient, numerous genetic and epigenetic variations can be detected, and the coexistence of several to hundreds of gene mutations, chromosomal translocations accompanied by gene mutations, and chromosomal copy number variations at multiple positions is very common. More and more evidence shows that genetic variations that occur together (simultaneously or sequentially) follow intrinsic rules: some gene mutations are often accompanied by particular other gene mutations rather than occurring at random. The genetic structural basis of these co-occurring mutations is not yet clear. Establishing the relevant mechanism would lay a theoretical foundation for understanding the molecular mechanism of tumor occurrence and development in depth, especially the causal relationships among genetic events in tumor development, would provide an effective means of accurately selecting tumor neoantigens, and would further provide a basis for tumor diagnosis and treatment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a tumor neoantigen prediction method, prediction device and storage medium based on chromatin high-order conformation and deep sparse learning, with high prediction accuracy and convenient prediction.
The purpose of the invention can be achieved by the following technical solution:
A tumor neoantigen prediction method based on chromatin high-order conformation and deep sparse learning, in which an object to be predicted is processed by a trained deep neural network prediction model based on group selection to obtain the tumor neoantigen immunogenicity information corresponding to the object to be predicted;
wherein each sample used in training the deep neural network prediction model includes chromatin 3D conformation information and features generated based on the polypeptide amino acid sequence.
Further, each sample has on the order of thousands of features.
Each feature of a sample belongs to a certain group, and during neural network model training the features in a given group are all selected or all eliminated together.
Further, the output of the deep neural network prediction model based on group selection comprises the neoantigens predicted to activate immunogenicity and a plurality of features most strongly associated with those neoantigens.
Further, the deep neural network prediction model based on group selection takes the form of fully connected layers.
Further, the chromatin 3D conformation information is derived from a cellular chromatin 3D conformation heat map matrix obtained by Hi-C (a high-throughput chromatin conformation capture technique) experiments.
Further, the chromatin 3D conformation information is obtained from public Hi-C datasets.
Further, the features generated based on the polypeptide amino acid sequence include features of polypeptides containing the amino acid mutation site and high-expression information for the gene containing the mutation.
The invention also provides a tumor neoantigen prediction device based on chromatin high-order conformation and deep sparse learning, which comprises:
a data acquisition unit, which acquires samples for training, each sample including chromatin 3D conformation information and features generated based on the polypeptide amino acid sequence;
a model training unit, which obtains a deep neural network prediction model based on group selection by training on the samples;
and a prediction unit, which acquires the object to be predicted, processes it with the deep neural network prediction model based on group selection, and obtains the tumor neoantigen immunogenicity information corresponding to the object to be predicted.
The invention also provides a computer-readable storage medium comprising a computer program which can be executed by a processor to implement the prediction method.
Compared with the prior art, the invention has the following beneficial effects:
First, building on the inventors' extensive original research on chromatin high-order conformation, the invention examines, from the perspective of chromatin 3D conformation, whether the polypeptide antigen corresponding to a mutated DNA site can activate T cell immunogenicity. It adds chromatin 3D conformation, that is, the spatial distribution on chromatin of the DNA mutation site corresponding to a neoantigen peptide, to the feature set used for machine learning prediction, which markedly improves the accuracy of predicting whether a neoantigen is immunogenic.
Second, the invention independently develops a deep neural network classification model based on group feature selection (DNN-GFS). Prediction is convenient and the prediction workload is small (unnecessary nodes and edges of the input layer of the neural network are pruned), and the model has the following advantages:
1. the feature set uses thousands of features, including comprehensive chromatin three-dimensional structure information, and the model better avoids the overfitting that a traditional deep neural network suffers when the number of features is large, thereby improving overall classification accuracy;
2. unlike a traditional deep neural network, which is a black box to the user, the method selects input features while performing classification and prediction and picks out the most critical features, providing a basis for further mining the association between input features and output results;
3. the invention adopts a group selection strategy, so that features that belong together are selected as a group, that is, the features of a group are either all selected or all removed; the model is therefore compatible with prior knowledge about feature grouping, and its self-learning is improved.
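As a non-limiting illustration of the group selection strategy, the following sketch applies group soft-thresholding, a standard shrinkage step associated with group-lasso penalties. The description does not state which regularizer or optimizer DNN-GFS uses, so the technique itself is an assumption, and the function name `group_soft_threshold`, the threshold `lam` and the toy weights are hypothetical.

```python
import math

def group_soft_threshold(weights, groups, lam):
    """Shrink each feature group's weights jointly.

    weights: one illustrative input-layer weight per feature.
    groups:  lists of feature indices; each feature is in exactly one group.
    lam:     hypothetical shrinkage threshold (assumption, not from the patent).
    """
    out = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # The scale is 0 whenever the group norm does not exceed lam,
        # so every feature of that group is removed together.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = weights[i] * scale
    return out

# Toy example: group 0 carries strong weights, group 1 weak ones.
weights = [3.0, 4.0, 0.1, -0.2, 0.1]
groups = [[0, 1], [2, 3, 4]]
shrunk = group_soft_threshold(weights, groups, lam=1.0)
kept = [g for g, idx in enumerate(groups) if any(shrunk[i] != 0.0 for i in idx)]
print(shrunk, kept)
```

In this toy run the first group is kept (with shrunken weights) and the second group is zeroed as a whole, which is the all-or-nothing behaviour described in item 3.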
Drawings
FIG. 1 is a schematic diagram of the principle of the present invention; the problem addressed by the invention is mainly the content in the dashed box;
FIG. 2 is a schematic representation of the ground-truth labels of the 3909 peptides, which are immunogenicity-positive or immunogenicity-negative;
FIG. 3 shows ROC curves comparing the prediction method of the present invention (DNN-GFS) with other prediction methods, namely deep neural network (DNN), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (KNN), Neopepsee, pTuneos, DeepHLApan, NetMHCpan, NetMHC and IEDB immunogenicity, under 5-fold and leave-one-out (LOO) cross-validation;
FIG. 4 shows precision-recall (P-R) curves comparing the different prediction methods under 5-fold and LOO cross-validation;
FIG. 5 compares the prediction performance of the different methods, as ROC curves and P-R curves, on different independent validation datasets;
FIG. 6 compares the distributions of the scores that the different methods assign to positive and negative samples under LOO cross-validation, under 5-fold cross-validation, and on the validation datasets, respectively;
FIG. 7 is a schematic diagram of the deep neural network based on group feature selection (DNN-GFS) of the present invention, where a illustrates features belonging to different large groups, b explains the DNN-GFS architecture and the effect of group feature selection, and c illustrates the geometric principle of the different regularization terms applied to the neural network weights, together with two-dimensional projections from three representative viewing angles.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
In recent years, Hi-C technology has made it possible to examine the abnormal chromatin structure of tumor cells more globally, and long-range interactions within chromatin have been found likely to play a key role in gene regulation. In previous work, the inventors found that co-occurring point mutations in almost all tumors show marked proximity in the three-dimensional conformation of chromatin, and on that basis proposed and published the concept of "spatial mutation hot spots of tumors". Extending this concept, we consider the notion of "cell functional blocks driven by chromatin three-dimensional conformation" to be very important, as it can help examine tumor occurrence and development from a new angle. Existing methods for discovering personalized neoantigen peptides for tumor immunotherapy usually focus on the sequence properties of the antigen peptide, the interaction between the antigen peptide and MHC molecules, the interaction between the antigen peptide-MHC complex (pMHC) and the TCR on T cells, and so on, but ignore the source of the antigen peptide, namely the corresponding mutated gene, and its particular distribution on chromatin. The present work systematically analyzes, for the first time, how the genes corresponding to antigen peptides are distributed in chromatin space, and finds a marked difference between the chromatin spatial distribution of immunogenic neoantigens (capable of activating T cells) and that of non-immunogenic neoantigens. The spatial distribution information of the neoantigen on chromatin is therefore added to the feature set of the machine learning prediction algorithm, and the accuracy of predicting whether a neoantigen is immunogenic is found to improve markedly.
On this basis, the invention implements a tumor neoantigen prediction method based on chromatin high-order conformation and deep sparse learning. The method processes an object to be predicted (namely, a potential tumor neoantigen peptide) with a trained deep neural network prediction model based on group selection (a deep sparse learning algorithm based on group feature selection, DNN-GFS, Group Feature Selection based Deep Neural Network) to obtain the tumor neoantigen immunogenicity information corresponding to the object to be predicted, wherein each sample used in training the deep neural network prediction model includes chromatin 3D conformation information and features generated based on the polypeptide amino acid sequence. The principle of the method is shown in the dashed box of FIG. 1.
The feature set of each sample is formed by combining chromatin 3D conformation information with other features generated based on the polypeptide amino acid sequence, and represents the feature set of one polypeptide. Each sample has on the order of thousands of features (more than 5000), specifically: the <x, y, z> 3D coordinates, in chromatin 3D space, of the DNA site corresponding to the target peptide; the distance of that site from the nucleus center (or nuclear membrane); the HLA subtype code of the MHC molecule presenting the antigen peptide; the occurrence frequency of each of the 20 amino acids in the target peptide; and, for the antigen peptide sequence, the amino acid sparse code, the amino acid BLOSUM code, the amino acid BLOMAP code, the amino acid side chain classification code, the amino acid side chain polarity code, the amino acid side chain charge code, the amino acid side chain hydrophilicity/hydrophobicity code, the amino acid side chain molecular weight code, the code for the occurrence frequency of the amino acid side chains in a biological population, and codes based on all the amino acid indices listed in the AAindex database. In this embodiment, each sample is a vector containing 5459 features.
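By way of illustration only, the sketch below assembles a small subset of such a feature vector (chromatin coordinates, distance from the nucleus center, amino acid frequencies and the sparse, i.e. one-hot, code) and records which indices belong to which group. The function names, the group names, the peptide and the numeric values are hypothetical, and how the patent actually partitions its 5459 features into groups is not stated.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def aa_frequency(peptide):
    """Occurrence frequency of each of the 20 amino acids in the peptide."""
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]

def sparse_code(peptide):
    """Sparse (one-hot) code: 20 values per residue position, flattened."""
    return [1.0 if r == a else 0.0 for r in peptide for a in AMINO_ACIDS]

def build_sample(peptide, xyz, radial_distance):
    """Concatenate feature blocks and record the index range of each group."""
    blocks = [
        ("chromatin_xyz", list(xyz)),
        ("radial_distance", [radial_distance]),
        ("aa_frequency", aa_frequency(peptide)),
        ("aa_sparse", sparse_code(peptide)),
    ]
    features, groups = [], {}
    for name, values in blocks:
        groups[name] = list(range(len(features), len(features) + len(values)))
        features.extend(values)
    return features, groups

# Placeholder 9-mer and placeholder chromatin coordinates (not real data).
features, groups = build_sample("ACDEFGHIK", (0.12, -0.40, 0.33), 0.53)
print(len(features), {name: len(idx) for name, idx in groups.items()})
```

Keeping the group index lists alongside the vector is one simple way to let a later training step select or remove a whole block of features at once.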
In this example, the other features in the sample, which are based on the polypeptide amino acid sequence, are obtained by high-throughput whole exome sequencing (ExonSeq) and whole transcriptome sequencing (RNASeq). From the whole exome sequencing results, the mutation information of the tumor cells in the sample can be obtained, ultimately giving the chromosome on which each mutation lies and its specific coordinate, from which the corresponding mutated amino acid site in the coding sequence is found; from the whole transcriptome sequencing results, it can be determined which genes are highly expressed in tumor cells. On the basis of these results, polypeptides containing an amino acid mutation site are enumerated, and the variant polypeptides from highly expressed genes are then selected. The polypeptide length defaults to 9 but is not limited to 9.
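The enumeration and expression filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names, protein sequences, expression values and the threshold `min_expression` are invented placeholders, and the description does not specify how "high expression" is defined.

```python
def enumerate_mutant_peptides(protein, mut_pos, mut_aa, length=9):
    """Return every window of `length` residues that covers the mutated site.

    mut_pos is a 0-based position in the protein; mut_aa is the substituted residue.
    """
    mutated = protein[:mut_pos] + mut_aa + protein[mut_pos + 1:]
    first = max(0, mut_pos - length + 1)
    last = min(mut_pos, len(mutated) - length)
    return [mutated[s:s + length] for s in range(first, last + 1)]

def select_highly_expressed(mutations, expression, min_expression):
    """Keep mutations whose gene expression reaches a hypothetical threshold."""
    return [m for m in mutations if expression.get(m["gene"], 0.0) >= min_expression]

# Toy inputs: placeholder genes, sequences and expression values.
mutations = [
    {"gene": "GENE_A", "protein": "MKTAYIAKQRQISFVKSHFSRQ", "pos": 10, "alt": "W"},
    {"gene": "GENE_B", "protein": "MSDNGELEDKPPAPPVRMSSTIF", "pos": 3, "alt": "K"},
]
expression = {"GENE_A": 85.0, "GENE_B": 0.4}

candidates = []
for m in select_highly_expressed(mutations, expression, min_expression=10.0):
    candidates.extend(enumerate_mutant_peptides(m["protein"], m["pos"], m["alt"]))
print(candidates)
```

With the default length of 9, a mutation far enough from both ends of the protein yields nine overlapping candidate peptides, one for each possible position of the mutated residue within the window.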
The chromatin 3D conformation information is derived from a chromatin 3D conformation heat map matrix of tumor cells. In this example, the chromatin 3D conformation information of a sample is obtained by Hi-C experiments, or is replaced with multiple Hi-C datasets from public databases.
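For readers unfamiliar with Hi-C heat map matrices, the sketch below bins a list of contact pairs on a single chromosome into a symmetric count matrix. It is illustrative only: the input format (pairs of genomic positions), the function name `contact_matrix` and the toy contacts are assumptions, and the default bin size simply mirrors the 500 kb resolution used for the 3D modelling in this description. Real Hi-C pipelines also normalize the counts, which is omitted here.

```python
def contact_matrix(contacts, chrom_length, bin_size=500_000):
    """Bin intra-chromosomal contact pairs into a symmetric count matrix."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in contacts:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

# Toy contacts on a hypothetical 1.6 Mb chromosome (4 bins of 500 kb).
contacts = [(100_000, 1_200_000), (600_000, 1_300_000), (50_000, 70_000)]
heat_map = contact_matrix(contacts, chrom_length=1_600_000)
for row in heat_map:
    print(row)
```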
The invention uses a molecular dynamics (MD) approach to develop a method for modelling the three-dimensional conformation of the human genome at a resolution (bin size) of 500 kb. The bins are coarse-grained as beads, and the complete genome is represented as a bead structure consisting of 23 polymer chains. The spatial position of each bead is influenced by chromatin connectivity, which constrains linearly adjacent beads to lie close together in 3D, and by chromatin activity, which keeps active regions close to the center of the nucleus. Chromatin activity is determined from the compartment index, which can be computed directly from the Hi-C matrix described above. The distance of each bead from the nucleus center is assigned according to the compartment index, and the chromatin conformation is then optimized by molecular dynamics, starting from a random structure and applying a bias potential so as to satisfy these distance constraints. For each cell line, 300 feasible conformations are optimized from random conformations to reduce the effect of possible variation on further analysis.
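As a hedged illustration of how the coordinate and distance features might be read from such bead models, the sketch below maps a genomic position to its 500 kb bead and averages the bead's position and its distance from the nucleus center over several conformations. Averaging over replicas, placing the nucleus center at the origin, and the assumption that the replicas share a common coordinate frame are choices made for this example; the description does not say how the 300 conformations are summarized.

```python
import math

BIN_SIZE = 500_000  # 500 kb resolution stated in the description

def bead_index(position, bin_size=BIN_SIZE):
    """Index of the bead (bin) containing a genomic position on one chromosome."""
    return position // bin_size

def mean_position_and_radius(conformations, bead, center=(0.0, 0.0, 0.0)):
    """Average one bead's coordinates and radial distance over replicas.

    conformations: list of replicas, each a list of (x, y, z) tuples per bead.
    center: assumed nucleus center (hypothetical convention).
    """
    points = [conf[bead] for conf in conformations]
    n = len(points)
    mean_xyz = tuple(sum(p[k] for p in points) / n for k in range(3))
    mean_radius = sum(math.dist(p, center) for p in points) / n
    return mean_xyz, mean_radius

# Two toy replicas of a 3-bead chain (placeholder coordinates).
replicas = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (3.0, 4.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 5.0)],
]
bead = bead_index(1_200_000)
xyz, radius = mean_position_and_radius(replicas, bead)
print(bead, xyz, radius)
```

The radial distance is invariant to rotation of a replica, whereas averaged coordinates are meaningful only if the replicas have been superimposed beforehand.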
The prediction method uses the deep neural network prediction model based on group selection to score and classify the polypeptides represented by the input feature sets, and can thereby identify the polypeptides most likely to activate T cell immunogenicity. In this example, the input of the model is the 5459 feature codes of a potential neoantigen peptide sequence, and the output is a score indicating whether the polypeptide can activate immunogenicity; a higher score indicates that the polypeptide is more likely to activate T cell immunogenicity. During model training, as shown in FIG. 7, each feature of a sample belongs to at most one group, and a group contains one or more features. This makes a group selection strategy possible: features that should appear together or be removed together, namely group features, are selected or removed simultaneously, which improves the self-learning effectiveness of the model, reduces the risk of overfitting and improves computational efficiency.
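One way to picture the scoring and the group-wise treatment of input features is the toy network below: a single fully connected hidden layer with a sigmoid output, plus a group-lasso style penalty on the input-layer weights. It is a sketch under assumptions, not the patented model: the layer sizes, the ReLU and sigmoid activations, the cross-entropy loss, the penalty form and the weight `lam` are all hypothetical.

```python
import math

def forward(x, W1, b1, w2, b2):
    """Score one sample: fully connected hidden layer (ReLU), sigmoid output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def group_penalty(W1, groups):
    """Sum over groups of sqrt(group size) times the norm of that group's input weights."""
    total = 0.0
    for idx in groups:
        squared = sum(row[i] ** 2 for row in W1 for i in idx)
        total += math.sqrt(len(idx)) * math.sqrt(squared)
    return total

def penalized_loss(samples, labels, params, groups, lam):
    """Mean cross-entropy plus lam times the group penalty (assumed objective)."""
    W1, b1, w2, b2 = params
    eps = 1e-12
    ce = 0.0
    for x, y in zip(samples, labels):
        p = forward(x, W1, b1, w2, b2)
        ce -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return ce / len(samples) + lam * group_penalty(W1, groups)

# Toy parameters: 4 input features in 2 groups, 2 hidden units.
W1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]  # rows = hidden units
params = (W1, [0.0, 0.0], [1.0, 1.0], 0.0)
groups = [[0, 1], [2, 3]]  # group 1 has all-zero input weights (removed)

score_a = forward([1.0, 1.0, 5.0, 5.0], *params)
score_b = forward([1.0, 1.0, 0.0, 0.0], *params)
loss = penalized_loss([[1.0, 1.0, 5.0, 5.0]], [1], params, groups, lam=0.1)
print(score_a, score_b, loss)
```

Because every input weight of the second group is zero, the two samples receive the same score: a removed group no longer influences the prediction, and its input nodes and edges could be pruned.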
The deep neural network prediction model of this embodiment takes the form of fully connected layers. Its output comprises the neoantigens predicted to be immunogenic and a plurality of features most strongly associated with them; because the most critical features are screened out during prediction, the relationship between the features and the output can be better clarified.
FIGS. 2-6 show the prediction results of the above prediction method (DNN-GFS) and of other classification methods, namely DNN, SVM, LR, KNN, Neopepsee, pTuneos, DeepHLApan, NetMHCpan, NetMHC and IEDB immunogenicity. Taken together, these figures show that our method, DNN-GFS, predicts neoantigens better than the other, traditional machine learning algorithms.
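For reference, the evaluation protocol named in the figures (k-fold and leave-one-out cross-validation, summarized by the area under the ROC curve) can be sketched as follows. The fold assignment scheme, the function names and the toy labels and scores are assumptions for illustration; no results from the patent are reproduced here.

```python
def kfold_indices(n, k):
    """Return (train, test) index lists for k folds; k == n is leave-one-out."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for j in range(n) if j not in fold], fold) for fold in folds]

def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

five_fold = kfold_indices(10, 5)       # 5 folds of 2 test samples each
leave_one_out = kfold_indices(10, 10)  # 10 folds of 1 test sample each

# Toy out-of-fold scores for four samples (placeholder values).
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
print(len(five_fold), len(leave_one_out), auc)  # 5 10 0.75
```

In a full evaluation, a model would be trained on each `train` split and the scores collected on the corresponding `test` split before computing the curve.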
After prediction by the above method, the polypeptide sequences that score highly and are classified as positive are synthesized to obtain the predicted tumor neoantigens, whose immune efficacy can then be tested and observed in mice.
In another embodiment, a tumor neoantigen prediction device based on chromatin high-order conformation and deep sparse learning is provided, comprising a data acquisition unit, a model training unit and a prediction unit. The data acquisition unit acquires samples for training, each sample comprising chromatin 3D conformation information and features generated based on the polypeptide amino acid sequence; the model training unit obtains a deep neural network prediction model based on group selection by training on the samples; and the prediction unit acquires an object to be predicted, processes it with the deep neural network prediction model based on group selection, and obtains the tumor neoantigen immunogenicity information corresponding to the object to be predicted.
In another embodiment, a computer-readable storage medium is provided, comprising a computer program executable by a processor to implement the prediction method.
In another embodiment, a web page is provided through which, once the object to be predicted has been obtained, the tumor neoantigen prediction result is rapidly obtained using the prediction method.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by a person skilled in the art through logical analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims (9)

1. A tumor neoantigen prediction method based on chromatin high-order conformation and deep sparse learning, characterized in that an object to be predicted is processed by a trained deep neural network prediction model based on group feature selection to obtain tumor neoantigen immunogenicity information corresponding to the object to be predicted;
wherein the feature set of each sample used in training the deep neural network prediction model is formed by combining chromatin 3D conformation information with other features generated based on the polypeptide amino acid sequence, the chromatin 3D conformation information being obtained by a molecular-dynamics-based method for modelling the three-dimensional conformation of the human genome; each feature of a sample belongs to a certain group, and during training of the neural network model the features in a given group are all selected or all removed;
the features of each sample comprise: the <x, y, z> 3D coordinates, in chromatin 3D space, of the DNA site corresponding to the target peptide; the distance of that DNA site from the nucleus center or nuclear membrane; the HLA subtype code of the MHC molecule presenting the antigen peptide; the occurrence frequency of each of the 20 amino acids in the target peptide; and, for the antigen peptide sequence, the amino acid sparse code, the amino acid BLOSUM code, the amino acid BLOMAP code, the amino acid side chain classification code, the amino acid side chain polarity code, the amino acid side chain charge code, the amino acid side chain hydrophilicity/hydrophobicity code, the amino acid side chain molecular weight code, the code for the occurrence frequency of the amino acid side chains in a biological population, and codes based on all the amino acid indices listed in the AAindex database.
2. The method of claim 1, wherein each sample has on the order of thousands of features.
3. The method of claim 1, wherein the output of the deep neural network prediction model based on group feature selection comprises a prediction of the tumor immunogenicity of potential neoantigens and a plurality of features most strongly associated with the neoantigens.
4. The method for predicting tumor neoantigens based on chromatin high-order conformation and deep sparse learning according to claim 1, wherein the deep neural network prediction model based on group feature selection takes the form of fully connected layers.
5. The method for predicting tumor neoantigens based on chromatin high-order conformation and deep sparse learning according to claim 1, wherein the chromatin 3D conformation information is derived from a cellular chromatin 3D conformation heat map matrix obtained by Hi-C experiments.
6. The method for predicting tumor neoantigens based on chromatin high-order conformation and deep sparse learning according to claim 1, wherein the chromatin 3D conformation information is obtained from a public Hi-C dataset.
7. The method of claim 1, wherein the features generated based on the polypeptide amino acid sequence include features of polypeptides containing the amino acid mutation site and high-expression information for the gene containing the mutation.
8. A tumor neoantigen prediction device based on chromatin high-order conformation and deep sparse learning, comprising:
a data acquisition unit for acquiring samples for training, wherein the feature set of each sample is formed by combining chromatin 3D conformation information with other features generated based on the polypeptide amino acid sequence, the chromatin 3D conformation information being obtained by a molecular-dynamics-based method for modelling the three-dimensional conformation of the human genome;
a model training unit for obtaining a deep neural network prediction model based on group feature selection by training on the samples, wherein each feature of a sample belongs to a certain group, and during training of the neural network model the features in a given group are all selected or all removed;
a prediction unit for acquiring an object to be predicted, processing the object to be predicted with the deep neural network prediction model based on group feature selection, and obtaining tumor neoantigen immunogenicity information corresponding to the object to be predicted;
wherein the features of each sample comprise: the <x, y, z> 3D coordinates, in chromatin 3D space, of the DNA site corresponding to the target peptide; the distance of that DNA site from the nucleus center or nuclear membrane; the HLA subtype code of the MHC molecule presenting the antigen peptide; the occurrence frequency of each of the 20 amino acids in the target peptide; and, for the antigen peptide sequence, the amino acid sparse code, the amino acid BLOSUM code, the amino acid BLOMAP code, the amino acid side chain classification code, the amino acid side chain polarity code, the amino acid side chain charge code, the amino acid side chain hydrophilicity/hydrophobicity code, the amino acid side chain molecular weight code, the code for the occurrence frequency of the amino acid side chains in a biological population, and codes based on all the amino acid indices listed in the AAindex database.
9. A computer-readable storage medium, comprising a computer program executable by a processor to perform the prediction method of any one of claims 1-7.
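The group-based feature selection recited for the model training unit can be illustrated by a simple masking step. The claims do not state how groups are selected or removed inside the network, so the following is only a sketch under that assumption: a whole group is either passed through or zeroed before the features are concatenated into the network input. The function names and group names are hypothetical.

```python
def group_slices(groups):
    """Map each group name to its (start, end) span in the concatenated vector."""
    slices = {}
    start = 0
    for name, values in groups.items():
        slices[name] = (start, start + len(values))
        start += len(values)
    return slices


def apply_group_mask(groups, keep):
    """Concatenate all groups, zeroing every feature of a removed group.

    `keep` maps group name -> bool; groups not mentioned are kept.
    """
    vector = []
    for name, values in groups.items():
        if keep.get(name, True):
            vector.extend(values)
        else:
            # Removing a group removes all of its features together.
            vector.extend(0.0 for _ in values)
    return vector


# Usage with made-up values: drop the amino acid frequency group.
example = {
    "coords_3d": [1.0, 2.0, 3.0],
    "radial_distance": [5.0],
    "aa_frequency": [0.5, 0.5],
}
masked = apply_group_mask(example, {"aa_frequency": False})
```

Zeroing rather than deleting keeps the input width fixed, which is convenient when comparing models trained with different groups enabled. A learned alternative, such as a group-sparsity penalty on the first layer's weights, would be another way to realize the same idea.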
CN202010587400.2A 2020-06-24 2020-06-24 Tumor neoantigen prediction method, prediction device and storage medium Active CN111798919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010587400.2A CN111798919B (en) 2020-06-24 2020-06-24 Tumor neoantigen prediction method, prediction device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010587400.2A CN111798919B (en) 2020-06-24 2020-06-24 Tumor neoantigen prediction method, prediction device and storage medium

Publications (2)

Publication Number Publication Date
CN111798919A CN111798919A (en) 2020-10-20
CN111798919B CN111798919B (en) 2022-11-25

Family

ID=72803402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010587400.2A Active CN111798919B (en) 2020-06-24 2020-06-24 Tumor neoantigen prediction method, prediction device and storage medium

Country Status (1)

Country Link
CN (1) CN111798919B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129998B (en) * 2021-04-23 2022-06-21 云测智能科技有限公司 Method for constructing prediction model of clinical individualized tumor neoantigen
CN114242159B (en) * 2022-02-24 2022-06-07 北京晶泰科技有限公司 Method for constructing antigen peptide presentation prediction model, and antigen peptide prediction method and device
WO2023168079A2 (en) * 2022-03-04 2023-09-07 New York University Cell type-specific prediction of 3d chromatin architecture

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109073659A (en) * 2016-02-16 2018-12-21 新加坡科技研究局 Apparent gene group analysis discloses the body cell promoter situation of primary gastric adenocarcinomas
CN110600077A (en) * 2019-08-29 2019-12-20 北京优迅医学检验实验室有限公司 Prediction method of tumor neoantigen and application thereof
CN110592213A (en) * 2019-09-02 2019-12-20 深圳市新合生物医疗科技有限公司 Gene panel for prediction of neoantigen load and detection of genomic mutations
CN110770838A (en) * 2017-12-01 2020-02-07 Illumina公司 Method and system for determining clonality of somatic mutations

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201608000D0 (en) * 2016-05-06 2016-06-22 Oxford Biodynamics Ltd Chromosome detection
CN107119120A (en) * 2017-05-04 2017-09-01 河海大学常州校区 A kind of key effect molecular detecting method based on chromatin 3D conformation technologies
CN108300767B (en) * 2017-10-27 2021-08-20 清华大学 Analysis method for interaction of nucleic acid segments in nucleic acid complex
US20200411135A1 (en) * 2018-02-27 2020-12-31 Gritstone Oncology, Inc. Neoantigen Identification with Pan-Allele Models
CN110853706B (en) * 2018-08-01 2022-07-22 中国科学院深圳先进技术研究院 Tumor clone composition construction method and system integrating epigenetics
CN109021062B (en) * 2018-08-06 2021-08-20 倍而达药业(苏州)有限公司 Screening method of tumor neoantigen
CN110277135B (en) * 2019-08-10 2021-06-01 杭州新范式生物医药科技有限公司 Method and system for selecting individualized tumor neoantigen based on expected curative effect
CN110752041B (en) * 2019-10-23 2023-11-07 深圳裕策生物科技有限公司 Method, device and storage medium for predicting neoantigen based on second-generation sequencing

Also Published As

Publication number Publication date
CN111798919A (en) 2020-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231029

Address after: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Patentee after: Shi Yi

Address before: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Patentee before: Shanghai Jiao Tong University