CN111755067A - Screening method of tumor neoantigen - Google Patents
- Publication number
- CN111755067A CN201910242904.8A
- Authority
- CN
- China
- Prior art keywords
- tumor
- mutation
- somatic cells
- analysis
- variation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method for screening tumor neoantigens, which aims to solve the problem that existing methods cannot screen high-quality tumor neoantigens from multiple angles. The method comprises the following specific steps: step one, selecting the variation types of the tumor; step two, performing RNA variation analysis on the tumor; step three, performing MHC molecule analysis on normal blood and the tumor; step four, performing RNA expression analysis on the tumor; step five, performing variation annotation on the tumor; step six, analyzing the variation driver genes of the tumor; step seven, predicting HLA molecule binding affinity for the tumor; step eight, analyzing the mutation frequency of the tumor; step nine, comprehensively scoring the tumor neoantigens; step ten, analyzing the synthesis difficulty of the tumor neoantigens; step eleven, comprehensively selecting the final tumor neoantigens. The invention combines tumor mutation analysis with tumor expression analysis to predict tumor-specific antigens, making the analysis process more efficient and accurate.
Description
Technical Field
The invention relates to the field of tumor immunity, in particular to a method for screening a tumor neoantigen.
Background
Tumor-specific antigens (TSAs) are antigens characteristic of tumor cells and are also known as neoantigens. Tumor-specific antigens were first proposed in the first half of the last century. Later, with the development of molecular biology and a deeper understanding of the function of major histocompatibility complex (MHC) molecules, Boon et al. first discovered that complexes of tumor-produced specific peptides and MHC molecules can be recognized by T cells such as CD8+ or CD4+ T cells. Subsequent studies recognized that these T-cell-recognized antigens derive from genomic variations of the tumor that are expressed as tumor-specific peptides (neo-epitopes), and they were defined as neoantigens. Unlike tumor-associated antigens, tumor-specific antigens are present only in tumor cells.
In July 2017, the results of two independent phase I clinical trials were published in the same issue of the British scientific journal Nature. In both trials, neoantigens specifically expressed by tumor cells as a result of gene mutation were identified by sequencing the DNA and RNA of the tumor cells; a personalized tumor vaccine was then constructed and infused back into the patient to activate immune cells and kill the tumor cells carrying the antigen. This was the first cancer vaccine research to succeed in clinical trials.
The tumor neoantigen prediction methods published so far are mainly EpiToolKit and Epi-Seq. However, EpiToolKit starts only from the mutation: it considers neither the depth and coverage of the sequencing data nor the quality of the mutation as reflected in the quality of the data, so it cannot judge the quality of the resulting neoantigens. In addition, EpiToolKit does not consider expression abundance or whether the neoantigen is expressed at all, which causes false-positive predictions and prevents high-quality neoantigens from being screened. Many mutations at the DNA level are not expressed (on average, perhaps 50% of mutations are not expressed), and these can cause false positives in neoantigen prediction. Expressed mutations also differ in expression level, and in general the higher the expression, the stronger the immunogenicity. Finally, EpiToolKit does not compare the mutant peptide with the normal peptide. A high-quality neoantigen generally has higher affinity than the corresponding normal peptide, so the lack of this comparison also causes false positives when screening for high-quality neoantigens.
Epi-Seq predicts tumor-specific antigens from tumor expression data alone, which also causes false positives. First, RNA editing readily produces false positives. Second, RNA sequencing is performed after reverse transcription into cDNA, a process that also introduces a large number of false positives. Third, detection based on comparing tumor cDNA with germline DNA itself yields many false positives. Together, these factors mean that the neoantigens obtained by Epi-Seq contain many false positives.
Therefore, there is currently no method that screens high-quality tumor neoantigens from multiple angles directly on the basis of sequencing alignment results, and related research is ongoing.
Disclosure of Invention
The present invention is directed to a method for screening tumor neoantigens, which solves the above problems of the prior art.
In order to achieve the purpose, the invention provides the following technical scheme:
a screening method of tumor neoantigens comprises the following specific steps:
step one, selecting the variation types of tumor somatic cells;
step two, performing RNA variation analysis on the tumor somatic cells;
step three, performing major histocompatibility complex (MHC) molecule analysis on normal blood and on the tumor somatic cells, respectively;
step four, performing RNA expression analysis on the tumor somatic cells;
step five, performing variation annotation on the tumor somatic cells;
step six, analyzing the variation driver genes of the tumor somatic cells;
step seven, predicting human leukocyte antigen (HLA) molecule binding affinity for the tumor somatic cells;
step eight, analyzing the mutation frequency of the tumor somatic cells;
step nine, comprehensively scoring and ranking the candidate tumor neoantigens;
step ten, analyzing the synthesis difficulty of the candidate tumor neoantigens;
step eleven, integrating the results of step nine and step ten to select the final tumor neoantigens.
As a further scheme of the invention: the variation types of the tumor somatic cells in step one include DNA point mutations, insertion/deletion mutations and frameshift mutations of the tumor somatic cells.
As a further scheme of the invention: in step three, the OptiType, xHLA and seq2HLA software are used to perform MHC molecule analysis on normal blood and tumor somatic cells, and the results of the three are integrated to determine the HLA typing result of the tumor somatic cells.
As a further scheme of the invention: the variation annotation in step five includes annotation of point mutations, insertion/deletion mutations and frameshift mutations among the tumor somatic variations.
As a further scheme of the invention: in step six, variation driver gene analysis is performed on the point mutations, insertion/deletion mutations and frameshift mutations in the tumor somatic cells.
As a further scheme of the invention: in step seven, HLA molecule binding affinity is predicted for the tumor somatic cells from the HLA molecule types of the tumor somatic cells, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides.
As a further scheme of the invention: the scoring and ranking in step nine are based on the MHC affinity of the tumor somatic cells, the expression abundance of the tumor somatic cell antigens, the comparison with the wild-type peptide, the mutation frequency of the tumor somatic cells, whether the mutation is an RNA mutation, and whether the gene is a tumor driver gene.
As a further scheme of the invention: in step ten, the synthesis difficulty of the candidate tumor neoantigen is analyzed from its molecular weight, isoelectric point, electrostatic charge at pH 7, average hydrophilicity and proportion of hydrophilic residues.
Compared with the prior art, the invention has the beneficial effects that:
tumor mutation analysis and tumor expression analysis are combined to predict tumor-specific antigens, making the analysis process more efficient and accurate;
the invention handles not only human genetic data but also includes a mouse analysis module, giving neoantigen prediction a wider range of application;
the method starts from reading the fastq files and generates the results automatically with one click; it optimizes how intermediate files are combined and called during big-data processing and adopts multi-task distributed processing, which greatly improves analysis efficiency and lowers the hardware requirements of biological big-data analysis; the tumor mutation analysis results are more accurate, which in turn improves the accuracy of subsequent treatment, so the method has good prospects for use.
Drawings
FIG. 2 is a graph relating to Expression_TMP, the expression level generated in step 4.4 of Example 2 of the screening method for a tumor neoantigen.
FIG. 3 is a graph relating to Normal_score, the affinity score of the wild-type peptide chain calculated in step 7.6 of Example 2 of the screening method for a tumor neoantigen.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Example 1
A screening method of tumor neoantigen comprises the following specific steps:
1, tumor somatic variation selection step
The internationally recognized GATK tumor somatic mutation detection software and commercial software are used to analyze the whole-exome next-generation sequencing results of the tumor somatic cell sample and the normal blood sample, and mutations with high mutation frequency that are detected by multiple detection tools are taken as candidate mutations; at the same time, mutation analysis is performed on the tumor somatic transcriptome sequencing results.
2, tumor somatic RNA variation analysis step
The somatic mutations in the tumor somatic DNA and the RNA mutations of the tumor somatic cells are combined to finally determine the tumor somatic variations.
3, MHC molecule analysis step
HLA typing software OptiType is used to analyze HLA class I molecules in the whole-exome next-generation sequencing results of the tumor somatic cell sample and the normal blood sample, respectively; HLA typing software xHLA is used to analyze HLA class I and HLA class II molecules in the whole-exome next-generation sequencing results of the tumor somatic cell sample and the normal blood sample, respectively; HLA typing software seq2HLA is used to analyze HLA class I and HLA class II molecules in the tumor somatic transcriptome sequencing results. The results of the three tools are combined to finally determine the HLA typing result, and the identity of the samples is confirmed from the three results.
4, tumor somatic RNA expression analysis step
Transcriptome expression analysis is performed on the tumor somatic transcriptome sequencing results, and the TPM (Transcripts Per Million) values of genes and transcripts are determined.
5, variation annotation step
Point mutations, insertion/deletion mutations and frameshift mutations among the tumor somatic variations are annotated from the genomic mutation through the corresponding transcript to the amino acid change; TMB (tumor mutation burden) analysis is performed on the tumor somatic variations.
6, tumor somatic driver gene analysis step
With reference to the COSMIC tumor database, driver gene analysis is performed on the point mutations, insertion/deletion mutations and frameshift mutations among the tumor somatic variations.
7, HLA molecule affinity prediction step
The HLA molecule types of the tumor somatic cell sample obtained in the MHC molecule analysis step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides are used as input to MHC class I and MHC class II affinity prediction software, which predicts the affinity of the mutant peptides for MHC class I and MHC class II molecules, respectively. For MHC class I molecules, the mutant peptides are predicted with the neural-network-based NNAlign algorithm, combining binding affinity data and MS eluted ligand data.
8, tumor somatic mutation frequency analysis step
Tumor mutation frequency analysis software is used to determine, for each tumor somatic mutation, the fraction of all DNA at that locus that carries the mutation; the higher the mutation frequency, the higher the percentage of tumor somatic cells.
9, comprehensive scoring and ranking step for candidate tumor neoantigens
Each predicted mutant peptide among the candidate tumor neoantigens is scored on factors including MHC affinity, antigen expression abundance, comparison with the wild-type peptide, tumor somatic mutation frequency, whether the mutation is an RNA mutation, and whether the gene is a tumor somatic driver gene; the peptides are ranked from high to low score, and the high-scoring ones are selected as tumor neoantigens.
10, polypeptide synthesis difficulty analysis step
With the predicted mutant peptide at the center, the sequence is padded on the left and on the right, respectively, to give peptide chains 30 positions long, and polypeptide synthesis difficulty analysis software is used to analyze the synthesis difficulty of the candidate tumor neoantigen in terms of molecular weight, isoelectric point, electrostatic charge at pH 7, average hydrophilicity and proportion of hydrophilic residues.
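Some of the sequence-derived quantities named in this step can be sketched in a few lines of Python. This is only an illustration, not the polypeptide synthesis difficulty analysis software referred to above: the use of average residue masses, the Hopp-Woods hydrophilicity scale, the definition of a "hydrophilic residue" as one with a positive Hopp-Woods value, and all function names are assumptions. Isoelectric point and charge at pH 7 are left out of the sketch.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02

# Hopp-Woods hydrophilicity values (assumed scale; the patent names none).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def molecular_weight(peptide):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def average_hydrophilicity(peptide):
    """Mean Hopp-Woods value over all residues."""
    return sum(HOPP_WOODS[aa] for aa in peptide) / len(peptide)


def hydrophilic_ratio(peptide):
    """Fraction of residues with a positive Hopp-Woods value (assumed cut)."""
    return sum(1 for aa in peptide if HOPP_WOODS[aa] > 0) / len(peptide)
```

A candidate whose padded peptide is very hydrophobic (low average hydrophilicity, low hydrophilic ratio) would typically be flagged as harder to synthesize, although the actual cut-offs are not given in the text.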
11, final selection step for candidate tumor neoantigens
The tumor neoantigens to be synthesized are finally selected according to the score from step 9 and the polypeptide synthesis difficulty from step 10.
Example 2
A screening method of tumor neoantigen comprises the following specific steps:
1, tumor somatic variation selection step
1.1. Next-generation sequencing is performed on the tumor tissue and blood of the sample using the relevant reagents, at sequencing depths of 200X and 100X, respectively.
1.2. Comprehensive quality analysis is performed on the sequencing data from 1.1 using OpenGene's fastp software. If Q20 is below 98%, or Q30 is below 90%, or the GC ratio is abnormal, the sequencing data are considered to be of unacceptable quality and neoantigen analysis is stopped. Reads of too low quality, reads that are too short, and reads with too many N bases are filtered out.
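The quality gate in 1.2 can be expressed as a small check. The sketch below assumes the Q20 rate, Q30 rate and GC content have already been read from the quality report as fractions between 0 and 1; the function name and the GC bounds are hypothetical, since the text only says the GC ratio must not be "abnormal".

```python
def passes_fastq_qc(q20_rate, q30_rate, gc_content, gc_range=(0.35, 0.65)):
    """Return True if the sequencing data may proceed to neoantigen analysis.

    q20_rate, q30_rate, gc_content: fractions in [0, 1].
    gc_range: hypothetical bounds for a "normal" GC ratio (not from the text).
    """
    if q20_rate < 0.98:   # Q20 below 98% fails
        return False
    if q30_rate < 0.90:   # Q30 below 90% fails
        return False
    low, high = gc_range
    return low <= gc_content <= high


if __name__ == "__main__":
    # Example values only; real values come from the QC report.
    print(passes_fastq_qc(q20_rate=0.985, q30_rate=0.93, gc_content=0.48))
```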
1.3. BWA-MEM alignment is performed on the clean fastq data. The sample type is determined first: for a human sample, the GRCh38 human reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data; for a mouse sample, the GRCm38 reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data.
1.4. Further sequencing quality statistics. The Phred scores of each sequencing cycle are computed separately for the tumor tissue and blood data after step 1.3; the Q value of every sequencing cycle must be above 30, otherwise neoantigen analysis is stopped. The sequencing depths of the tumor tissue and blood data after step 1.3 are also calculated, and neoantigen analysis is stopped if they are below 200X and 100X, respectively.
1.5. Duplicate reads are removed from the data after step 1.3.
1.6. The data after step 1.5 are realigned according to the known indel information in 1000G.
1.7. A model is built from existing variation databases to generate a recalibration table, and the base quality scores are then corrected according to this model.
1.8. Tumor somatic mutation analysis is performed with MuTect and MuTect2, and the variant results of the two are combined.
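One way to combine the two call sets in 1.8 is a union keyed by variant, remembering which caller reported each one. The text does not say whether the combination is a union or an intersection, so the union here is an assumption, as are the function name and the `(chrom, pos, ref, alt)` key.

```python
def combine_calls(mutect_calls, mutect2_calls):
    """Merge two iterables of (chrom, pos, ref, alt) variant keys.

    Returns a dict mapping each variant to the set of callers that found it,
    so downstream steps can still prefer variants seen by both callers.
    """
    combined = {}
    for caller, calls in (("MuTect", mutect_calls), ("MuTect2", mutect2_calls)):
        for variant in calls:
            combined.setdefault(variant, set()).add(caller)
    return combined


if __name__ == "__main__":
    a = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")]
    b = [("chr1", 100, "A", "T"), ("chr3", 300, "C", "CA")]
    for variant, callers in sorted(combine_calls(a, b).items()):
        print(variant, sorted(callers))
```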
2, tumor tissue RNA variation analysis step
2.1. RNA-seq is performed on the tumor tissue, with 60M sequencing clusters.
2.2. The RNA-seq data are quality-checked with FastQC software. If the quality is unacceptable, neoantigen analysis is stopped.
2.3. Alignment is performed with STAR software. The sample type is determined first: for a human sample, the GRCh38 human reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data; for a mouse sample, the GRCm38 reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data.
2.4. Further sequencing quality statistics. The Phred scores of each sequencing cycle are computed separately for the tumor tissue and blood data after step 2.3; the Q value of every sequencing cycle must be above 30, otherwise neoantigen analysis is stopped.
2.5. Duplicate reads are removed from the data after step 2.3.
2.6. A split-reads strategy is used to discover new junctions.
2.7. The data after step 2.6 are realigned according to the known indel information in 1000G.
2.8. A model is built from existing variation databases to generate a recalibration table, and the base quality scores are then corrected according to this model.
2.9. Mutation analysis is performed with HaplotypeCaller. In the HaplotypeCaller command, the emit_conf parameter is set to 30, the call_conf parameter is set to 25, and the ploidy parameter is set to 4.
3, MHC molecule analysis step
3.1. HLA typing software OptiType is used to analyze HLA class I molecules in the whole-exome next-generation sequencing results of the sample's tumor tissue and normal blood, respectively.
3.2. HLA typing software xHLA is used to analyze HLA class I and HLA class II molecules in the whole-exome next-generation sequencing results of the sample's tumor tissue and normal blood, respectively.
3.3. HLA typing software seq2HLA is used to analyze HLA class I and HLA class II molecules in the tumor tissue transcriptome sequencing results.
3.4. The results of 3.1, 3.2 and 3.3 are combined to finally determine the HLA typing result. If the three results differ greatly, an alarm is raised and the program exits, stopping neoantigen analysis.
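The combination rule in 3.4 is not spelled out, so the following is only one possible reading: a per-locus majority vote across the typing tools, with an exception standing in for the "alarm and exit" when too few tools agree. The function name, the exception class, the input layout and the `min_agree` threshold are all hypothetical.

```python
from collections import Counter


class HlaDisagreement(Exception):
    """Raised when the typing tools disagree too much at a locus."""


def consensus_hla(calls_by_tool, min_agree=2):
    """calls_by_tool maps tool name -> {locus: (allele1, allele2)}.

    For each locus, the genotype reported by the most tools wins. If fewer
    than `min_agree` tools support it (or fewer than all reporting tools,
    when fewer than `min_agree` tools typed that locus), raise.
    """
    loci = set().union(*(calls.keys() for calls in calls_by_tool.values()))
    consensus = {}
    for locus in sorted(loci):
        votes = Counter(
            tuple(sorted(calls[locus]))          # allele order is irrelevant
            for calls in calls_by_tool.values()
            if locus in calls
        )
        genotype, count = votes.most_common(1)[0]
        if count < min(min_agree, sum(votes.values())):
            raise HlaDisagreement(f"no agreement at locus {locus}: {dict(votes)}")
        consensus[locus] = genotype
    return consensus
```

Because OptiType reports only class I loci, a class II locus is decided by the two remaining tools in this sketch, and they must then agree with each other.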
4, tumor tissue RNA expression analysis step
4.1. The tumor tissue transcriptome sequencing results are aligned. The sample type is determined first: for a human sample, the grch38_tran human reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data; for a mouse sample, the grcm38_tran reference genome is selected for aligning the tumor tissue sequencing data and the blood sequencing data.
4.2. The bam files output by 4.1 are sorted with samtools.
4.3. Transcriptome expression levels are calculated with StringTie.
4.4. The TPM (Transcripts Per Million) values of the transcripts are extracted from the gtf files generated in 4.3.
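Step 4.4 can be sketched as a small GTF parser. The sketch assumes that each `transcript` record in the StringTie output carries `transcript_id "..."` and `TPM "..."` attributes in its ninth column; the function name is hypothetical and lines that do not match are simply skipped.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
TPM_VALUE = re.compile(r'TPM "([^"]+)"')


def read_transcript_tpm(gtf_lines):
    """Map transcript_id -> TPM for every 'transcript' record in a GTF."""
    tpm_by_transcript = {}
    for line in gtf_lines:
        if line.startswith("#"):                 # header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue                             # skip exon and other records
        transcript = TRANSCRIPT_ID.search(fields[8])
        tpm = TPM_VALUE.search(fields[8])
        if transcript and tpm:
            tpm_by_transcript[transcript.group(1)] = float(tpm.group(1))
    return tpm_by_transcript
```

In use, the function can be fed an open file handle, for example `read_transcript_tpm(open(path))`, where `path` points at the gtf file produced in 4.3.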
5, variation annotation step
5.1. Point mutations, insertion/deletion mutations and frameshift mutations among the tumor somatic variations are annotated at the genomic level with VEP software. The sample type is determined first: for a human sample, the GRCh38 human reference genome is selected as the reference for the tumor tissue sequencing data and the blood sequencing data; for a mouse sample, the GRCm38 reference genome is selected as the reference for the tumor tissue sequencing data and the blood sequencing data.
5.2. The vcf format is converted to maf format with vcf2maf software.
5.2. The tumor somatic variations are filtered: variations of the types Intron, 5'UTR, 3'UTR, IGR, 5'Flank, 3'Flank, RNA and lincRNA are removed, only variations not present in dbSNP are kept, and TMB (tumor mutation burden) is then calculated.
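The filter and the TMB calculation can be sketched as follows. The sketch assumes MAF rows are available as dictionaries with the standard `Variant_Classification` and `dbSNP_RS` columns, that an empty value or `novel` in `dbSNP_RS` means the variant is not in dbSNP, and that TMB is expressed as retained mutations per megabase of covered sequence. The text does not define the TMB formula, so that definition and the function names are assumptions.

```python
# Variant types removed by the filter, as listed in the text.
EXCLUDED_CLASSES = {
    "Intron", "5'UTR", "3'UTR", "IGR", "5'Flank", "3'Flank", "RNA", "lincRNA",
}


def filter_variants(maf_rows):
    """Keep rows whose type is not excluded and that are absent from dbSNP."""
    kept = []
    for row in maf_rows:
        if row["Variant_Classification"] in EXCLUDED_CLASSES:
            continue
        if row.get("dbSNP_RS", "") not in ("", "novel"):
            continue                      # has an rs ID, i.e. known in dbSNP
        kept.append(row)
    return kept


def tumor_mutation_burden(kept_rows, covered_megabases):
    """Assumed definition: retained mutations per megabase covered."""
    return len(kept_rows) / covered_megabases
```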
6, tumor driver gene analysis step
With reference to the COSMIC tumor database, driver gene analysis is performed on the point mutations, insertion/deletion mutations and frameshift mutations among the tumor somatic variations.
7, HLA molecule affinity prediction step
The HLA molecule types of the tumor sample obtained in the MHC molecule analysis step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides are used as input to MHC class I and MHC class II affinity prediction software, which predicts the affinity of the mutant peptides for MHC class I and MHC class II molecules, respectively. For MHC class I molecules, the mutant peptides are predicted with the neural-network-based NNAlign algorithm, combining binding affinity data and MS eluted ligand data.
7.1. The sample type is determined first: for a human sample, the GRCh38 cDNA reference sequence and the GRCh38 peptide sequence are selected; for a mouse sample, the GRCm38 cDNA reference sequence and the GRCm38 peptide sequence are selected. For SNP, insertion and deletion variations, the wild-type and mutant peptide chains of the corresponding predicted lengths are looked up using the vcf2maf result from 5.2; for frameshift variations, the wild-type and mutant peptide chains of the corresponding predicted lengths are looked up from the cDNA reference sequence and the peptide reference sequence.
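For the simplest case in 7.1, a single amino acid substitution, the paired wild-type and mutant peptides of each length can be enumerated with a sliding window. The sketch below is an illustration under that restriction only: insertions, deletions and frameshifts change the downstream sequence and need separate handling. The function name and the 0-based `position` argument are assumptions; the 8 to 15 length range follows the text.

```python
def peptide_pairs(wild_type, mutant, position, lengths=range(8, 16)):
    """Return (wild_type_peptide, mutant_peptide) pairs covering `position`.

    wild_type, mutant: full protein sequences of equal length that differ
    at the 0-based index `position` (substitution only).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("this sketch only handles substitutions")
    pairs = []
    for length in lengths:
        # Every window of this length that still contains the mutated residue.
        first_start = max(0, position - length + 1)
        last_start = min(position, len(mutant) - length)
        for start in range(first_start, last_start + 1):
            end = start + length
            pairs.append((wild_type[start:end], mutant[start:end]))
    return pairs
```

Each pair can then be passed to the affinity predictors so that the mutant peptide is always scored alongside its wild-type counterpart.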
7.2. The HLA class I molecules generated in step 3 are analyzed with netMHCpan-4.0. NetMHCpan-4.0 is run with both the binding affinity (BA) and mass spectrometry (MS) data integrated, so that information is obtained from two different angles. MHC class I data from the IEDB database are first filtered as necessary, the model is trained on binding affinity (BA) data and mass-spectrometry eluted ligand (MS eluted ligand) data, and an artificial neural network integrates the information from the two data types; based on the NNAlign framework, the method predicts the affinity with which a peptide binds a specific MHC molecule and also takes peptide length into account. The NetMHCpan-4.0 method improves prediction accuracy for tumor neoantigens, validated eluted ligands (ELs) and T cell epitopes. NetMHCpan-4.0 is used to predict and score the affinity of the HLA class I molecules for the wild-type peptide chains of length 8-15 and the mutant peptide chains of length 8-15 generated in 7.1.
7.3. The HLA class II molecules generated in step 3 are analyzed with netMHCIIpan-3.2. The affinity of the HLA class II molecules for the wild-type peptide chains of length 8-15 and the mutant peptide chains of length 8-15 generated in 7.1 is predicted and scored.
7.4. The HLA class I and class II molecules generated in step 3 are analyzed with netMHC, with the affinity threshold set at 500 nM. The affinity of the HLA class I molecules for the wild-type peptide chains of length 8-15 and the mutant peptide chains of length 8-15 generated in 7.1 is predicted and scored.
7.5. The HLA class I and class II molecules generated in step 3 are analyzed with NetMHCcons, with the affinity threshold set at 500 nM. The affinity of the HLA class I molecules for the wild-type peptide chains of length 8-15 and the mutant peptide chains of length 8-15 generated in 7.1 is predicted and scored.
7.6. The median of the HLA class I affinity scores from the above three tools is taken and used as the final affinity score.
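Taking the median in 7.6 is a one-liner with the standard library. In this sketch the function name and the dictionary layout (tool name to score for one peptide and one HLA allele) are assumptions, and the numbers in the usage example are made up for illustration.

```python
from statistics import median


def final_affinity_score(scores_by_tool):
    """Median of the per-tool HLA class I affinity scores for one peptide."""
    return median(scores_by_tool.values())


if __name__ == "__main__":
    # Illustrative values only, not real predictor output.
    example = {"netMHCpan": 0.62, "netMHC": 0.48, "NetMHCcons": 0.55}
    print(final_affinity_score(example))
```

With three tools the median is simply the middle score, which keeps one outlying predictor from dominating the final value.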
8, tumor mutation frequency analysis step
Tumor mutation frequency analysis software is used to determine, for each tumor mutation, the fraction of all DNA at that locus that carries the mutation; the higher the mutation frequency, the higher the percentage of tumor cells. The mutation frequency is read from the VCF file: for results produced by MuTect, the FA field in the VCF file is read; for results produced by MuTect2, the AF field in the VCF file is read.
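Reading the frequency from a VCF record can be sketched as below. The sketch assumes that FA and AF appear as per-sample FORMAT keys, that the tumor sample sits in the tenth column (index 9), and that only the first value is wanted when several comma-separated values are present; the function name and these layout choices are assumptions to check against the actual files.

```python
def read_mutation_frequency(vcf_line, caller, sample_column=9):
    """Return the tumor allele fraction from one tab-separated VCF record.

    caller: "mutect" (reads FA) or "mutect2" (reads AF).
    sample_column: 0-based index of the tumor sample column (assumed 9).
    """
    key = {"mutect": "FA", "mutect2": "AF"}[caller.lower()]
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")              # e.g. GT:AD:AF:DP
    sample_values = fields[sample_column].split(":")
    value = dict(zip(format_keys, sample_values))[key]
    return float(value.split(",")[0])               # first ALT allele only
```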
9, comprehensive scoring and ranking step of candidate tumor neoantigens
This step scores each predicted mutant peptide among the candidate tumor neoantigens according to influencing factors such as MHC affinity, antigen expression abundance, comparison with the wild-type peptide, tumor mutation frequency, RNA variation, and tumor driver genes, sorts the peptides by score from high to low, and selects the high-scoring ones as tumor neoantigens.
9.1. Formula one
The neoantigen scores of the 8-15 amino acid peptide chains from 7.6 are calculated separately, and all neoantigens are sorted in descending order of score.
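The descending sort can be sketched as below. The `Candidate` type, its field names, and the example peptides and scores are hypothetical; the sketch assumes each candidate already carries its overall score and does not reproduce the scoring formula itself.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    peptide: str   # mutant peptide sequence
    score: float   # overall neoantigen score, computed elsewhere


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


ranked = rank_candidates([
    Candidate("SIINFEKL", 1.2),
    Candidate("GILGFVFTL", 3.4),
    Candidate("NLVPMVATV", 2.0),
])
print([c.peptide for c in ranked])
```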
9.2. Formula two
Mutant_affinity_score calculation formula:
$$\text{Mutant\_affinity\_score} = \Delta \cdot \left(1 + e^{\text{mutant\_score} \times 10^{-5}}\right)$$
Here, mutant_score is the affinity score of the mutant peptide chain calculated in 7.6; the affinity is used as the exponent of the natural base e in the calculation. Δ is the number of tumor variations: for an SNP variation, Δ is 1; for an insertion or deletion, Δ is the specific number of insertions or deletions; for a frameshift variation, Δ is the specific number of frameshifts. FIG. 1 shows the curve assuming Δ is 3.
9.3. Formula three
Expression_score calculation formula:
Expression_TMP is the expression level produced in 4.4; once the expression level in the transcriptome reaches a certain level, the value of Expression_score is 1 (FIG. 2).
9.4. Formula four
Normal_affinity_score calculation formula:
$$\text{Normal\_affinity\_score} = \frac{1}{1 + e^{\text{normal\_score} \times 10^{-5}}}$$
Here, normal_score is the affinity score of the wild-type peptide chain calculated in 7.6; the affinity is used as the exponent of the natural base e, and the reciprocal is taken to obtain the value (FIG. 3).
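The two affinity terms can be sketched in code as below. This is only an illustration under a stated assumption: the flattened exponent in the printed formulas is read as the affinity score multiplied by 10^-5, which the source text does not confirm. The function and parameter names are hypothetical.

```python
import math

# Assumed scale factor applied to the affinity score inside the exponent.
SCALE = 1e-5


def mutant_affinity_score(mutant_score: float, delta: int) -> float:
    """Delta * (1 + e^(mutant_score * 1e-5)), under the assumed reading."""
    return delta * (1.0 + math.exp(mutant_score * SCALE))


def normal_affinity_score(normal_score: float) -> float:
    """1 / (1 + e^(normal_score * 1e-5)), under the assumed reading."""
    return 1.0 / (1.0 + math.exp(normal_score * SCALE))


# Made-up example: an SNP (delta = 1) with illustrative affinity scores.
print(mutant_affinity_score(50.0, 1), normal_affinity_score(5000.0))
```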
α = 0.99*allele_frequency + 0.9*TBM + 0.1*in_RNA_mutant + 0.1*is_cancer_driven_gene
allele_frequency: the tumor mutation frequency calculated in step 8.
TBM: the tumor burden calculated in 5.2.
in_RNA_mutant: whether the tumor variation is among the RNA variations from 2.9.
is_cancer_driven_gene: whether the gene is a tumor driver gene, as determined in step 6.
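The weighting factor α can be written out directly, as in the sketch below. It assumes the two yes/no factors are encoded as 1 (yes) and 0 (no), which the text does not state explicitly; the function name and example values are hypothetical.

```python
def alpha(allele_frequency: float, tbm: float,
          in_rna_mutant: bool, is_cancer_driven_gene: bool) -> float:
    """Weighted sum of the four factors, with booleans encoded as 0 or 1 (assumed)."""
    return (0.99 * allele_frequency
            + 0.9 * tbm
            + 0.1 * float(in_rna_mutant)
            + 0.1 * float(is_cancer_driven_gene))


# Made-up example: 50% mutation frequency, zero tumor burden term,
# variation seen in RNA, gene not a driver.
print(alpha(0.5, 0.0, True, False))
```

The weights show that mutation frequency and tumor burden dominate α, while RNA support and driver-gene status each contribute at most 0.1.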
10, analysis step of ease of polypeptide synthesis
With the predicted mutant peptide at the center, the sequence is extended to the left and to the right to give a peptide chain 25-30 amino acids long, and polypeptide synthesis difficulty analysis software is used to analyze the synthesis difficulty of the candidate tumor neoantigens in terms of molecular weight, isoelectric point, electrostatic charge at pH 7, average hydrophilicity, and hydrophilic residue ratio.
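Three of the listed properties can be approximated with a short sketch. The text does not name the analysis software or its scales, so everything here is an assumption for illustration: average residue masses are used for molecular weight, the Hopp-Woods scale is used for hydrophilicity, and a residue is counted as hydrophilic when its Hopp-Woods value is positive. Isoelectric point and charge at pH 7 are left out because they depend on a choice of pKa set.

```python
# Average residue masses in daltons; one water is added for the free termini.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Hopp-Woods hydrophilicity values (choice of scale is an assumption).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def molecular_weight(peptide: str) -> float:
    """Sum of average residue masses plus one water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def average_hydrophilicity(peptide: str) -> float:
    """Mean Hopp-Woods value over all residues."""
    return sum(HOPP_WOODS[aa] for aa in peptide) / len(peptide)


def hydrophilic_residue_ratio(peptide: str) -> float:
    """Fraction of residues with a positive Hopp-Woods value (assumed cutoff)."""
    return sum(1 for aa in peptide if HOPP_WOODS[aa] > 0) / len(peptide)


peptide = "RDAL"  # made-up example sequence
print(molecular_weight(peptide), average_hydrophilicity(peptide),
      hydrophilic_residue_ratio(peptide))
```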
11, final selection step of candidate tumor neoantigens
The tumor neoantigens to be finally synthesized are selected according to the scores from step 9 and the polypeptide synthesis difficulty from step 10.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
Claims (7)
1. A method for screening tumor neoantigens is characterized by comprising the following specific steps:
step one, selecting a variation type of tumor somatic cells;
step two, RNA variation analysis is carried out on the tumor somatic cells;
step three, respectively carrying out MHC molecule analysis on normal blood and tumor somatic cells;
step four, RNA expression analysis is carried out on the tumor somatic cells;
step five, carrying out variation annotation on the tumor somatic cells;
step six, analyzing the variation driver genes of the tumor somatic cells;
step seven, predicting the HLA molecule binding affinity of the tumor somatic cells;
step eight, analyzing the mutation frequency of the tumor somatic cells;
step nine, comprehensively scoring and sequencing candidate tumor neoantigens;
step ten, analyzing the synthesis difficulty of the candidate tumor neoantigen;
step eleven, integrating the results of the step nine and the step ten to select the final tumor neoantigen.
2. The method for screening tumor neoantigens according to claim 1, wherein the types of the variation of tumor somatic cells in the first step include DNA point mutation, insertion deletion mutation and frame shift mutation of tumor somatic cells.
3. The method for screening tumor neoantigens according to claim 1, wherein the variation annotations in the step five comprise variation annotations of point mutations, insertion/deletion mutations and frameshift mutations in tumor somatic variations.
4. The method for screening tumor neoantigen according to claim 1 or 2, wherein the analysis of the mutation driver is performed for the point mutation, the insertion deletion mutation and the frame shift mutation in the tumor somatic cell in the sixth step.
5. The method for screening tumor neoantigens according to claim 1, wherein in the seventh step, the HLA molecule binding affinity of tumor somatic cells is predicted according to the HLA molecule type of tumor somatic cells, the mutation prediction peptide fragment obtained in the step of predicting the mutation peptide fragment, and the wild-type peptide fragment sequence corresponding to the mutation prediction peptide fragment.
6. The method for screening tumor neoantigens according to claim 1 or 3, wherein in step nine the candidates are ranked according to the MHC affinity of the tumor somatic cells, the antigen expression abundance of the tumor somatic cells, the degree of contrast with the wild-type peptides, the mutation frequency of the tumor somatic cells, whether the variation of the tumor somatic cells is an RNA variation, and whether the gene is a tumor driver gene.
7. The method of claim 1, wherein step ten comprises analyzing the synthesis difficulty of the candidate tumor neoantigen in terms of molecular weight, isoelectric point, electrostatic charge at pH 7, average hydrophilicity, and hydrophilic residue ratio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242904.8A CN111755067A (en) | 2019-03-28 | 2019-03-28 | Screening method of tumor neoantigen |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111755067A true CN111755067A (en) | 2020-10-09 |
Family
ID=72671533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910242904.8A Pending CN111755067A (en) | 2019-03-28 | 2019-03-28 | Screening method of tumor neoantigen |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111755067A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104662171A (en) * | 2012-07-12 | 2015-05-27 | 普瑟姆尼股份有限公司 | Personalized cancer vaccines and adoptive immune cell therapies |
EP3323070A1 (en) * | 2015-07-14 | 2018-05-23 | Personal Genome Diagnostics Inc. | Neoantigen analysis |
CN108351916A (en) * | 2015-07-14 | 2018-07-31 | 个人基因组诊断公司 | Neoantigen is analyzed |
WO2019008365A1 (en) * | 2017-07-05 | 2019-01-10 | The Francis Crick Institute Limited | Method for treating cancer by targeting a frameshift indel neoantigen |
CN108796055A (en) * | 2018-06-12 | 2018-11-13 | 深圳裕策生物科技有限公司 | Tumor neogenetic antigen detection method, device and storage medium based on the sequencing of two generations |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112309502A (en) * | 2020-10-14 | 2021-02-02 | 深圳市新合生物医疗科技有限公司 | Method and system for calculating tumor neoantigen load |
CN112309502B (en) * | 2020-10-14 | 2024-09-20 | 深圳市新合生物医疗科技有限公司 | Method and system for calculating tumor neoantigen load |
CN112466396A (en) * | 2020-12-04 | 2021-03-09 | 中山大学附属第一医院 | Screening method of tumor high-affinity new antigen and application of tumor high-affinity new antigen in indication of treatment prognosis curative effect of PD-1 of liver cancer patient |
CN113322233A (en) * | 2021-04-19 | 2021-08-31 | 格源致善(上海)生物科技有限公司 | Improved preparation method and application of reactive T cells based on neoantigens |
CN113517021A (en) * | 2021-06-09 | 2021-10-19 | 海南精准医疗科技有限公司 | Cancer driver gene prediction method |
CN113517021B (en) * | 2021-06-09 | 2022-09-06 | 海南精准医疗科技有限公司 | Cancer driver gene prediction method |
CN114446389A (en) * | 2022-02-08 | 2022-05-06 | 上海科技大学 | Tumor neoantigen characteristic analysis and immunogenicity prediction tool and application thereof |
CN114446389B (en) * | 2022-02-08 | 2024-05-14 | 上海科技大学 | Tumor neoantigen feature analysis and immunogenicity prediction tool and application thereof |
CN114882951A (en) * | 2022-05-27 | 2022-08-09 | 深圳裕泰抗原科技有限公司 | Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data |
CN114882951B (en) * | 2022-05-27 | 2022-12-27 | 深圳裕泰抗原科技有限公司 | Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data |
CN117174166A (en) * | 2023-10-26 | 2023-12-05 | 北京基石京准诊断科技有限公司 | Tumor neoantigen prediction method and system based on third-generation sequencing data |
CN117174166B (en) * | 2023-10-26 | 2024-03-26 | 北京基石生命科技有限公司 | Tumor neoantigen prediction method and system based on third-generation sequencing data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||