Tumor neoantigen detection method, device and storage medium based on next-generation sequencing
Technical field
This application relates to the field of tumor neoantigen detection, and more particularly to a tumor neoantigen detection method, device and storage medium based on next-generation sequencing.
Background
Tumor-specific antigens (TSAs), also known as neoantigens, are antigens specific to tumor cells. The concept of tumor-specific antigens was proposed in the first half of the last century. Later, with the development of molecular biology and a deeper understanding of the function of major histocompatibility complex (MHC) molecules, Boon et al. first found that, in tumors, complexes of tumor-derived specific peptides with MHC molecules can be recognized by T cells such as CD8+ or CD4+ T cells. Subsequent research recognized that these T-cell-recognizable antigens arise from genomic mutations of the tumor that are expressed as tumor-specific peptides (neo-epitopes), and they were defined as neoantigens. Unlike tumor-associated antigens, tumor-specific antigens are present only in tumor cells.
Immune checkpoint inhibitor therapy has recently achieved great clinical success, especially in patients whose tumors carry a high mutational load. Because a high mutational load means that the tumor expresses more neoantigens, T cells in the body more readily recognize and kill the tumor cells. The quality and quantity of tumor neoantigens therefore affect the first step of immunotherapy and play a critical role. In 2013, Science ranked tumor immunotherapy first among its top ten breakthroughs of the year, and scientists led by Rosenberg, Schreiber and others set off a boom in tumor neoantigen research. In May 2014, the Rosenberg team reported a landmark successful case in Science: using in vitro expanded lymphocytes that specifically recognize abnormal proteins caused by cancer cell gene mutations, they successfully treated a patient with highly malignant advanced cholangiocarcinoma. At the end of 2016, the Rosenberg team screened out TIL cells targeting the tumor neoantigen produced by the G12D mutation of the KRAS gene, and the tumor regressed after the cells were expanded and reinfused; the article was published in the leading medical journal NEJM. In 2017, Catherine J. Wu and Ugur Sahin simultaneously reported in Nature that personalized tumor vaccines based on tumor neoantigens had passed early-stage clinical trials. The detection of tumor neoantigens is thus of great significance for immunotherapy.
The tumor neoantigen prediction pipelines published so far mainly include EpiToolKit and Epi-Seq. However, EpiToolKit starts only from the mutations: it does not consider the depth and coverage of the sequencing data, and it does not assess the quality of each mutation from the quality of the data, so it cannot judge the quality of the resulting neoantigens. In addition, EpiToolKit does not consider expression abundance, that is, the expression of the neoantigen, which causes false-positive predictions and prevents high-quality neoantigens from being screened. Many DNA-level mutations are not expressed; on average, about 50% of mutations may not be expressed, which may cause false positives in neoantigen prediction. Moreover, the expression of mutations varies from high to low, and higher expression generally produces stronger immunogenicity. EpiToolKit also does not compare the mutant peptide with the normal peptide. For a high-quality neoantigen, the affinity of the mutant peptide is usually higher than that of the normal peptide; because EpiToolKit lacks this comparison, its screening of high-quality neoantigens is again prone to false positives.
Epi-Seq predicts tumor-specific antigens only from tumor expression data, and predicting neoantigens from expression data alone likewise causes false positives. First, the data are affected by RNA editing, which easily causes false positives. Second, because RNA is sequenced only after reverse transcription into cDNA, this process can also introduce many false positives. Third, the detection approach of comparing tumor cDNA against germline DNA itself yields many false positives. These factors cause the neoantigens obtained by Epi-Seq to contain many false positives.
Therefore, there is currently no method or pipeline that can screen high-quality tumor neoantigens from multiple angles starting directly from sequencing alignment results.
Summary of the invention
The purpose of the application is to provide a new tumor neoantigen detection method, device and storage medium based on next-generation sequencing.
To achieve the above purpose, the application adopts the following technical solutions:
The first aspect of the application discloses a tumor neoantigen detection method based on next-generation sequencing, the method including the following steps,
A variant detection step, comprising using at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion (indel) mutations from the alignment files of the sequencing results of the tumor sample and the normal sample, and taking the intersection of the mutations detected by the two programs as candidate mutations; meanwhile, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing results, and also taking the detected fusion gene mutations as candidate mutations; wherein the intersection of the two mutation detection programs refers to the mutations detected by both programs; in one implementation of the application, VarScan and mutect are used to detect point mutations and indels, and STAR-Fusion is used to detect fusion gene mutations;
An MHC molecule identification step, comprising using the HLA typing software polysolver and BWA mem respectively to detect the HLA types of the normal sample and the tumor sample; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, it is output as the HLA typing result; if it does not match, checking whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample, and if so, outputting the HLA typing result of BWA mem; if it still does not match, outputting an empty result, indicating that the HLA type cannot be determined;
A variant annotation step, comprising annotating the point mutations and indels among the candidate mutations from genomic variant to amino acid change; in one implementation of the application, VEP (Variant Effect Predictor) is used for the annotation;
A mutant peptide prediction step, comprising predicting the peptides of the point mutations, indels and fusion gene mutations among the candidate mutations; specifically, taking the mutated amino acid of a point mutation as the center and extending at least 10 amino acids on each side, to obtain the predicted mutant peptide of the point mutation; taking the mutation site of an indel as the center, extending at least 10 amino acids upstream and extending downstream until the position where normal amino acid translation resumes, to obtain the predicted mutant peptide of the indel; taking the fusion junction of a fusion gene mutation as the center and taking at least 10 amino acids from each of the 3' end and the 5' end of the fusion gene, to obtain the predicted mutant peptide of the fusion gene mutation; in one implementation of the application, the transvar tool is used to predict peptides from genomic mutations;
A mutant peptide MHC class I and MHC class II affinity prediction step, comprising using the HLA (human leukocyte antigen) type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as the input of MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II molecules respectively, and taking those with a predicted affinity below 500 nM as candidate tumor neoantigens; in one implementation of the application, netMHCpan and netMHCIIpan are used as the affinity prediction software, and 500 nM is a conventional cutoff;
An antigen expression abundance detection step, comprising using expression abundance calculation software to detect the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens; in one implementation of the application, RSEM is used to calculate the TPM value of the mutant peptide as the neoantigen expression abundance;
A clonality analysis step, comprising using mutation clonality analysis software to detect the clonality of each predicted mutant peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that carry the mutation; in one implementation of the application, PyClone is used to calculate the clonality of the mutation underlying the antigen and to output the clonal probability and the subclonal probability of the neoantigen, that is, the probability that the mutation is clonal and the probability that it is subclonal;
A comprehensive scoring and ranking step for candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to formula one, sorting by score from high to low, and selecting the high-scoring ones as tumor neoantigens;
Formula one: Score(m) = EpitopeContent(m) × ExpressionLevel(m) × ClonalLevel(m)
In formula one, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigenic peptides p with MHC affinity corresponding to neoantigen m; ExpressionLevel(m) is the expression abundance of neoantigen m; ClonalLevel(m) is the clonality of neoantigen m.
It will be appreciated that the application performs comprehensive scoring and ranking of all candidate tumor neoantigens. A neoantigen with a higher score is of higher quality and is expected to perform better as a target for cell or vaccine therapy; therefore, when neoantigens are selected for application in descending order of score, high-scoring neoantigens are preferred.
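The scoring and ranking described by formula one can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Candidate` class, the mutation labels and all numeric values are hypothetical, and the three component scores are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one predicted mutant peptide m."""
    name: str
    epitope_content: float    # EpitopeContent(m)
    expression_level: float   # ExpressionLevel(m)
    clonal_level: float       # ClonalLevel(m)


def score(c: Candidate) -> float:
    """Formula one: the product of the three component scores."""
    return c.epitope_content * c.expression_level * c.clonal_level


def rank(candidates):
    """Sort candidates from the highest to the lowest total score."""
    return sorted(candidates, key=score, reverse=True)


# Made-up values, for illustration only.
candidates = [
    Candidate("mut_A", 2.0, 10.0, 0.5),
    Candidate("mut_B", 1.0, 50.0, 0.9),
    Candidate("mut_C", 3.0, 0.0, 1.0),   # not expressed, so the product is 0
]
ranked = rank(candidates)
```

Because the score is a product, a candidate that scores zero on any one dimension (here `mut_C`, which is not expressed) falls to the bottom of the ranking regardless of its other components.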
It should be noted that the tumor neoantigen detection method of the application starts directly from the next-generation sequencing alignment results, detects mutations and MHC types, and scores the candidate tumor neoantigens from multiple angles such as expression abundance, clonality and MHC affinity, thereby screening out high-quality tumor neoantigens. The method therefore has the following advantages: 1) it can screen peptides from multiple variant types, including missense mutations, splice-site mutations, frameshift mutations, in-frame insertions/deletions and gene fusions; 2) it can detect the clonality of neoantigens; 3) it can predict the affinity of peptides for both MHC I and MHC II, and optimizes the affinity predictions using multiple algorithms; 4) it can filter false positives among the predicted peptides using several parameters, including wild-type and homology filtering; 5) it scores and ranks neoantigens according to affinity, expression, clonality and so on, and screens out high-quality neoantigens.
Preferably, in the tumor neoantigen detection method of the application, EpitopeContent(m) in formula one is calculated by formula two,
Formula two: EpitopeContent(m) = Σ_k Σ_{i=0}^{|p|−k} EpitopeScore(p[i:i+k])
In formula two, EpitopeScore(p[i:i+k]) denotes, for each predicted mutant peptide, the sum of the affinities to each MHC of an antigenic peptide of k amino acids taken from the peptide p centered on the mutated amino acid; i is the index of the antigenic peptides spanning the mutation for a given length k, starting from 0; |p| is the length of the peptide centered on the mutated amino acid and extended on both sides; |p|−k is the upper limit of the index of the antigenic peptides spanning the mutation for a given length k, and thus determines the total number of antigenic peptides spanning the mutation; for MHC class I antigenic peptides k is 8, 9, 10 or 11, and for MHC class II antigenic peptides k is 15;
Preferably, EpitopeScore(p[i:i+k]) is calculated by formula three,
Formula three: EpitopeScore(e) = Σ_{a∈HLA} σ(BindingAffinity(e, a)) × SelfFilter(e, a)
In formula three, EpitopeScore(e) is the value of EpitopeScore(p[i:i+k]); Σ_{a∈HLA} σ(BindingAffinity(e, a)) is the sum of the affinities of each core peptide e to all MHC subtypes a; σ(BindingAffinity(e, a)) is calculated by formula four; SelfFilter(e, a) reflects the homology of the antigenic peptide;
Formula four:
In formula four, σ(s) is σ(BindingAffinity(e, a)), e is the base of the natural logarithm, and s is the affinity value, given by the affinity prediction software, between core peptide e and the MHC of subtype a;
SelfFilter(e, a) takes its value as follows: for antigenic peptide e and MHC subtype a, if a similar peptide is found in the normal human genome, SelfFilter(e, a) is 0; otherwise it is 1.
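The windowed summation of formula two and the per-allele summation of formula three can be sketched as follows. This is a minimal illustration under stated assumptions: formula four is not reproduced in the text, so `sigma` below is an assumed logistic transform with made-up `midpoint` and `width` parameters, and `predict_affinity` and `is_self` are hypothetical callables standing in for the affinity prediction software and the homology search.

```python
import math

CLASS_I_LENGTHS = (8, 9, 10, 11)  # k values stated in the text for MHC class I


def sigma(affinity_nm, midpoint=500.0, width=100.0):
    """ASSUMED logistic transform; the actual formula four is not shown here.
    A lower affinity value in nM (stronger binding) maps closer to 1."""
    x = min((affinity_nm - midpoint) / width, 700.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(x))


def epitope_score(epitope, hla_alleles, predict_affinity, is_self):
    """Formula three: sum over alleles a of sigma(affinity) * SelfFilter."""
    total = 0.0
    for allele in hla_alleles:
        self_filter = 0.0 if is_self(epitope, allele) else 1.0
        total += sigma(predict_affinity(epitope, allele)) * self_filter
    return total


def epitope_content(peptide, hla_alleles, predict_affinity, is_self,
                    lengths=CLASS_I_LENGTHS):
    """Formula two: sum EpitopeScore over windows p[i:i+k], i = 0..|p|-k."""
    total = 0.0
    for k in lengths:
        for i in range(len(peptide) - k + 1):
            total += epitope_score(peptide[i:i + k], hla_alleles,
                                   predict_affinity, is_self)
    return total


def toy_affinity(epitope, allele):
    """Hypothetical predictor that returns 500 nM for every pair."""
    return 500.0


def never_self(epitope, allele):
    """Hypothetical homology check that never finds a self match."""
    return False


content = epitope_content("ACDEFGHIKL", ["HLA-A*02:01"],
                          toy_affinity, never_self, lengths=(8, 9))
```

With a 10-residue toy peptide and k restricted to 8 and 9, there are 3 + 2 = 5 windows; each contributes `sigma(500) = 0.5` under the assumed transform, so `content` is 2.5.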
Preferably, in the tumor neoantigen detection method of the application, ExpressionLevel(m) in formula one takes its value as follows: if the expression level of predicted mutant peptide m is less than 10^-3, ExpressionLevel(m) = 0; if the expression level of predicted mutant peptide m is not less than 10^-3, ExpressionLevel(m) takes the expression abundance value output by the expression abundance calculation software. An expression level below 10^-3 is defined as not expressed, hence the value 0; the expression level is the expression abundance detected by the expression abundance calculation software;
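The thresholding rule for ExpressionLevel(m) can be sketched as a small function. This is an illustrative sketch; the function and constant names are hypothetical, and the abundance is assumed to be the value reported by the expression abundance calculation software.

```python
EXPRESSION_CUTOFF = 1e-3  # below this, the mutation is treated as not expressed


def expression_level(abundance: float) -> float:
    """Return 0 for abundances below the cutoff, else the abundance itself."""
    return 0.0 if abundance < EXPRESSION_CUTOFF else abundance
```

A value exactly equal to the cutoff is kept, matching the "not less than 10^-3" wording.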
Preferably, in the tumor neoantigen detection method of the application, ClonalLevel(m) in formula one is calculated by formula five,
Formula five: ClonalLevel(m) = p(Clonal) × (1 − p(subclonal))
In formula five, p(Clonal) is the probability that the neoantigen is clonal, as output by the mutation clonality analysis software, and p(subclonal) is the probability that the neoantigen is subclonal, as output by the mutation clonality analysis software.
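Formula five can be written directly as a function. This is a minimal sketch; the function name and the range check are additions for illustration, and the two probabilities are assumed to come from the mutation clonality analysis software.

```python
def clonal_level(p_clonal: float, p_subclonal: float) -> float:
    """Formula five: p(Clonal) * (1 - p(subclonal))."""
    for p in (p_clonal, p_subclonal):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_clonal * (1.0 - p_subclonal)
```

For example, a mutation with p(Clonal) = 0.5 and p(subclonal) = 0.5 gets a ClonalLevel of 0.25, while a fully clonal mutation (1.0, 0.0) gets 1.0.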
Preferably, in the antigen expression abundance detection step, the expression abundance calculation software is RSEM, and the TPM value of the predicted mutant peptide calculated by RSEM is used as the expression abundance.
In the application, neoantigen m denotes the neoantigen derived from one mutation. A single mutation can generate many antigenic peptides p; the formulas of the application therefore add up the scores of all antigenic peptides p with antigenic capacity to give the total score of that mutation as a neoantigen. Each mutation can be scored against several different MHC subtypes; in a human individual, at most 8 can currently be predicted. In terms of the peptide length k that binds MHC, there can be antigenic peptides of 5 lengths. Formula two therefore contains multiple summation symbols. The mutant peptide is the peptide that the mutation is initially predicted to generate, that is, the predicted mutant peptide; the antigenic peptides p are all potential fixed-length peptides, selected from the mutant peptide, that can be recognized by MHC; the core peptide e is an antigenic peptide predicted by the affinity prediction software, from among all potential antigenic peptides p, to be immunogenic, that is, an antigenic peptide p whose affinity is below 500 nM.
The second aspect of the application discloses a tumor neoantigen detection device based on next-generation sequencing, including,
A variant detection module, configured to use at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion (indel) mutations from the alignment files of the sequencing results of the tumor sample and the normal sample, and to take the intersection of the mutations detected by the two programs as candidate mutations; and, meanwhile, to perform fusion gene detection on the alignment file of the tumor transcriptome sequencing results and also take the detected fusion gene mutations as candidate mutations;
An MHC molecule identification module, configured to use the HLA typing software polysolver and BWA mem respectively to detect the HLA types of the normal sample and the tumor sample; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, it is output as the result; if it does not match, the module checks whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample and, if so, outputs the detection result of BWA mem; if it still does not match, an empty result is output;
A variant annotation module, configured to annotate the point mutations and indels among the candidate mutations from genomic variant to amino acid change;
A mutant peptide prediction module, configured to predict the peptides of the point mutations, indels and fusion gene mutations among the candidate mutations; specifically, taking the mutated amino acid of a point mutation as the center and extending at least 10 amino acids on each side, to obtain the predicted mutant peptide of the point mutation; taking the mutation site of an indel as the center, extending at least 10 amino acids upstream and extending downstream until the position where normal amino acid translation resumes, to obtain the predicted mutant peptide of the indel; taking the fusion junction of a fusion gene mutation as the center and taking at least 10 amino acids from each of the 3' end and the 5' end of the fusion gene, to obtain the predicted mutant peptide of the fusion gene mutation;
A mutant peptide MHC class I and MHC class II affinity prediction module, configured to use the HLA type of the tumor sample obtained by the MHC molecule identification module, the predicted mutant peptides obtained by the mutant peptide prediction module, and the wild-type peptide sequences corresponding to the predicted mutant peptides as the input of MHC class I and MHC class II affinity prediction software, to predict the affinity of the mutant peptides for MHC class I and MHC class II molecules respectively, and to take those with a predicted affinity below 500 nM as candidate tumor neoantigens;
An antigen expression abundance detection module, configured to use expression abundance calculation software to detect the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens;
A clonality analysis module, configured to use mutation clonality analysis software to detect the clonality of each predicted mutant peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that carry the mutation;
A comprehensive scoring and ranking module for candidate tumor neoantigens, configured to score each predicted mutant peptide among the candidate tumor neoantigens according to formula one, sort by score from high to low, and select the high-scoring ones as tumor neoantigens;
Formula one: Score(m) = EpitopeContent(m) × ExpressionLevel(m) × ClonalLevel(m)
In formula one, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigenic peptides p with MHC affinity corresponding to neoantigen m; ExpressionLevel(m) is the expression abundance of neoantigen m; ClonalLevel(m) is the clonality of neoantigen m.
Preferably, in the tumor neoantigen detection device of the application, EpitopeContent(m), ExpressionLevel(m) and ClonalLevel(m) in formula one are calculated as in the tumor neoantigen detection method of the application.
The third aspect of the application discloses a tumor neoantigen detection device based on next-generation sequencing, including:
A memory, for storing a program;
A processor, for implementing the tumor neoantigen detection method of the application by executing the program stored in the memory.
The fourth aspect of the application discloses a computer-readable storage medium containing a program that can be executed by a processor to implement the tumor neoantigen detection method of the application.
With the above technical solutions, the beneficial effects of the application are as follows:
The tumor neoantigen detection method of the application performs mutation and MHC detection directly on the basis of next-generation sequencing alignment files, and scores the candidate tumor neoantigens along three dimensions: MHC class I/II affinity, expression abundance and clonality. This not only reduces false positives in neoantigen screening, but also allows neoantigens with higher immunogenicity to be selected through scoring and ranking, so that high-quality tumor neoantigens are screened out, laying a foundation for immunotherapy based on tumor neoantigens.
Brief description of the drawings
Fig. 1 is a flow diagram of the tumor neoantigen detection method based on next-generation sequencing in an embodiment of the application;
Fig. 2 is a structural block diagram of the tumor neoantigen detection device based on next-generation sequencing in an embodiment of the application.
Detailed description
The application is described in further detail below through specific embodiments in combination with the drawings. In the following embodiments, many details are described so that the application can be better understood. However, those skilled in the art will readily recognize that some of the features may be omitted in different situations, or may be replaced by other elements, materials or methods. In some cases, certain operations related to the application are not shown or described in the specification, in order to avoid the core of the application being overwhelmed by excessive description; for those skilled in the art, a detailed description of these operations is not necessary, since they can fully understand the operations from the description in the specification and the general technical knowledge of the field.
As shown in Fig. 1, the tumor neoantigen detection method based on next-generation sequencing of the application includes the following steps,
(1) A variant detection step, comprising using at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion (indel) mutations from the alignment files of the sequencing results of the tumor sample and the normal sample, and taking the intersection of the mutations detected by the two programs as candidate mutations; meanwhile, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing results, and also taking the detected fusion gene mutations as candidate mutations.
The intersection of the two mutation detection programs refers to the mutations detected by both programs. In some embodiments, VarScan and mutect are used to detect point mutations and indels, and STAR-Fusion is used to detect fusion gene mutations, that is, STAR-Fusion is applied to the aligned RNA BAM files for fusion gene detection.
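The intersection of the two callers' results in step (1) can be sketched as follows. This is an illustrative sketch only: it assumes each caller's output has already been parsed into `(chromosome, position, ref, alt)` tuples, the function names are hypothetical, and the coordinates are made-up toy values. It does not call VarScan or mutect.

```python
def normalize(call):
    """Build a comparable key from a (chrom, pos, ref, alt) tuple."""
    chrom, pos, ref, alt = call
    return (str(chrom), int(pos), ref.upper(), alt.upper())


def intersect_calls(calls_a, calls_b):
    """Keep the mutations reported by both callers, in the order of calls_a."""
    keys_b = {normalize(c) for c in calls_b}
    seen = set()
    shared = []
    for call in calls_a:
        key = normalize(call)
        if key in keys_b and key not in seen:
            seen.add(key)
            shared.append(key)
    return shared


# Toy calls from two hypothetical callers.
caller_a = [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A"),
            ("chr3", 3000, "A", "AT")]
caller_b = [("chr1", 1000, "c", "t"), ("chr3", 3000, "A", "AG")]
candidates = intersect_calls(caller_a, caller_b)
```

Matching on the full `(chrom, pos, ref, alt)` key means that two indels at the same position with different inserted bases, as in the `chr3` entries, are not treated as the same mutation.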
(2) An MHC molecule identification step, comprising using the HLA typing software polysolver and BWA mem respectively to detect the HLA types of the normal sample and the tumor sample; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, it is output as the result; if it does not match, checking whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample, and if so, outputting the detection result of BWA mem; if it still does not match, outputting an empty result.
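The fallback logic of step (2) can be sketched as follows. This is a sketch under assumptions: each tool's tumor call is compared with the same tool's normal call, HLA types are represented as lists of allele strings, and the function name and the toy alleles are hypothetical. Running polysolver or BWA mem themselves is outside the scope of the sketch.

```python
def select_hla_type(polysolver_tumor, polysolver_normal, bwa_tumor, bwa_normal):
    """Return the first tumor HLA call that agrees with the normal sample.

    Falls back from the polysolver call to the BWA mem call; returns an
    empty list when neither tumor call matches its normal call.
    """
    if polysolver_tumor and set(polysolver_tumor) == set(polysolver_normal):
        return sorted(set(polysolver_tumor))
    if bwa_tumor and set(bwa_tumor) == set(bwa_normal):
        return sorted(set(bwa_tumor))
    return []  # empty result: the HLA type cannot be determined


# Toy example: the polysolver calls disagree, the BWA mem calls agree.
result = select_hla_type(
    ["A*02:01", "B*07:02"], ["A*02:01", "B*08:01"],
    ["B*07:02", "A*02:01"], ["A*02:01", "B*07:02"],
)
```

Comparing sets ignores allele order and collapses homozygous pairs; a production pipeline would likely need a stricter comparison.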
(3) A variant annotation step, comprising annotating the point mutations and indels among the candidate mutations from genomic variant to amino acid change.
In some embodiments, VEP (Variant Effect Predictor) is used for the annotation.
(4) A mutant peptide prediction step, comprising predicting the peptides of the point mutations, indels and fusion gene mutations among the candidate mutations; specifically, taking the mutated amino acid of a point mutation as the center and extending at least 10 amino acids on each side, to obtain the predicted mutant peptide of the point mutation; taking the mutation site of an indel as the center, extending at least 10 amino acids upstream and extending downstream until the position where normal amino acid translation resumes, to obtain the predicted mutant peptide of the indel; taking the fusion junction of a fusion gene mutation as the center and taking at least 10 amino acids from each of the 3' end and the 5' end of the fusion gene, to obtain the predicted mutant peptide of the fusion gene mutation.
In some embodiments, the transvar tool is used to predict peptides from genomic mutations.
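For the point mutation case in step (4), building the predicted mutant peptide and its wild-type counterpart can be sketched as follows. This is an illustrative sketch: the function name, the 0-based position convention and the toy protein sequence are assumptions, and indels and fusions are not handled.

```python
def mutant_peptide(protein, pos, alt_aa, flank=10):
    """Return (wild_type, mutant) peptides centered on a substituted residue.

    protein: full amino acid sequence of the wild-type protein
    pos:     0-based index of the substituted residue (assumed convention)
    alt_aa:  the mutated amino acid
    flank:   residues kept on each side (10 in the text's minimum case)
    """
    if not 0 <= pos < len(protein):
        raise IndexError("mutation position outside the protein")
    start = max(0, pos - flank)              # truncate at the N-terminus
    end = min(len(protein), pos + flank + 1)  # truncate at the C-terminus
    wild_type = protein[start:end]
    mutant = protein[start:pos] + alt_aa + protein[pos + 1:end]
    return wild_type, mutant


# Toy 30-residue sequence; residue 12 (P) is replaced by L.
toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
wild_type, mutant = mutant_peptide(toy_protein, 12, "L")
```

With a flank of 10 on each side the peptide is 21 residues long, with the substituted residue at index 10; near a protein terminus the peptide is simply shorter. The wild-type peptide is returned alongside because the affinity prediction step takes both as input.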
(5) A mutant peptide MHC class I and MHC class II affinity prediction step, comprising using the HLA type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as the input of MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II molecules respectively, and taking those with a predicted affinity below 500 nM as candidate tumor neoantigens.
In some embodiments, netMHCpan and netMHCIIpan are used to predict the affinity for MHC class I and MHC class II molecules respectively.
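The 500 nM selection in step (5) can be sketched as a simple filter. This is a minimal sketch that assumes the predictor's output has already been parsed into `(peptide, allele, affinity_nM)` tuples; the function name and the toy records are hypothetical, and parsing netMHCpan or netMHCIIpan output is not shown.

```python
AFFINITY_CUTOFF_NM = 500.0  # conventional cutoff stated in the text


def select_binders(predictions, cutoff=AFFINITY_CUTOFF_NM):
    """Keep (peptide, allele, affinity_nM) records with affinity below cutoff."""
    return [record for record in predictions if record[2] < cutoff]


# Toy predictions, values made up for illustration.
predictions = [
    ("KLMNLQRST", "HLA-A*02:01", 35.2),
    ("LMNLQRSTV", "HLA-A*02:01", 500.0),
    ("MNLQRSTVW", "HLA-B*07:02", 8123.0),
]
binders = select_binders(predictions)
```

Because the criterion is "below 500 nM", a peptide predicted at exactly 500 nM is not kept.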
(6) An antigen expression abundance detection step, comprising using expression abundance calculation software to detect the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens.
In some embodiments, RSEM is used to calculate the TPM value of the mutant peptide as the neoantigen expression abundance.
(7) A clonality analysis step, comprising using mutation clonality analysis software to detect the clonality of each predicted mutant peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that carry the mutation.
In some embodiments, PyClone is used to calculate the clonality of the mutation underlying the antigen and to output the clonal probability and the subclonal probability of the neoantigen, that is, the probability that each mutation is clonal and the probability that it is subclonal.
(8) A comprehensive scoring and ranking step for candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to formula one, sorting by score from high to low, and selecting the high-scoring ones as tumor neoantigens;
Formula one: Score(m) = EpitopeContent(m) × ExpressionLevel(m) × ClonalLevel(m)
In formula one, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigenic peptides p with MHC affinity corresponding to neoantigen m; ExpressionLevel(m) is the expression abundance of neoantigen m; ClonalLevel(m) is the clonality of neoantigen m.
EpitopeContent(m) in formula one is calculated by formula two,
Formula two: EpitopeContent(m) = Σ_k Σ_{i=0}^{|p|−k} EpitopeScore(p[i:i+k])
In formula two, EpitopeScore(p[i:i+k]) denotes, for each predicted mutant peptide, the sum of the affinities to each MHC of an antigenic peptide of k amino acids taken from the peptide p centered on the mutated amino acid; i is the index of the antigenic peptides spanning the mutation for a given length k, starting from 0; |p| is the length of the peptide centered on the mutated amino acid and extended on both sides; |p|−k is the upper limit of the index of the antigenic peptides spanning the mutation for a given length k, and thus determines the total number of antigenic peptides spanning the mutation;
EpitopeScore(p[i:I+k] it is obtained by the calculating of formula three,
Formula three:EpitopeScore (e)=∑a∈HLAσ (BindingAffinity (e, a)) × SelfFilter (e,
a)
In formula three, EpitopeScore (e) i.e. EpitopeScore (p [i:I+k] value, ∑a∈HLAσ
(BindingAffinity (e, a)) indicates the summation of the affinity of each core peptide fragment peptide fragment e and all MHC hypotypes a, σ
(BindingAffinity (e, a)) by formula four calculate obtain, SelfFilter (e, a) refer to antigen peptide fragment homology;
Formula four:
In formula four, σ(s) is σ(BindingAffinity(e, a)), e is the base of the natural logarithm, and s is the affinity value, given by the affinity prediction software, between the core peptide e and the MHC of subtype a.
SelfFilter(e, a) takes its value as follows: for antigen peptide e and MHC subtype a, if a similar (homologous) peptide is found in the normal human genome, SelfFilter(e, a) is 0; otherwise it is 1.
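Formulas two and three can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: all function and parameter names are hypothetical, `binding_affinity`, `sigma` and `self_filter` are supplied by the caller (formula four is not reproduced here, so `sigma` is left as a parameter), and Python's half-open slice `p[i:i + k]` is assumed to correspond to p[i:i+k] in formula two.

```python
from typing import Callable, Sequence


def epitope_score(
    e: str,
    hla_alleles: Sequence[str],
    binding_affinity: Callable[[str, str], float],
    sigma: Callable[[float], float],
    self_filter: Callable[[str, str], int],
) -> float:
    """Formula three: sum over HLA subtypes a of
    sigma(BindingAffinity(e, a)) * SelfFilter(e, a)."""
    return sum(sigma(binding_affinity(e, a)) * self_filter(e, a) for a in hla_alleles)


def epitope_content(
    p: str,
    k: int,
    hla_alleles: Sequence[str],
    binding_affinity: Callable[[str, str], float],
    sigma: Callable[[float], float],
    self_filter: Callable[[str, str], int],
) -> float:
    """Formula two: sum of EpitopeScore(p[i:i+k]) for i = 0 .. |p| - k."""
    return sum(
        epitope_score(p[i:i + k], hla_alleles, binding_affinity, sigma, self_filter)
        for i in range(len(p) - k + 1)  # i runs from 0 up to and including |p| - k
    )
```

Passing the affinity predictor, the transform σ and the homology filter as callables keeps the sketch independent of any particular prediction software.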
ExpressionLevel(m) in formula one takes its value as follows: if the antigen expression level of mutation-predicted peptide m is less than 10^-3, then ExpressionLevel(m) = 0; if the antigen expression level of mutation-predicted peptide m is not less than 10^-3, then ExpressionLevel(m) takes the antigen expression abundance value output by the antigen expression abundance calculation software.
ClonalLevel(m) in formula one is calculated by formula five:
Formula five: ClonalLevel(m) = p(Clonal) × (1 - p(subclonal))
In formula five, p(Clonal) is the probability, output by the mutation clonality analysis software, that the neoantigen is clonal, and p(subclonal) is the probability, output by the same software, that the neoantigen is subclonal.
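The way formulas one and five combine with the expression threshold can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the EpitopeContent value, the expression abundance and the two clonality probabilities are assumed to have been computed already by the respective software.

```python
def expression_level(abundance: float, threshold: float = 1e-3) -> float:
    """ExpressionLevel(m): 0 below the 10^-3 threshold, otherwise the abundance itself."""
    return 0.0 if abundance < threshold else abundance


def clonal_level(p_clonal: float, p_subclonal: float) -> float:
    """Formula five: ClonalLevel(m) = p(Clonal) * (1 - p(subclonal))."""
    return p_clonal * (1.0 - p_subclonal)


def score(epitope_content: float, abundance: float,
          p_clonal: float, p_subclonal: float) -> float:
    """Formula one: Score(m) = EpitopeContent(m) * ExpressionLevel(m) * ClonalLevel(m)."""
    return (epitope_content
            * expression_level(abundance)
            * clonal_level(p_clonal, p_subclonal))
```

Because the three factors are multiplied, a peptide whose expression falls below the threshold receives a total score of 0 regardless of its binding or clonality.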
Those skilled in the art will understand that all or part of the functions of the methods in the above embodiments can be implemented in hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by a computer program, the program can be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disc, a hard disk and the like, and the above functions are realized when a computer executes the program. For example, the program is stored in the memory of a device, and all or part of the above functions are realized when the processor executes the program in the memory. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program can also be stored on a storage medium such as a server, another computer, a magnetic disk, an optical disc, a flash drive or a removable hard disk, and then downloaded or copied into the memory of the local device, or used to update the version of the local device's system; all or part of the functions in the above embodiments are realized when the processor executes the program in the memory.
Accordingly, as shown in Fig. 2, in one embodiment of the present application, the tumor neoantigen detection device based on second-generation sequencing includes: a variant detection module 201, an MHC molecule identification module 202, a variant annotation module 203, a mutant peptide prediction module 204, a mutant peptide MHC class I and MHC class II affinity prediction module 205, an antigen expression abundance detection module 206, a clonality analysis module 207, and a candidate tumor neoantigen comprehensive scoring and ranking module 208.
The variant detection module 201 is configured to use at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion mutations in the alignment files of the sequencing results of the tumor sample and the normal sample, and to take the intersection of the mutations detected by the two programs as candidate mutations; in addition, it performs fusion gene mutation detection on the alignment file of the tumor transcriptome sequencing result and also takes the detected fusion gene mutations as candidate mutations.
The MHC molecule identification module 202 is configured to detect the HLA molecule types of the normal sample and the tumor sample with the HLA typing software polysolver and with BWA mem, respectively; if the HLA molecules of the tumor sample detected by polysolver match those of the normal sample, they are output as the result; if they do not match, the match between the HLA molecules of the tumor sample detected by BWA mem and those of the normal sample is checked, and the BWA mem result is output if they match; if they still do not match, an empty result is output.
The variant annotation module 203 is configured to annotate the point mutations and insertion/deletion mutations among the candidate mutations from genomic mutation to amino acid mutation.
The mutant peptide prediction module 204 is configured to predict the peptides of the point mutations, insertion/deletion mutations and fusion gene mutations among the candidate mutations. Specifically: for a point mutation, the peptide centered on the mutated amino acid and extended by at least 10 amino acids on each side is taken as the mutation-predicted peptide; for an insertion/deletion mutation, the peptide centered on the mutated position, extended forward by at least 10 amino acids and extended backward until normal amino acid translation is reached, is taken as the mutation-predicted peptide; for a fusion gene mutation, the peptide centered on the fusion site and comprising at least 10 amino acids from the 3' end and from the 5' end of the fusion gene is taken as the mutation-predicted peptide.
The mutant peptide MHC class I and MHC class II affinity prediction module 205 is configured to take the HLA molecule types of the tumor sample obtained in the MHC molecule identification step, the mutation-predicted peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the mutation-predicted peptides as input to MHC class I and MHC class II affinity prediction software, to predict the affinity of the mutant peptides for MHC class I and MHC class II genes respectively, and to take peptides with a predicted affinity below 500 nM as candidate tumor neoantigens.
The antigen expression abundance detection module 206 is configured to detect, with antigen expression abundance calculation software, the antigen expression abundance of each mutation-predicted peptide among the candidate tumor neoantigens.
The clonality analysis module 207 is configured to detect, with mutation clonality analysis software, the clonality of each mutation-predicted peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that are mutant cells.
The candidate tumor neoantigen comprehensive scoring and ranking module 208 is configured to score each mutation-predicted peptide among the candidate tumor neoantigens according to formula one, rank the peptides by score from high to low, and select the high-scoring peptides as tumor neoantigens.
Another embodiment of the present application further provides a tumor neoantigen detection device based on second-generation sequencing, including: a memory for storing a program; and a processor for executing the program stored in the memory to implement the following method:
Variant detection step: using at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion mutations in the alignment files of the sequencing results of the tumor sample and the normal sample, and taking the intersection of the mutations detected by the two programs as candidate mutations; in addition, performing fusion gene mutation detection on the alignment file of the tumor transcriptome sequencing result, and also taking the detected fusion gene mutations as candidate mutations.
MHC molecule identification step: detecting the HLA molecule types of the normal sample and the tumor sample with the HLA typing software polysolver and with BWA mem, respectively; if the HLA molecules of the tumor sample detected by polysolver match those of the normal sample, outputting them as the result; if they do not match, checking the match between the HLA molecules of the tumor sample detected by BWA mem and those of the normal sample, and outputting the BWA mem result if they match; if they still do not match, outputting an empty result.
Variant annotation step: annotating the point mutations and insertion/deletion mutations among the candidate mutations from genomic mutation to amino acid mutation.
Mutant peptide prediction step: predicting the peptides of the point mutations, insertion/deletion mutations and fusion gene mutations among the candidate mutations. Specifically: for a point mutation, the peptide centered on the mutated amino acid and extended by at least 10 amino acids on each side is taken as the mutation-predicted peptide; for an insertion/deletion mutation, the peptide centered on the mutated position, extended forward by at least 10 amino acids and extended backward until normal amino acid translation is reached, is taken as the mutation-predicted peptide; for a fusion gene mutation, the peptide centered on the fusion site and comprising at least 10 amino acids from the 3' end and from the 5' end of the fusion gene is taken as the mutation-predicted peptide.
Mutant peptide MHC class I and MHC class II affinity prediction step: taking the HLA molecule types of the tumor sample obtained in the MHC molecule identification step, the mutation-predicted peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the mutation-predicted peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II genes respectively, and taking peptides with a predicted affinity below 500 nM as candidate tumor neoantigens.
Antigen expression abundance detection step: detecting, with antigen expression abundance calculation software, the antigen expression abundance of each mutation-predicted peptide among the candidate tumor neoantigens.
Clonality analysis step: detecting, with mutation clonality analysis software, the clonality of each mutation-predicted peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that are mutant cells.
Candidate tumor neoantigen comprehensive scoring and ranking step: scoring each mutation-predicted peptide among the candidate tumor neoantigens according to formula one, ranking the peptides by score from high to low, and selecting the high-scoring peptides as tumor neoantigens.
Another embodiment of the present application further provides a computer-readable storage medium including a program that can be executed by a processor to implement the following method:
Variant detection step: using at least two mutation detection programs to detect tumor somatic point mutations and insertion/deletion mutations in the alignment files of the sequencing results of the tumor sample and the normal sample, and taking the intersection of the mutations detected by the two programs as candidate mutations; in addition, performing fusion gene mutation detection on the alignment file of the tumor transcriptome sequencing result, and also taking the detected fusion gene mutations as candidate mutations.
MHC molecule identification step: detecting the HLA molecule types of the normal sample and the tumor sample with the HLA typing software polysolver and with BWA mem, respectively; if the HLA molecules of the tumor sample detected by polysolver match those of the normal sample, outputting them as the result; if they do not match, checking the match between the HLA molecules of the tumor sample detected by BWA mem and those of the normal sample, and outputting the BWA mem result if they match; if they still do not match, outputting an empty result.
Variant annotation step: annotating the point mutations and insertion/deletion mutations among the candidate mutations from genomic mutation to amino acid mutation.
Mutant peptide prediction step: predicting the peptides of the point mutations, insertion/deletion mutations and fusion gene mutations among the candidate mutations. Specifically: for a point mutation, the peptide centered on the mutated amino acid and extended by at least 10 amino acids on each side is taken as the mutation-predicted peptide; for an insertion/deletion mutation, the peptide centered on the mutated position, extended forward by at least 10 amino acids and extended backward until normal amino acid translation is reached, is taken as the mutation-predicted peptide; for a fusion gene mutation, the peptide centered on the fusion site and comprising at least 10 amino acids from the 3' end and from the 5' end of the fusion gene is taken as the mutation-predicted peptide.
Mutant peptide MHC class I and MHC class II affinity prediction step: taking the HLA molecule types of the tumor sample obtained in the MHC molecule identification step, the mutation-predicted peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the mutation-predicted peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II genes respectively, and taking peptides with a predicted affinity below 500 nM as candidate tumor neoantigens.
Antigen expression abundance detection step: detecting, with antigen expression abundance calculation software, the antigen expression abundance of each mutation-predicted peptide among the candidate tumor neoantigens.
Clonality analysis step: detecting, with mutation clonality analysis software, the clonality of each mutation-predicted peptide among the candidate tumor neoantigens, clonality being characterized by the proportion of tumor cells in the sequenced tumor tissue that are mutant cells.
Candidate tumor neoantigen comprehensive scoring and ranking step: scoring each mutation-predicted peptide among the candidate tumor neoantigens according to formula one, ranking the peptides by score from high to low, and selecting the high-scoring peptides as tumor neoantigens.
The present application is described in further detail below through specific embodiments and the accompanying drawings. The following embodiments only further illustrate the present application and should not be construed as limiting it.
Embodiment 1
This example uses the data published in Yadav, Mahesh, et al. "Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing." Nature 515.7528 (2014): 572 (hereinafter referred to as Document 1), namely the exome data and transcriptome data of the tumor sample and normal sample of the mouse model MC-38, and performs tumor neoantigen detection on them with the tumor neoantigen detection method based on second-generation sequencing, as follows:
(1) Variant detection
Tumor somatic point mutations (single nucleotide variants, SNVs) and insertions/deletions (InDels) are detected from the BAM files obtained by aligning the DNA sequencing reads of the tumor sample and the normal sample, using two programs, VarScan and MuTect. To obtain high-quality mutations, the intersection of the results of the two programs is taken as the high-quality candidate mutations. Fusion genes are detected by running STAR-Fusion on the aligned RNA BAM file.
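The intersection of the two call sets can be sketched as follows. This sketch assumes the calls from each program have already been parsed into (chromosome, position, reference allele, alternate allele) tuples; the parsing of the actual VarScan and MuTect output files is not shown, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_mutations(calls_a: Iterable[Variant],
                        calls_b: Iterable[Variant]) -> List[Variant]:
    """Keep only the variants reported by both callers (set intersection),
    returned in sorted order so the output is reproducible."""
    return sorted(set(calls_a) & set(calls_b))
```

A variant reported by only one program, or reported at the same position with a different alternate allele, is dropped by this comparison.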
(2) MHC molecule identification
To determine the types of the MHC-I and MHC-II molecules, this example uses polysolver to detect the HLA molecule types of the normal sample and the tumor sample. If the HLA molecules detected by polysolver in the tumor sample and in the normal sample match, they are output as the result; if they do not match, the BWA mem results are checked, and if the BWA mem results show that the normal sample and the tumor sample match, the BWA mem result is used; if these also do not match, an empty result is output.
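The fallback logic of this step can be sketched as follows. The sketch assumes that each tool's result has already been reduced to a list of allele names per sample and that "match" means the tumor and normal samples carry the same set of alleles; both points, and the function name, are assumptions made for illustration.

```python
from typing import List, Sequence


def select_hla_types(polysolver_tumor: Sequence[str],
                     polysolver_normal: Sequence[str],
                     bwa_tumor: Sequence[str],
                     bwa_normal: Sequence[str]) -> List[str]:
    """Prefer the polysolver typing when tumor and normal agree, fall back
    to the BWA mem typing when only that one agrees, else return an empty result."""
    if set(polysolver_tumor) == set(polysolver_normal):
        return sorted(set(polysolver_tumor))
    if set(bwa_tumor) == set(bwa_normal):
        return sorted(set(bwa_tumor))
    return []
```

An empty list signals that neither tool produced a typing that is consistent between the tumor and the normal sample.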
(3) Variant annotation
For point mutations and insertions/deletions, the annotation from genomic mutation to amino acid mutation is completed with the VEP (Variant Effect Predictor) tool.
(4) Mutant peptide prediction
For point mutations and insertions/deletions, the mutant peptides are predicted from the genomic mutations with the TransVar tool. For a point mutation, the peptide centered on the mutated amino acid and extended by 10 amino acids (14 for MHC II) on each side is taken as the final mutant peptide. For an insertion/deletion mutation, the peptide is centered on the mutated position, extended forward by 10 amino acids (14 for MHC II), and extended backward until normal amino acid translation is reached.
For a fusion gene, the peptide is centered on the fusion site, and 10 amino acids (14 for MHC II) from the 3' end and from the 5' end of the fusion gene are taken as the final mutant peptide.
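For the point-mutation case, extracting the flanked peptide can be sketched as follows. The sketch assumes the mutated protein sequence and the 0-based index of the mutated residue are already known (for example from the annotation step); the function name is hypothetical, and clipping the window at the protein termini is an assumption not stated in the text.

```python
def point_mutation_peptide(mutant_protein: str, mut_index: int, flank: int = 10) -> str:
    """Return the peptide centered on the mutated residue (0-based index),
    extended by `flank` residues on each side and clipped at the protein ends.
    flank=10 corresponds to MHC I and flank=14 to MHC II in this example."""
    if not 0 <= mut_index < len(mutant_protein):
        raise ValueError("mut_index is outside the protein sequence")
    start = max(0, mut_index - flank)
    end = min(len(mutant_protein), mut_index + flank + 1)
    return mutant_protein[start:end]
```

With flank=10 and a mutation far enough from both termini, the returned peptide has 21 residues with the mutated residue in the middle.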
(5) Mutant peptide MHC class I/II affinity prediction
The HLA typing of the patient obtained in step (2), the mutant peptide sequences obtained in step (4) and the corresponding wild-type peptide sequences are used as input to the netMHCpan and netMHCIIpan software, which predict the affinity for MHC class I and MHC class II genes respectively. Peptides whose predicted affinity is below 500 nM are taken as potential tumor neoantigens.
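The 500 nM cut-off can be sketched as a simple filter. The record type and function name below are hypothetical, and the predictions are assumed to have been parsed already from the output of the affinity prediction software into (peptide, allele, affinity in nM) records; the parsing itself is not shown.

```python
from typing import Iterable, List, NamedTuple


class BindingPrediction(NamedTuple):
    peptide: str        # mutant peptide sequence
    allele: str         # HLA allele the affinity was predicted for
    affinity_nm: float  # predicted affinity in nM (lower means stronger binding)


def candidate_neoantigens(predictions: Iterable[BindingPrediction],
                          threshold_nm: float = 500.0) -> List[BindingPrediction]:
    """Keep predictions whose affinity is strictly below the threshold."""
    return [p for p in predictions if p.affinity_nm < threshold_nm]
```

The comparison is strict, following the wording "below 500 nM"; a prediction of exactly 500 nM is not retained.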
(6) Neoantigen expression abundance detection
The RSEM software is used to calculate the TPM value of each mutant peptide as the neoantigen expression abundance.
(7) Neoantigen clonality analysis
PyClone is used to calculate the clonality of the mutation underlying each antigen; clonality is measured by the proportion of tumor cells that carry the mutation.
(8) Neoantigen comprehensive scoring and ranking
Overall, the score of a neoantigen peptide is given by formula one:
Formula one: Score(m) = EpitopeContent(m) × ExpressionLevel(m) × ClonalLevel(m)
In formula one, Score(m) is the total score of mutation-predicted peptide m; EpitopeContent(m) denotes the sum of the scores of all antigen peptides p with MHC affinity that correspond to neoantigen m; ExpressionLevel(m) denotes the antigen expression abundance of neoantigen m; ClonalLevel(m) denotes the clonality of neoantigen m.
EpitopeContent(m) in formula one is calculated by formula two:
Formula two: $\mathrm{EpitopeContent}(m)=\sum_{i=0}^{|p|-k}\mathrm{EpitopeScore}(p[i:i+k])$
In formula two, EpitopeScore(p[i:i+k]) denotes, for each mutation-predicted peptide p (centered on the mutated amino acid and extended by k amino acids on each side), the sum of the affinities of the antigen peptide p[i:i+k] for each MHC; i is the index, starting from 0, of the antigen peptides spanning the mutation for the given extension length k; |p| is the length of the peptide centered on the mutated amino acid and extended by k amino acids on each side; |p|-k is the upper limit of the index over all antigen peptides spanning the mutation for the given extension length k, i.e. the total number of antigen peptides spanning the mutation.
EpitopeScore(p[i:i+k]) is calculated by formula three:
Formula three: $\mathrm{EpitopeScore}(e)=\sum_{a\in\mathrm{HLA}}\sigma(\mathrm{BindingAffinity}(e,a))\times\mathrm{SelfFilter}(e,a)$
In formula three, EpitopeScore(e) is the value of EpitopeScore(p[i:i+k]); $\sum_{a\in\mathrm{HLA}}\sigma(\mathrm{BindingAffinity}(e,a))\times\mathrm{SelfFilter}(e,a)$ denotes the sum of the affinities between each core peptide e and all MHC subtypes a; σ(BindingAffinity(e, a)) is calculated by formula four; and SelfFilter(e, a) refers to the homology of the antigen peptide.
Formula four:
In formula four, σ(s) is σ(BindingAffinity(e, a)), e is the base of the natural logarithm, and s is the affinity value, given by the affinity prediction software, between the core peptide e and the MHC of subtype a.
SelfFilter(e, a) is obtained as follows: for antigen peptide e and MHC subtype a, if a similar (homologous) peptide is found in the normal human genome, SelfFilter(e, a) is 0; otherwise it is 1.
ExpressionLevel(m) in formula one is obtained as follows: if the antigen expression level of mutation-predicted peptide m is less than 10^-3, then ExpressionLevel(m) = 0; if the antigen expression level of mutation-predicted peptide m is not less than 10^-3, then ExpressionLevel(m) takes the antigen expression abundance value output by the antigen expression abundance calculation software.
ClonalLevel(m) in formula one is calculated by formula five:
Formula five: ClonalLevel(m) = p(Clonal) × (1 - p(subclonal))
In formula five, p(Clonal) is the probability, output by the mutation clonality analysis software, that the neoantigen is clonal, and p(subclonal) is the probability, output by the same software, that it is subclonal.
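Once Score(m) has been computed for every candidate, the ranking itself can be sketched as follows. The function name and the optional `top_n` cut-off are hypothetical; the text only states that the high-scoring peptides are chosen and does not specify how many.

```python
from typing import Dict, List, Optional, Tuple


def rank_neoantigens(scores: Dict[str, float],
                     top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Sort candidate peptides by Score(m) from high to low and optionally
    keep only the first `top_n` entries."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Python's sort is stable, so peptides with equal scores keep the order in which they were supplied.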
The second-generation sequencing data of the mouse model MC-38 published in Document 1 were analyzed by the above method. From the 1290 mutations in transcribed regions disclosed in Document 1, 64 tumor neoantigens were finally screened out, including the 3 tumor neoantigens that Document 1 successfully validated by mass spectrometry. Document 1, by contrast, found a total of 1290 mutations in transcribed regions within the exome, predicted 170 neoantigens, and successfully validated 3 of them by mass spectrometry. The present method thus eliminates 63.5% of the original false-positive predictions.
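The 63.5% figure can be checked with a short calculation. The reading used here is an interpretation rather than something the text states explicitly: predictions not validated by mass spectrometry are counted as false positives, giving 170 - 3 = 167 before and 64 - 3 = 61 after. The function name is hypothetical.

```python
def false_positive_reduction(predicted_before: int,
                             predicted_after: int,
                             validated: int) -> float:
    """Fraction of the original false positives (predictions minus validated
    neoantigens) that are removed by the stricter screening."""
    fp_before = predicted_before - validated  # 170 - 3 = 167
    fp_after = predicted_after - validated    # 64 - 3 = 61
    return (fp_before - fp_after) / fp_before


reduction = false_positive_reduction(170, 64, 3)  # (167 - 61) / 167
print(f"{reduction:.1%}")
```

Under this reading, 106 of the 167 unvalidated predictions are removed, which rounds to the 63.5% stated in the text.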
Embodiment 2
The published ICC24 data (Sia D, Losic B, Moeini A, et al. Massive parallel sequencing uncovers actionable FGFR2-PPHLN1 fusion and ARAF mutations in intrahepatic cholangiocarcinoma. [J]. Nature Communications, 2015, 6: 6087.) were subjected to neoantigen detection with the tumor neoantigen detection method of Embodiment 1. The results show that the method of Embodiment 1 detected 5 antigen peptides that can be recognized by HLA, including one recognizable by HLA-01 that is derived from FGFR2-PPHLN1, a fusion gene that occurs at high frequency in intrahepatic cholangiocarcinoma (ICC). The tumor neoantigen detection method of Embodiment 1 thus found a new tumor neoantigen in cholangiocarcinoma. Advanced cholangiocarcinoma has no good treatment and a low survival rate; the neoantigens obtained with the method of Embodiment 1 reveal a new treatment modality for cholangiocarcinoma and provide a new scheme and approach for its treatment.
Embodiment 3
Neoantigen detection was performed with this method on 288 intrahepatic cholangiocarcinoma (ICC) samples, which come from the following 4 documents:
Nakamura H, Arai Y, Totoki Y, et al. Genomic spectra of biliary tract cancer. [J]. Nature Genetics, 2015, 47(9): 1003.
Zou S, Li J, Zhou H, et al. Mutational landscape of intrahepatic cholangiocarcinoma. [J]. Nature Communications, 2014, 5: 5696.
Jiao Y, Pawlik T M, Anders R A, et al. Exome sequencing identifies frequent inactivating mutations in BAP1, ARID1A and PBRM1 in intrahepatic cholangiocarcinomas. [J]. Nature Genetics, 2013, 45(12): 1470-U93.
Sia D, Losic B, Moeini A, et al. Massive parallel sequencing uncovers actionable FGFR2–PPHLN1 fusion and ARAF mutations in intrahepatic cholangiocarcinoma. [J]. Nature Communications, 2015, 6: 6087.
Analysis of the 18813 nonsynonymous mutations in the 288 ICC samples shows that, on average, 22.8 mutant antigen peptides that can be recognized by HLA genotypes occurring at high frequency in the population can be found per ICC sample, of which 62% are clonal mutations. This indicates that, when no suitable targeted drug is available, these patients can be treated by precise cellular immunotherapy.
The foregoing is a further detailed description of the present application in conjunction with specific embodiments, and the specific implementation of the present application shall not be deemed limited to these descriptions. For those of ordinary skill in the art to which the present application belongs, a number of simple deductions or substitutions can be made without departing from the concept of the present application, and all of these shall be regarded as falling within the scope of protection of the present application.