CN117174166B

CN117174166B - Tumor neoantigen prediction method and system based on third-generation sequencing data

Info

Publication number: CN117174166B
Application number: CN202311401140.5A
Authority: CN
Inventors: 张函槊; 闫柏先
Original assignee: Genex Health Co Ltd
Current assignee: Genex Health Co Ltd
Priority date: 2023-10-26
Filing date: 2023-10-26
Publication date: 2024-03-26
Anticipated expiration: 2043-10-26
Also published as: CN117174166A

Abstract

The invention relates to the technical field of bioinformatics and discloses a tumor neoantigen prediction method and system based on third-generation sequencing data. The method comprises the following steps: receiving third-generation whole-exome sequencing data of a sample; receiving HLA typing information of the sample; mapping and annotating the third-generation whole-exome sequencing data; performing structural variation analysis on the sample according to the annotation and transcript information; performing SNP analysis on the sample according to the annotation, the transcript information and the structural variation information; obtaining specific polypeptide information according to the SNP information of a tumor sample and the SNP information of a control sample; and performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample to predict tumor neoantigens. The invention makes full use of the advantages of third-generation sequencing technology, predicts tumor neoantigens from the specific polypeptides of the tumor sample with high precision and accuracy, and can bring great economic value to society.

Description

Tumor neoantigen prediction method and system based on third-generation sequencing data

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a tumor neoantigen prediction method and system based on third-generation sequencing data.

Background

Third-generation sequencing, also known as de novo sequencing, refers to single-molecule sequencing techniques. In DNA sequencing, third-generation techniques sequence each DNA molecule individually without requiring PCR amplification.

Third-generation sequencing technologies fall into two main classes according to their principle. The first is single-molecule fluorescence sequencing: deoxynucleotides are fluorescently labeled, and sequencing is achieved by observing changes in fluorescence intensity in real time through a microscope. When a fluorescently labeled deoxynucleotide is incorporated into the DNA strand, its fluorescent label can be detected on the strand; once the deoxynucleotide forms a chemical bond with the strand, the fluorophore is cleaved off by the DNA polymerase and the fluorescent signal disappears. The labeled deoxynucleotides do not affect the activity of the DNA polymerase, and after the fluorophore is cleaved the synthesized DNA strand is identical to a natural DNA strand. The second is nanopore sequencing, which uses electrophoresis to drive individual molecules through a nanopore one by one. Because the diameter of the nanopore is very small, only a single nucleic acid polymer can pass through at a time, and because the individual bases A, T, C and G have different charge properties, the type of base passing through can be identified from differences in the electrical signal.

Compared with first- and second-generation sequencing, third-generation sequencing can read longer DNA fragments and offers high data quality, high accuracy and wide coverage. There is currently a need to apply third-generation sequencing to tumor neoantigen prediction in order to improve its efficiency and quality and to provide a solid knowledge base for medical research.

Disclosure of Invention

The invention provides a tumor neoantigen prediction method and a tumor neoantigen prediction system based on third-generation sequencing data, which are used for improving the efficiency and quality of tumor neoantigen prediction.

The invention provides a tumor neoantigen prediction method based on third generation sequencing data, which comprises the following steps:

S1, performing S101-S105 on a tumor sample and on a control sample respectively to obtain SNP information of the tumor sample and SNP information of the control sample, wherein the tumor sample and the control sample are each referred to as the sample in S101-S105;

S101, receiving third-generation whole-exome sequencing data of the sample;

S102, receiving HLA typing information of the sample;

S103, mapping and annotating the third-generation whole-exome sequencing data to obtain annotation and transcript information of the sample;

S104, performing structural variation analysis on the sample according to the annotation and transcript information to obtain structural variation information of the sample;

S105, performing SNP analysis on the sample according to the annotation, the transcript information and the structural variation information to obtain SNP information of the sample;

S2, obtaining specific polypeptide information according to the SNP information of the tumor sample and the SNP information of the control sample;

S3, performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample to predict tumor neoantigens.

According to the tumor neoantigen prediction method based on the third generation sequencing data provided by the invention, the step S105 comprises the following steps:

S10501, marking the annotation information of each transcript according to the annotation, the transcript information and the structural variation information;

S10502, mapping the transcripts with minimap2 to obtain mapping information;

S10503, comparing the annotation information with the mapping information, and retaining the mapping information that is consistent with the annotation information;

S10504, performing SNP analysis on the sample according to the mapping information that is consistent with the annotation information to obtain SNP information of the transcripts in the sample.
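The consistency check of S10503 can be illustrated with a minimal sketch. It assumes that the mapping information is available as PAF lines (the tab-separated format that minimap2 writes by default) and that the annotation of each transcript is reduced to a chromosome, an interval and a strand. The names `Annotation`, `PafHit` and `keep_consistent` are hypothetical, and the rule "same chromosome, same strand, overlapping interval" is one plausible reading of "consistent", not a criterion stated in the text.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotated locus of one transcript."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


class PafHit(NamedTuple):
    """Subset of the mandatory PAF columns used by this sketch."""
    query: str
    strand: str
    target: str
    tstart: int
    tend: int
    mapq: int


def parse_paf_line(line: str) -> PafHit:
    """Parse columns 1, 5, 6, 8, 9 and 12 of a PAF record."""
    f = line.rstrip("\n").split("\t")
    return PafHit(query=f[0], strand=f[4], target=f[5],
                  tstart=int(f[7]), tend=int(f[8]), mapq=int(f[11]))


def is_consistent(hit: PafHit, ann: Annotation) -> bool:
    """Assumed rule: same chromosome and strand, and overlapping intervals."""
    same_place = hit.target == ann.chrom and hit.strand == ann.strand
    overlaps = hit.tstart < ann.end and ann.start < hit.tend
    return same_place and overlaps


def keep_consistent(paf_lines, annotations):
    """Retain only hits that agree with the annotation of their transcript."""
    kept = []
    for line in paf_lines:
        hit = parse_paf_line(line)
        ann = annotations.get(hit.query)
        if ann is not None and is_consistent(hit, ann):
            kept.append(hit)
    return kept
```

Only the retained hits would then be passed on to the SNP analysis of S10504.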

According to the tumor neoantigen prediction method based on the third generation sequencing data provided by the invention, the step S2 comprises the following steps:

S201, comparing the SNP information of the tumor sample with the SNP information of the control sample, and retaining the somatic SNP information (SNP information of somatic mutations);

S202, filtering out somatic SNP information that appears only once in the tumor sample;

S203, comparing the structural variation information of the tumor sample with the structural variation information of the control sample, and retaining the somatic structural variation information;

S204, merging the transcripts containing the filtered somatic SNP information and the transcripts containing the somatic structural variation information in the tumor sample to obtain a transcript set;

S205, obtaining transcript base sequence information for the transcript set according to reference information (such as annotation files, reference sequences and the like);

S206, correcting the transcript base sequence information according to the filtered somatic SNP information and the filtered somatic structural variation information;

S207, retaining the CDS region sequence information that can be translated into polypeptide in the transcripts according to the reference information and the corrected transcript base sequence information;

S208, filtering out CDS region sequence information that has no start codon at the position of the reference start codon;

S209, translating the filtered CDS region sequence information to obtain peptide chain information;

S210, cutting each peptide chain into a plurality of polypeptides according to the peptide chain information to obtain polypeptide information;

S211, filtering out the polypeptide information that can be obtained by normal translation of the reference sequence to obtain specific polypeptide information.
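The somatic selection in S201-S204 amounts to set operations, as the following minimal sketch shows. It assumes that each call is represented as a `(transcript_id, variant)` pair and that "appears only once" means a variant observed in a single tumor record; both the representation and the function names are hypothetical.

```python
from collections import Counter


def select_somatic_snps(tumor_observations, control_snps, min_count=2):
    """S201-S202: keep tumor SNPs absent from the control and seen at least min_count times.

    tumor_observations: iterable of (transcript_id, snp) pairs, where snp is
    any hashable description such as (chrom, pos, ref, alt).
    control_snps: set of snp descriptions found in the control sample.
    """
    somatic = [(tx, snp) for tx, snp in tumor_observations if snp not in control_snps]
    counts = Counter(snp for _, snp in somatic)
    return {(tx, snp) for tx, snp in somatic if counts[snp] >= min_count}


def select_somatic_svs(tumor_svs, control_svs):
    """S203: keep tumor structural variations that are absent from the control."""
    control = set(control_svs)
    return {(tx, sv) for tx, sv in tumor_svs if sv not in control}


def transcript_set(snp_hits, sv_hits):
    """S204: union of transcripts carrying a retained SNP or structural variation."""
    return {tx for tx, _ in snp_hits} | {tx for tx, _ in sv_hits}
```

For example, a SNP seen on two tumor transcripts and not in the control is retained, while a SNP seen on a single tumor transcript is dropped even if the control lacks it.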

According to the tumor neoantigen prediction method based on the third generation sequencing data, provided by the invention, the tumor neoantigen prediction method further comprises the following steps:

S4, filtering, backtracking and collating information according to the NetMHCpan analysis result to obtain antigen prediction information.

According to the tumor neoantigen prediction method based on the third generation sequencing data provided by the invention, the step S4 comprises the following steps:

S401, retaining the peptide-related information whose EL rank is less than or equal to 2 according to the NetMHCpan analysis result information;

S402, backtracking the retained polypeptides according to the peptide-related information and the specific polypeptide information, and collating the position and start-stop information of each polypeptide in its transcript, together with the structural variation information and SNP information contained in the transcript, to obtain antigen prediction information.

Note that the official definition of EL rank is: rank of the predicted affinity compared to a set of 400,000 random natural peptides. This measure is not affected by inherent bias of certain molecules towards higher or lower mean predicted affinities. Strong binders are defined as having %Rank < 0.5, and weak binders as having %Rank < 2. It is advised to select candidate binders based on %Rank rather than nM affinity.
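The rank cut-off can be expressed as a short sketch. It assumes that each NetMHCpan result has already been parsed into a dictionary with an `el_rank` field, which is a hypothetical layout, and it treats the thresholds as inclusive to match the "less than or equal to 2" cut-off of S401, whereas the definition quoted above uses strict inequalities.

```python
def classify_binder(el_rank, strong_max=0.5, weak_max=2.0):
    """Return "strong", "weak" or None for an EL rank given in percent.

    Thresholds are inclusive here (an assumption of this sketch).
    """
    if el_rank <= strong_max:
        return "strong"
    if el_rank <= weak_max:
        return "weak"
    return None


def retain_binders(rows):
    """Keep rows that pass the cut-off and tag each with its binding label."""
    kept = []
    for row in rows:
        label = classify_binder(row["el_rank"])
        if label is not None:
            kept.append({**row, "binding": label})
    return kept
```

Rows with an EL rank above 2 are discarded, and the remaining rows carry a `binding` label that later steps can use for sorting or reporting.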

According to the tumor neoantigen prediction method based on third-generation sequencing data provided by the invention, step S103 maps and annotates the sample using software dedicated to third-generation sequencing data (such as the TAGET software).

According to the tumor neoantigen prediction method based on third-generation sequencing data provided by the invention, step S104 performs structural variation analysis on the sample using software dedicated to third-generation sequencing data (such as the TAGET-sv software).

The invention also provides a tumor neoantigen prediction system based on third-generation sequencing data, which comprises:

a tumor-control sample processing module for: performing steps S101-S105 on the tumor sample and on the control sample respectively to obtain SNP information of the tumor sample and SNP information of the control sample, wherein in steps S101-S105 the tumor sample and the control sample are each referred to as the sample:

S101, receiving third-generation whole-exome sequencing data of the sample;

S102, receiving HLA typing information of the sample;

a specific polypeptide information acquisition module for: obtaining specific polypeptide information according to the SNP information of the tumor sample and the SNP information of the control sample;

a tumor neoantigen prediction module for: performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample to predict tumor neoantigens.

The invention also provides electronic equipment, which comprises a processor and a memory storing a computer program, and is characterized in that the processor executes the computer program to realize the tumor neoantigen prediction method of any one of the above.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described tumor neoantigen prediction methods.

The present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing any one of the above-described tumor neoantigen prediction methods.

The tumor neoantigen prediction method and system based on third-generation sequencing data provided by the invention exploit the long average read length, short sequencing time and amplification-free nature of third-generation sequencing. The sample is sequenced with third-generation whole-exome sequencing, which effectively ensures the accuracy and completeness of the data. The third-generation whole-exome sequencing data are mapped and annotated; structural variation analysis and SNP analysis are performed on the sample according to the annotation and transcript information; the SNP information of the tumor sample is compared with that of the control sample to obtain specific polypeptide information; and NetMHCpan analysis is performed on the tumor sample by combining the specific polypeptide information with the HLA typing information of the tumor sample. Tumor neoantigens can thus be predicted with high quality and high efficiency, which brings important innovation value to tumor neoantigen prediction and great economic benefit to medical research.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following brief description will be given of the drawings used in the embodiments or the description of the prior art, it being obvious that the drawings in the following description are some embodiments of the invention and that other drawings can be obtained from them without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of a tumor neoantigen prediction method based on third-generation sequencing data.

Fig. 2 is a schematic structural diagram of a tumor neoantigen prediction system based on third-generation sequencing data.

Fig. 3 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions thereof will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments, which should not be construed as limiting the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. In the description of the present invention, it is to be understood that the terminology used is for the purpose of description only and is not to be interpreted as indicating or implying relative importance.

The tumor neoantigen prediction method and system based on the third generation sequencing data provided by the invention are described below with reference to fig. 1-3.

FIG. 1 is a flow chart of a tumor neoantigen prediction method based on third generation sequencing data. Referring to fig. 1, the tumor neoantigen prediction method based on the third generation sequencing data provided by the invention can include:

S1, performing S101-S105 on a tumor sample and on a control sample respectively to obtain SNP information of the tumor sample and SNP information of the control sample, wherein the tumor sample and the control sample are each referred to as the sample in S101-S105.

It should be noted that the sample may be any effective sample meeting the sequencing standard, the tumor sample may be a tumor sample separated from a human body, the control sample may be a normal sample separated from a human body, the tumor sample may also be a tumor sample separated from a mouse, a pig or the like, and the control sample may be a normal sample of a corresponding species, that is, the species types of the tumor sample and the control sample need to be unified.

S101, receiving third-generation whole-exome sequencing data of the sample. In one embodiment, the third-generation whole-exome sequencing data received in S101 can be obtained by third-generation whole-exome sequencing of the sample on platforms well known in the art (e.g., capture kits, sequencers, etc.).

S102, receiving HLA typing information of the sample. In one embodiment, the HLA typing information obtained in S102 may be known in advance. For example, animal models are purchased with associated HLA typing information. If the typing of the sample is unknown, the sample may be HLA typed by techniques well known in the art, such as serologic typing techniques, cytological typing techniques, DNA typing techniques, and the like.

S103, mapping and annotating the third-generation whole-exome sequencing data to obtain annotation and transcript information of the sample. In one embodiment, S103 may use TAGET, software dedicated to third-generation sequencing data, to compare the third-generation whole-exome sequencing data with reference genome data (for example hg38 for human or mm10 for mouse; the species of the reference genome must match the species of the sample), thereby mapping and annotating the sample to obtain its annotation and transcript information.

S104, carrying out structural variation analysis on the sample according to the annotation and the transcript information to obtain the structural variation information of the sample. In one embodiment, S104 may perform structural variation analysis on the sample by TAGET-sv software specific for the third generation sequencing data.

S105, carrying out SNP analysis on the sample according to the annotation, the transcript information and the structural variation information to obtain SNP information of the sample. In one embodiment, step S105 may include S10501-S10504:

S10501, marking the annotation information of each transcript according to the annotation, the transcript information and the structural variation information.

S10502, mapping the transcripts with minimap2 to obtain mapping information.

S10503, comparing the annotation information with the mapping information, and reserving the mapping information consistent with the annotation information.

S2, obtaining specific polypeptide information according to the SNP information of the tumor sample and the SNP information of the control sample. In one embodiment, step S2 may include S201-S211:

S201, comparing the SNP information of the tumor sample with the SNP information of the control sample, and retaining the somatic SNP information (SNP information of somatic mutations).

S202, filtering out somatic SNP information that appears only once in the tumor sample.

S203, comparing the structural variation information of the tumor sample with the structural variation information of the control sample, and retaining the somatic structural variation information.

S204, merging the transcripts containing the filtered somatic SNP information and the transcripts containing the somatic structural variation information in the tumor sample to obtain a transcript set.

S205, obtaining transcript base sequence information for the transcript set according to the reference information (for example, the reference sequence, the annotation file and the like).

S206, correcting the transcript base sequence information according to the filtered somatic SNP information and the filtered somatic structural variation information.

S207, retaining the CDS region sequence information that can be translated into polypeptide in the transcripts according to the reference information and the corrected transcript base sequence information.

S208, filtering out CDS region sequence information that has no start codon at the position of the reference start codon.

S209, translating the filtered CDS region sequence information to obtain peptide chain information.

S210, cutting each peptide chain into a plurality of polypeptides according to the peptide chain information to obtain the polypeptide information. In one embodiment, S210 may cut a peptide chain into polypeptides of 8-15 amino acids in length.

S211, filtering out the polypeptide information that can be obtained by normal translation of the reference sequence to obtain specific polypeptide information.
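Steps S206-S211 can be sketched for the simplest case. This minimal sketch assumes single-base substitutions only (structural variations are not handled), uses the standard genetic code, and subtracts only the peptides of the same reference transcript rather than of the whole reference. All function names and the coordinate convention (0-based transcript positions) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snps(seq, snps):
    """S206: substitute bases; snps are (pos0, ref, alt) in transcript coordinates."""
    bases = list(seq)
    for pos, ref, alt in snps:
        if bases[pos] != ref:
            raise ValueError(f"reference base mismatch at position {pos}")
        bases[pos] = alt
    return "".join(bases)


def translate(cds):
    """S209: translate codon by codon and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def kmers(peptide, k_min=8, k_max=15):
    """S210: all sub-peptides of length k_min to k_max."""
    return {peptide[i:i + k]
            for k in range(k_min, k_max + 1)
            for i in range(len(peptide) - k + 1)}


def specific_peptides(ref_tx, snps, cds_start, cds_end):
    """S206-S211 for one transcript with a known CDS interval [cds_start, cds_end)."""
    mutated_cds = apply_snps(ref_tx, snps)[cds_start:cds_end]   # S206, S207
    if not mutated_cds.startswith("ATG"):                       # S208
        return set()
    reference_cds = ref_tx[cds_start:cds_end]
    return kmers(translate(mutated_cds)) - kmers(translate(reference_cds))  # S209-S211
```

With a 10-residue reference peptide and one missense SNP in the second codon, the sketch returns the five 8- to 10-mers that span the changed residue, while the 8-mer that does not span it is removed as a normal peptide.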

S3, according to the specific polypeptide information and HLA typing information of the tumor sample, carrying out NetMHCpan analysis on the tumor sample so as to predict tumor neoantigens. In one embodiment, S3 may perform NetMHCpan analysis on tumor samples by existing NetMHCpan software with version number 4.1.

NetMHCpan is an artificial-neural-network-based immune epitope prediction algorithm for predicting the binding of peptides to MHC class I molecules. It is trained on a combination of more than 180,000 quantitative binding data points covering MHC molecules of multiple species, such as human, mouse and pig, and mass-spectrometry-derived MHC eluted ligand data covering 55 HLA and mouse alleles. The algorithm does not require any prior knowledge of the specific MHC molecule and has a high degree of accuracy.
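Invoking NetMHCpan on the specific polypeptides can be sketched as building a command line. The flags `-a` (comma-separated alleles), `-p` (peptide input) and `-f` (input file) are given here as understood from the NetMHCpan usage text and should be checked against `netMHCpan -h` for the installed version. The function name, the default executable name and the file name are assumptions, and the sketch only constructs the argument list without running anything.

```python
def build_netmhcpan_command(peptide_file, alleles, executable="netMHCpan"):
    """Build an argument list for a peptide-mode NetMHCpan run.

    peptide_file: path to a text file with one peptide per line (assumed layout).
    alleles: list of allele names, for example ["H-2-Db", "H-2-Kb"].
    """
    if not alleles:
        raise ValueError("at least one allele is required")
    return [executable, "-a", ",".join(alleles), "-p", "-f", str(peptide_file)]


# The list could be passed to subprocess.run(cmd, capture_output=True, text=True)
# on a machine where NetMHCpan is installed.
cmd = build_netmhcpan_command("specific_peptides.txt", ["H-2-Db", "H-2-Kb"])
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths.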

S4, filtering, backtracking and collating information according to the NetMHCpan analysis result to obtain antigen prediction information. In one embodiment, step S4 may include S401-S402:

S401, retaining the peptide-related information whose EL rank is less than or equal to 2 according to the NetMHCpan analysis result information, wherein peptides with an EL rank less than or equal to 0.5 are marked as strong binders and peptides with an EL rank greater than 0.5 and up to 2 are marked as weak binders.

S402, backtracking the retained polypeptides according to the peptide-related information and the specific polypeptide information, and collating the position and start-stop information of each polypeptide in its transcript, together with the structural variation information and SNP information contained in the transcript, to obtain antigen prediction information. This records each polypeptide required for antigen prediction along with its position and origin, which facilitates subsequent experimental verification and testing.
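The backtracking of S402 can be illustrated with a minimal sketch that locates a retained peptide in the translated peptide chain of its transcript and lists the SNPs falling inside the corresponding CDS interval. The record layout, the `Snp` type and the function name are hypothetical, and structural variations are left out for brevity.

```python
from typing import NamedTuple, Optional


class Snp(NamedTuple):
    """Hypothetical SNP record in CDS coordinates."""
    cds_pos: int  # 0-based nucleotide offset within the CDS
    ref: str
    alt: str


def trace_back(peptide, protein, transcript_id, snps) -> Optional[dict]:
    """Return position and cause information for a peptide, or None if absent.

    Uses the first occurrence of the peptide in the translated chain.
    """
    aa_start = protein.find(peptide)
    if aa_start < 0:
        return None
    aa_end = aa_start + len(peptide)
    nt_start, nt_end = 3 * aa_start, 3 * aa_end
    causes = [s for s in snps if nt_start <= s.cds_pos < nt_end]
    return {
        "transcript": transcript_id,
        "peptide": peptide,
        "aa_start": aa_start,   # 0-based, inclusive
        "aa_end": aa_end,       # 0-based, exclusive
        "cds_start": nt_start,
        "cds_end": nt_end,
        "snps": causes,
    }
```

Each returned record ties a predicted binder to a transcript interval and to the variants inside it, which is the information a follow-up experiment would need.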

It should be noted that, the execution subject of the tumor neoantigen prediction method provided by the present invention may be any terminal-side device meeting technical requirements, such as a tumor neoantigen prediction apparatus.

The tumor neoantigen prediction method provided by the invention takes advantage of the fact that, compared with second-generation sequencing, third-generation sequencing sequences each transcript independently during whole-exome sequencing, so that the specificity of the translated peptide chain can be ensured according to the different structural variations and SNPs carried by each transcript. After third-generation whole-exome sequencing and HLA analysis of the sample, the third-generation whole-exome sequencing data are mapped and annotated, structural variation analysis and SNP analysis are performed, and somatic analysis is carried out against the information of the control sample, finally yielding the specific transcripts associated with the cancer. This cannot be achieved with second-generation sequencing and is the core of the invention. The specific transcripts are translated to obtain specific peptide chains, the peptide chains are cut into polypeptides of 8-15 amino acids in length as required, and the normal polypeptides that the reference transcripts translate into under normal conditions are filtered out to obtain the specific polypeptides associated with the cancer. Finally, NetMHCpan is combined with the HLA typing of the sample to perform efficient, high-quality tumor neoantigen prediction on the specific polypeptides; the results are then filtered and the origin of each specific polypeptide is traced back, which facilitates subsequent experimental verification and testing. The method has low computational resource requirements and high benefit, can bring important innovation value and great economic benefit to tumor neoantigen prediction, and provides a substantial knowledge base for medical research.

Furthermore, the tumor neoantigen prediction method provided by the invention was used to predict tumor neoantigens in mouse tumor samples. Third-generation full-length transcriptome sequencing was performed on the mouse tumor samples and the mouse control samples respectively. A total of 7 groups of mouse tumor samples were used for tumor neoantigen prediction, and the HLA typing of the experimental mice was known to be H-2-Db and H-2-Kb (the typing was provided by the mouse supplier, so no HLA analysis was necessary). Specific mouse information is shown in Table 1 below:

The samples were mapped and annotated against the GRCm39 standard reference files, followed by the structural variation and SNP analysis steps; polypeptides of 8-15 amino acids in length were obtained to give the specific polypeptide information; and NetMHCpan analysis was performed according to the specific polypeptide information and the H-2-Db and H-2-Kb typing. The neoantigen prediction results are shown in Table 2 below.

Annotation:

sample: tumor sample number;

8-15 mer peptides: number of tumour neoantigens meeting the screening criteria in polypeptides of 8-15 amino acids in length.

Experimental results show that the tumor neoantigen prediction method provided by the invention can efficiently predict the tumor neoantigens of a sample. It fully accounts for the specific polypeptides produced during transcript translation as a result of structural variation and SNPs, and its predictions have a low false-positive rate and high precision.

The tumor neoantigen predicting system provided by the invention is described below, and the tumor neoantigen predicting system described below and the tumor neoantigen predicting method described above can be referred to correspondingly.

Referring to fig. 2, the tumor neoantigen prediction system based on the third generation sequencing data provided by the present invention may include:

a tumor-control sample processing module A for: performing steps S101-S105 on the tumor sample and on the control sample respectively to obtain SNP information of the tumor sample and SNP information of the control sample, wherein in steps S101-S105 the tumor sample and the control sample are each referred to as the sample:

S101, receiving third-generation whole-exome sequencing data of the sample;

S102, receiving HLA typing information of the sample;

a specific polypeptide information acquisition module B for: obtaining specific polypeptide information according to the SNP information of the tumor sample and the SNP information of the control sample;

a tumor neoantigen prediction module C for: performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample to predict tumor neoantigens.

It should be noted that steps S101-S105 may be executed as a whole by the tumor-control sample processing module A, or the tumor-control sample processing module A may include five sub-modules:

a sequencing sub-module for: receiving third-generation whole-exome sequencing data of the sample;

an HLA analysis sub-module for: receiving HLA typing information of the sample;

a mapping and annotation sub-module for: mapping and annotating the third-generation whole-exome sequencing data to obtain annotation and transcript information of the sample;

a structural variation analysis sub-module for: performing structural variation analysis on the sample according to the annotation and transcript information to obtain structural variation information of the sample;

an SNP analysis sub-module for: performing SNP analysis on the sample according to the annotation, the transcript information and the structural variation information to obtain SNP information of the sample.

In one embodiment, specific polypeptide information obtaining module B may include:

a somatic SNP information acquisition sub-module for: comparing the SNP information of the tumor sample with the SNP information of the control sample, and retaining the somatic SNP information;

a first filtering sub-module for: filtering out somatic SNP information that appears only once in the tumor sample;

a somatic structural variation information acquisition sub-module for: comparing the structural variation information of the tumor sample with the structural variation information of the control sample, and retaining the somatic structural variation information;

a transcript set acquisition sub-module for: merging the transcripts containing the filtered somatic SNP information and the transcripts containing the somatic structural variation information to obtain a transcript set;

a transcript base sequence information acquisition sub-module for: obtaining transcript base sequence information for the transcript set according to the reference information;

a transcript base sequence information correction sub-module for: correcting the transcript base sequence information according to the filtered somatic SNP information and the filtered somatic structural variation information;

a second filtering sub-module for: retaining the CDS region sequence information that can be translated into polypeptide in the transcripts according to the reference information and the corrected transcript base sequence information;

a third filtering sub-module for: filtering out CDS region sequence information that has no start codon at the position of the reference start codon;

a peptide chain information acquisition sub-module for: translating the filtered CDS region sequence information to obtain peptide chain information;

a polypeptide information acquisition sub-module for: cutting each peptide chain into a plurality of polypeptides according to the peptide chain information to obtain polypeptide information;

a specific polypeptide information acquisition sub-module for: filtering out the polypeptide information that can be obtained by normal translation of the reference sequence to obtain specific polypeptide information.

In one embodiment, the tumor neoantigen prediction system may further comprise:

a prediction result collation module D for: filtering, backtracking and collating information according to the NetMHCpan analysis result to obtain antigen prediction information.

In one embodiment, the prediction result sorting module D may include:

a fourth filtering sub-module for: retaining the peptide-related information whose EL rank is less than or equal to 2 according to the NetMHCpan analysis result information;

an antigen prediction information acquisition sub-module for: backtracking the retained polypeptides according to the peptide-related information and the specific polypeptide information, and collating the position and start-stop information of each polypeptide in its transcript, together with the structural variation information and SNP information contained in the transcript, to obtain antigen prediction information.

Fig. 3 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a tumor neoantigen prediction method comprising:

S101, receiving third-generation whole-exome sequencing data of a sample;

S102, receiving HLA typing information of the sample;

Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program performs the tumor neoantigen prediction method provided above, the method comprising:

S101, receiving third-generation whole-exome sequencing data of a sample;

S102, receiving HLA typing information of the sample;

In yet another aspect, the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the tumor neoantigen prediction method provided above, the method comprising:

S101, receiving third-generation whole-exome sequencing data of a sample;

S102, receiving HLA typing information of the sample;

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A tumor neoantigen prediction method based on third generation sequencing data, comprising:

S1, performing steps S101-S105 on a tumor sample and on a control sample respectively to obtain SNP information of the tumor sample and SNP information of the control sample, wherein the tumor sample and the control sample are each referred to as the sample in steps S101-S105;

S101, receiving third-generation whole-exome sequencing data of a sample;

S102, receiving HLA typing information of the sample;

S3, performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample, so as to predict tumor neoantigens;

S4, performing information filtering, backtracking, and organizing according to the NetMHCpan analysis result to obtain antigen prediction information;

wherein step S2 comprises:

S201, comparing the SNP information of the tumor sample with the SNP information of the control sample, and retaining the somatic SNP information;

S205, acquiring, according to the reference information, transcript base sequence information for the transcript set;

S211, filtering the translatable polypeptide information according to the reference information to obtain specific polypeptide information;

and step S4 comprises:

retaining, according to the NetMHCpan analysis result information, the peptide-related information for which the EL rank is less than or equal to 2;

backtracking the retained polypeptides according to the peptide-related information and the specific polypeptide information, and organizing the position information and start-stop information of each polypeptide in the transcript, together with the structural variation information and SNP information contained in the transcript, to obtain antigen prediction information.

2. The method of claim 1, wherein step S105 comprises:

3. The method of claim 1 or 2, wherein step S103 uses TAGET software to map and annotate the samples.

4. The method of claim 1 or 2, wherein step S104 uses TAGET-sv software to perform structural variation analysis on the sample.

5. A tumor neoantigen prediction system based on third generation sequencing data, comprising:

S101, receiving third-generation whole-exome sequencing data of a sample;

S102, receiving HLA typing information of the sample;

a tumor neoantigen prediction module for: performing NetMHCpan analysis on the tumor sample according to the specific polypeptide information and the HLA typing information of the tumor sample, so as to predict tumor neoantigens;

a prediction result sorting module for: performing information filtering, backtracking, and organizing according to the NetMHCpan analysis result to obtain antigen prediction information;

wherein the specific polypeptide information obtaining module comprises:

a specific polypeptide information obtaining sub-module for: filtering the translatable polypeptide information to obtain specific polypeptide information;

and the prediction result sorting module comprises:

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the tumor neoantigen prediction method of any one of claims 1 to 4 when the program is executed.

7. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the tumor neoantigen prediction method according to any one of claims 1 to 4.