CN111627497B - Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application - Google Patents


Info

Publication number
CN111627497B
Authority
CN
China
Prior art keywords
tumor
transcripts
protein
new
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010426721.4A
Other languages
Chinese (zh)
Other versions
CN111627497A (en)
Inventor
万季
刘鹏
夏迪
潘有东
王奕
宋麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Neocura Biotechnology Corp
Original Assignee
Shenzhen Neocura Biotechnology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Neocura Biotechnology Corp
Priority to CN202010426721.4A
Publication of CN111627497A
Application granted
Publication of CN111627497B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Library & Information Science (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention discloses a method for extracting immunotherapeutic neoantigens from tumor-specific transcribed regions assembled from novel transcripts, and an application thereof. The method comprises the following steps: S01, alignment of deep transcriptome sequencing data; S02, transcript assembly; S03, transcript filtering; S04, prediction of translation initiation codons; S05, transcript translation; S06, obtaining tumor-specific full-length novel transcript protein sequences; S07, obtaining novel transcript protein sequences with tumor-specific partial sequence differences; S08, merging protein fragments; S09, splitting protein fragments; S10, human leukocyte antigen (HLA) genotyping; S11, peptide affinity prediction; and optionally, S12, mass spectrometry validation. The tumor neoantigens discovered by the method are not limited to annotated coding regions, so more neoantigens can be found; because they derive from highly expressed, non-mutated transcripts, they have a degree of generality across tumor types; and peptides confirmed by mass spectrometry are actually expressed and therefore have a higher probability of eliciting an immune response.

Description

Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application
Technical Field
The invention relates to the field of tumor immunotherapy, in particular to a method for extracting an immunotherapeutic new antigen based on a tumor specific transcription region assembled by new transcripts and application thereof.
Background
Tumor immunotherapy with neoantigen vaccines offers a marked therapeutic effect, applicability to a wide range of cancer types, and low toxicity with few side effects, and has become an important member of the immunotherapy family. The efficacy of this treatment depends heavily on the selection of neoantigen polypeptides, which in turn depends heavily on data and prediction algorithms. In theory, neoantigens can arise from a variety of sources, but in actual clinical practice attention is paid only to neoantigens derived from DNA point mutations and indels. Although neoantigen vaccines based on DNA point mutations and indels show good clinical results, studies have shown that neoantigens arising from other biological sources may be more immunogenic. Moreover, for some malignancies with a low mutation burden, relying on only these few sources yields too few predicted neoantigens, which limits the formulation of tumor neoantigen vaccines. Developing additional sources of neoantigens is therefore important for both research and clinical application.
Disclosure of Invention
To address the problem of obtaining tumor neoantigens, the invention takes full account of the fact that a large number of novel transcripts exist in the tumor genome, and develops a bioinformatics method for obtaining tumor-specific neoantigens.
In a first aspect, the present invention provides a method for extracting immunotherapeutic neoantigens from tumor-specific transcribed regions assembled from novel transcripts, comprising the steps of:
S01, alignment of deep transcriptome sequencing data;
S02, transcript assembly;
S03, transcript filtering;
S04, prediction of translation initiation codons;
S05, transcript translation;
S06, obtaining tumor-specific full-length novel transcript protein sequences;
S07, obtaining novel transcript protein sequences with tumor-specific partial sequence differences;
S08, merging protein fragments;
S09, splitting protein fragments;
S10, human leukocyte antigen (HLA) genotyping;
S11, peptide affinity prediction;
and optionally, S12, mass spectrometry validation.
In some embodiments of the invention, S01 comprises the steps of:
S101, acquiring deep whole-transcriptome sequencing data, containing coding RNA and non-coding RNA, of a tumor sample and a normal control sample;
S102, filtering the deep whole-transcriptome sequencing data of the tumor sample and the normal control sample;
S103, constructing an index for a reference genome;
S104, aligning the filtered data obtained in S102 to the reference genome index obtained in S103;
preferably, in S101, libraries are constructed and sequenced using an rRNA-depleted, strand-specific library construction method and a small-fragment enrichment screening library construction method;
preferably, in S101, the sample data include a plurality of overlapping or partially overlapping short reads, and the sequencing data of the tumor sample and of the normal control sample are each not less than 30G;
preferably, in S102, short reads whose average base quality is below 20 or which contain sequencing primer adapters are removed.
In some embodiments of the invention, in S02, full transcriptome deep sequencing data alignment results that have mapped short read sequences to a reference genome are assembled into transcripts.
In some embodiments of the invention, in S03, known human full-length transcripts and repetitive sequences present in the assembled transcripts are removed.
In some embodiments of the invention, S04 comprises the steps of:
S401, calculating the coding potential of the novel transcripts of the tumor sample and the normal control sample, and dividing them into protein-coding transcripts and non-protein-coding transcripts according to the strength of the coding potential;
S402, predicting the translation initiation codons of the protein-coding transcripts in the tumor sample and the normal control sample.
In some embodiments of the invention, in S05, the novel transcripts with coding potential in the tumor sample and the normal control sample are translated from the predicted translation initiation codons to yield protein sequences.
In some embodiments of the invention, in S06, the protein sequences translated from the tumor sample are compared with those translated from the normal control sample, and the tumor sample protein sequences are traversed to obtain the tumor-specific protein sequences that cannot be found in the normal control.
In some embodiments of the invention, S07 comprises the steps of:
S701, filtering out the tumor-sample-specific proteins;
S702, comparing all the novel transcript proteins remaining after filtering with all the transcript protein sequences of the normal control sample, wherein sequences in the comparison result that are inconsistent with the normal control sample are defined as novel transcript protein sequences with tumor-specific partial sequence differences.
In some embodiments of the invention, in S08, the tumor-specific full-length novel transcript protein sequences obtained in S06 and the novel transcript protein sequences with tumor-specific partial sequence differences obtained in S07 are merged, and sequences shorter than 9 residues are filtered out.
In some embodiments of the invention, in S09, the protein sequences obtained in S08 are split, preferably into k-mer peptide fragments of 9 to 12 amino acids in length.
In some embodiments of the invention, in S11, the affinity of the k-mer peptide fragments obtained by splitting in S09 for HLA molecules is predicted, and peptides whose affinity exceeds a threshold are selected as candidate neoantigens.
In some embodiments of the present invention, in S12, mass spectrometry analysis is performed on the tumor sample, the generated data are imported into MaxQuant software, the candidate neoantigens are added as the search library, and the peptide fragments that are successfully identified are taken as the final neoantigens.
According to an aspect of the present invention, there is provided a computer-implemented bioinformatics method for discovering tumor neoantigens based on the results of novel transcript assembly, comprising the following steps performed by a processor: acquiring whole-transcriptome sequencing data of a tumor sample and a normal control sample; assembling the transcripts of the tumor sample and the normal control sample; obtaining the novel transcripts of the tumor sample and the normal control sample; predicting the protein sequences encoded by the novel transcripts of the tumor sample and the normal control sample; obtaining the novel transcript protein sequences and protein fragment sequences specific to the tumor sample; calculating the binding affinity between the tumor-specific proteins and protein fragments and MHC molecules to obtain candidate tumor neoantigens; and screening and verifying the candidate neoantigens based on mass spectrometry data.
Preferably, the sample is a fresh tissue sample; alternatively, paraffin tissue samples may be selected.
A second aspect of the invention provides the use of the method of the first aspect for the preparation of a medicament or medical device for extracting immunotherapeutic neoantigens.
Compared with the prior art, the scheme of the invention has the following advantages:
1. The tumor neoantigens found by the scheme of the invention are not limited in origin to annotated coding regions, so more neoantigens can be found. Current common methods mainly use targeted-region capture sequencing or exome sequencing, identify somatic mutations, and then obtain neoantigens by affinity prediction. This essentially confines the analysis to known coding regions of the genome.
2. The tumor neoantigens obtained by the invention are derived from non-mutated, highly expressed transcripts (such as endogenous retroelements), and therefore have a degree of generality across different tumor types.
3. The tumor neoantigens confirmed by mass spectrometry experiments have the advantage that the peptide fragments are actually expressed and have a higher probability of eliciting an immune response.
Drawings
FIG. 1 is a flow chart of the extraction of immunotherapeutic neoantigens according to one embodiment of the invention.
Detailed Description
Embodiments of the present invention are described below with reference to specific examples, and other advantages and effects of the invention will be apparent to those skilled in the art from this disclosure. The invention may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the invention.
To help those skilled in the art better understand the invention, embodiments are described in detail below with reference to the accompanying drawing. The terms "first," "second," "again," "then," "next," and the like used in the specific embodiments are not intended to limit the order of the steps.
FIG. 1 is a flow chart of an embodiment of the invention for extracting immunotherapeutic neoantigens, the method comprising the following steps performed by a processor:
S01, alignment of deep transcriptome sequencing data
Specifically, first, libraries are constructed and sequenced using an rRNA-depleted, strand-specific library construction method and a small-fragment enrichment screening library construction method, yielding deep whole-transcriptome sequencing data, covering coding RNA and non-coding RNA, for a tumor sample and a normal control sample. The sample data comprise a plurality of overlapping or partially overlapping short reads; the degree of overlap depends on the sequencing depth. Not less than 30G of sequencing data is required for each of the tumor sample and the normal control sample.
Second, the deep whole-transcriptome sequencing data of the tumor sample and the normal control sample are filtered to remove short reads whose average base quality is below 20 or which contain sequencing primer adapters, which improves the accuracy and efficiency of subsequent analysis.
Third, an index is built for the reference genome. The reference genome is the base sequence data of each human chromosome, typically in FASTA format, and can be downloaded from UCSC; version hg38/GRCh38 is used. The filtered data are then aligned to the reference genome to locate each short read on the reference genome. Specifically, the software HISAT2 can be used to align the filtered data of the tumor sample and the normal control sample.
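The read-filtering rule in S01 (drop a read whose average base quality is below 20 or which contains an adapter) can be illustrated with a minimal Python sketch. This is not the tool used in the invention; the function names, the Phred+33 encoding and the adapter string are assumptions made for illustration only.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def keep_read(seq: str, quality: str, adapters, min_mean_q: float = 20.0) -> bool:
    """Return True if the read passes both filters described in S01."""
    if not seq or len(seq) != len(quality):
        return False  # malformed record
    if mean_phred(quality) < min_mean_q:
        return False  # average base quality below the threshold
    # Reject reads that contain any adapter sequence
    return not any(adapter in seq for adapter in adapters)


# Hypothetical adapter fragment and toy reads, for illustration only
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),      # every base Q40: kept
    ("ACGTACGTAC", "##########"),      # every base Q2: dropped
    ("ACGT" + ADAPTER, "I" * 17),      # contains the adapter: dropped
]
kept = [r for r in reads if keep_read(r[0], r[1], [ADAPTER])]
```

In practice a dedicated trimmer clips adapters and low-quality ends rather than discarding whole reads; the sketch only mirrors the rule as stated in the text.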
S02, transcript assembly
After the deep whole-transcriptome sequencing data have been aligned and the short reads located on the reference genome, the alignment results can be assembled into transcripts by relying on the reference genome while also allowing for de novo assembly. Specifically, the software StringTie can be used to assemble the tumor sample and the normal control sample into transcripts.
S03, transcript filtering
First, the assembled transcripts include a large number of known human full-length transcripts. These transcripts are expressed in normal tissues, so removing them speeds up subsequent analysis. Specifically, known transcripts in the tumor sample and the normal control sample are filtered out according to the transcript identifiers in the StringTie assembly results.
Second, about 55% of the human genome consists of repetitive sequences. Because of the large number of simple repeats, short reads are often aligned to incorrect locations when mapped to the reference genome, which affects transcript assembly based on the alignment; the repetitive sequences therefore need to be removed. Specifically, the transcript sequences are evaluated using the software RepeatMasker, and transcripts containing repetitive sequences are then removed from the tumor sample and the normal control sample.
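The first filter in S03 (removing known transcripts by their identifiers in the StringTie output) might look like the following Python sketch. It assumes StringTie was run with a reference annotation, in which case transcripts that match a known annotation carry a `reference_id` attribute in the output GTF; how the invention actually uses the transcript identifiers is not specified, so this is one plausible reading.

```python
def novel_transcript_ids(gtf_lines):
    """Collect transcript_id values of transcript records lacking reference_id."""
    novel = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records are inspected
        attributes = fields[8]
        if "reference_id" in attributes:
            continue  # matches a known annotated transcript
        for part in attributes.split(";"):
            part = part.strip()
            if part.startswith("transcript_id"):
                novel.append(part.split('"')[1])
    return novel


# Toy GTF records (hypothetical coordinates and identifiers)
example_gtf = [
    "# StringTie output",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "ENST00000000001.1";',
    'chr1\tStringTie\ttranscript\t2000\t2900\t1000\t+\t.\t'
    'gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "12.5";',
]
novel_ids = novel_transcript_ids(example_gtf)
```

The repeat-based filter is a separate step and is not covered by this sketch.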
S04, prediction of translation initiation codons
Specifically, first, the coding potential of the novel transcripts of the tumor sample and the normal control sample is calculated using the software CPAT, and the transcripts are divided into protein-coding transcripts and non-protein-coding transcripts according to the strength of the coding potential; second, the translation initiation codons of the protein-coding transcripts in the tumor sample and the normal control sample are predicted.
S05, transcript translation
Specifically, the novel transcripts with coding potential in the tumor sample and the normal control sample are translated, starting from the predicted translation initiation codons, using in-house software to obtain protein sequences. Alternatively, the protein sequences may be obtained by translating the novel transcripts with the software ORFfinder or gelator.
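Translation from a predicted initiation codon, as in S05, can be sketched in a few lines of Python. The in-house software is not disclosed, so the function below is only an illustration: it uses the standard genetic code, takes the start position as an input (start-codon prediction itself is out of scope), and stops at the first in-frame stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from(transcript: str, start: int) -> str:
    """Translate from the 0-based start position up to the first stop codon."""
    seq = transcript.upper().replace("U", "T")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy transcript; the first ATG stands in for a predicted initiation codon
toy = "CCATGGCTTAAGG"
protein = translate_from(toy, toy.find("ATG"))  # ATG GCT TAA -> "MA"
```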
S06, obtaining tumor-specific full-length novel transcript protein sequences
Tumor-specific protein sequences are those of proteins that are translated and expressed only in the tumor sample and not in the normal control sample. Specifically, in-house software is used to compare the protein sequences translated from the tumor sample with those translated from the normal control sample; the tumor sample protein sequences are traversed, and those that cannot be found in the normal control are taken as the tumor-specific protein sequences.
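As a rough illustration of S06, the sketch below keeps every tumor protein whose sequence does not occur among the normal-control proteins. Exact string identity is the simplest possible reading of "cannot be found"; the in-house software may well use a looser similarity search, so both the matching rule and the names here are assumptions.

```python
def tumor_specific_proteins(tumor: dict, normal: dict) -> dict:
    """Return tumor proteins (id -> sequence) absent from the normal control.

    Assumption: a protein counts as 'found' only if an identical
    sequence exists in the normal control set.
    """
    normal_sequences = set(normal.values())
    return {
        name: seq for name, seq in tumor.items()
        if seq not in normal_sequences
    }


# Toy sequences with hypothetical identifiers
tumor_proteins = {"T1": "MKTAYIAKQR", "T2": "MSLLTEVETY"}
normal_proteins = {"N1": "MKTAYIAKQR"}
specific = tumor_specific_proteins(tumor_proteins, normal_proteins)
```

Using a set for the normal sequences keeps each lookup constant-time while the tumor sequences are traversed once.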
S07, obtaining novel transcript protein sequences with tumor-specific partial sequence differences
In addition to the full-length novel transcript protein sequences obtained in S06, the tumor sample also contains novel transcripts that differ only partially in sequence from the transcripts of the normal control sample. Such novel transcripts may result from different splicing patterns, insertion or deletion variants, and so on. After translation, this typically shows up as part of a protein sequence being present only in the tumor sample, and such partially differing protein sequences may also form neoantigens. Specifically, the tumor-sample-specific proteins are first filtered out, and all the remaining novel transcript proteins are then compared with all the transcript protein sequences of the normal control sample using in-house software. Sequences in the comparison result that are inconsistent with the normal control sample are defined as novel transcript protein sequences with tumor-specific partial sequence differences.
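One way to picture the partial-difference extraction in S07 is a pairwise comparison that returns the stretches present only in the tumor protein. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for the undisclosed in-house comparison; the `flank` parameter, which keeps neighbouring residues so that peptides spanning the junction are retained, is an added assumption and not something the text specifies.

```python
from difflib import SequenceMatcher


def tumor_only_segments(tumor_seq: str, normal_seq: str, flank: int = 0):
    """Return tumor-side segments that differ from the normal sequence.

    flank: number of matching residues kept on each side of a difference
    (hypothetical parameter, useful when peptides must span the junction).
    """
    matcher = SequenceMatcher(None, normal_seq, tumor_seq, autojunk=False)
    segments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            # Residues j1..j2 exist only in the tumor sequence
            segments.append(tumor_seq[max(0, j1 - flank):j2 + flank])
    return segments


# Toy pair: the tumor protein carries a three-residue insertion
normal_protein = "MKTAYIAKQRQISFVKSHFSRQ"
tumor_protein = "MKTAYIAKWWWQRQISFVKSHFSRQ"
bare = tumor_only_segments(tumor_protein, normal_protein)
with_context = tumor_only_segments(tumor_protein, normal_protein, flank=2)
```

A deletion leaves no tumor-only residues, so a real implementation would need to handle the newly created junction separately.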
S08, merging protein fragments
Specifically, the tumor-specific full-length novel transcript protein sequences obtained in S06 and the novel transcript protein sequences with tumor-specific partial sequence differences obtained in S07 are merged, and sequences shorter than 9 residues are filtered out.
S09, splitting protein fragments
Specifically, the protein sequences obtained in the previous step are split into shorter k-mers. The k-mers of a string are all of its substrings of length k: for an input protein sequence, a sliding window with a step size of 1 extracts sequences of fixed length k one after another, starting from the first amino acid residue, and these sequences are the k-mers. More specifically, the protein sequences obtained in S08 are split into k-mers of 9 to 12 amino acids in length using in-house software.
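The merging rule of S08 and the sliding-window split of S09 are simple enough to sketch directly. The Python below is an illustration rather than the in-house software; the function names are hypothetical, and the de-duplication of identical peptides is an added convenience not stated in the text.

```python
def kmers(sequence: str, k: int):
    """All substrings of length k, taken with a sliding window of step 1."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def split_into_peptides(sequences, k_min: int = 9, k_max: int = 12):
    """Drop sequences shorter than k_min, then emit unique 9- to 12-mers."""
    peptides = set()
    for seq in sequences:
        if len(seq) < k_min:
            continue  # S08: sequences shorter than 9 residues are filtered out
        for k in range(k_min, k_max + 1):
            peptides.update(kmers(seq, k))
    return sorted(peptides)


# Toy input: one 10-residue sequence and one that is too short
merged = ["ACDEFGHIKL", "SHQRT"]
peptides = split_into_peptides(merged)
```

A sequence of length L contributes L - k + 1 peptides for each k, so the 10-residue toy sequence yields two 9-mers and one 10-mer.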
S10, human leukocyte antigen genotyping
The human leukocyte antigen (HLA) genes lie in a polymorphic region on the short arm of chromosome 6 that participates in the immune response, and form the gene complex with the highest allelic polymorphism in the human genome. The MHC class I molecules they encode mainly mediate the recognition and killing of antigens by CD8+ T cells, while class II molecules mainly interact with CD4+ T cells, thereby initiating the immune response. Different HLA subtype molecules may have different affinities for the same polypeptide, so determining the HLA subtype of a sample is a prerequisite for screening the binding between HLA and candidate neoantigens. Specifically, the human leukocyte antigens of the normal control sample are genotyped using the software HLA-LA.
S11, peptide affinity prediction
Mutant proteins expressed by tumor cells are not expressed by normal cells. These abnormal protein sequences are processed into short peptides by the proteasome in the cell, bound by human leukocyte antigens, presented on the cell surface, and recognized by T cells as foreign antigens. The affinity between a specific HLA subtype and a polypeptide is predicted by an algorithm, and peptide fragments with strong affinity for HLA molecules are selected. Specifically, the affinity of the k-mer peptide fragments obtained by splitting in S09 for HLA molecules is predicted using the software NetMHCpan 4.0, and peptides whose affinity is stronger than a threshold (typically IC50 < 500 nM) are selected as candidate neoantigens.
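The threshold selection at the end of S11 amounts to a filter on the predicted affinity, where a lower IC50 value means stronger binding. The sketch below assumes the predictions have already been parsed into records; the record layout and the numeric values are invented for illustration and are not outputs of any real prediction run.

```python
def select_candidates(predictions, max_ic50_nm: float = 500.0):
    """Keep peptides predicted to bind more strongly than the threshold.

    Lower IC50 (nM) means stronger binding, so the filter is IC50 < threshold.
    Results are sorted from strongest to weakest predicted binder.
    """
    strong = [p for p in predictions if p["ic50_nM"] < max_ic50_nm]
    return sorted(strong, key=lambda p: p["ic50_nM"])


# Hypothetical prediction records (toy peptides and made-up affinities)
predictions = [
    {"peptide": "ACDEFGHIK", "allele": "HLA-A02:01", "ic50_nM": 35.0},
    {"peptide": "LMNPQRSTV", "allele": "HLA-A02:01", "ic50_nM": 1200.0},
    {"peptide": "WYACDEFGH", "allele": "HLA-A02:01", "ic50_nM": 499.9},
]
candidates = select_candidates(predictions)
candidate_peptides = [p["peptide"] for p in candidates]
```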
S12, mass spectrometry validation
Specifically, mass spectrometry analysis is performed on the tumor sample, the generated data are imported into MaxQuant software, the candidate neoantigens are added as the search library, and the peptide fragments that are successfully identified are taken as the final neoantigens.
The specific parameters of the software used in the invention are as follows:
Raw data are filtered using Trimmomatic, an example command of which is:
Figure BDA0002498930780000061
where trimmomatic-0.36.jar is the Trimmomatic executable file; PE indicates paired-end sequencing; phred33 indicates the base quality format; sample_1.fastq.gz and sample_2.fastq.gz are the input raw data; sample.clear.R1.fq.gz, sample.unpaired.R1.fq.gz, sample.clear.R2.fq.gz and sample.unpaired.R2.fq.gz are the output data; ILLUMINACLIP:adapter.fa:2:30:10:8:true specifies trimming of the sequencing primer (adapter) sequences, the first parameters being, in order, the adapter sequence file, the maximum number of mismatches allowed, the match threshold in palindrome mode, and the match threshold in simple mode; LEADING indicates that bases with quality below 20 are trimmed from the start of the read; TRAILING indicates that bases with quality below 20 are trimmed from the end of the read; MINLEN indicates the minimum read length.
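Because the example command survives only as a figure reference, the following Python sketch reassembles a Trimmomatic paired-end argument list from the parameter description above. It is a reconstruction under stated assumptions, not the command in the figure: the `MINLEN` value is not given in the text, so 36 is purely a placeholder, and the helper name is hypothetical. The function only builds the argument list and does not run anything.

```python
def trimmomatic_pe_command(r1, r2, prefix, min_len,
                           adapters="adapter.fa",
                           jar="trimmomatic-0.36.jar"):
    """Build a Trimmomatic PE argument list from the parameters in the text."""
    return [
        "java", "-jar", jar, "PE", "-phred33",
        r1, r2,                                   # paired-end inputs
        f"{prefix}.clear.R1.fq.gz",               # forward reads, paired
        f"{prefix}.unpaired.R1.fq.gz",            # forward reads, unpaired
        f"{prefix}.clear.R2.fq.gz",               # reverse reads, paired
        f"{prefix}.unpaired.R2.fq.gz",            # reverse reads, unpaired
        f"ILLUMINACLIP:{adapters}:2:30:10:8:true",
        "LEADING:20",                             # trim low-quality start
        "TRAILING:20",                            # trim low-quality end
        f"MINLEN:{min_len}",                      # value assumed, see lead-in
    ]


command = trimmomatic_pe_command(
    "sample_1.fastq.gz", "sample_2.fastq.gz", "sample", min_len=36
)
command_line = " ".join(command)
```

The resulting list could be passed to `subprocess.run` on a machine where Java and the Trimmomatic jar are available.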
The genome index is constructed using HISAT2: first the splice sites and the exon sequences are extracted separately from the genome annotation file, and then the genome index is built, with example commands:
Figure BDA0002498930780000071
where hg38.fa is the human genome sequence and gencode. extract_splice_sites.py, extract_exons.py and hisat2-build are programs contained in the HISAT2 package.
Sequences are aligned using HISAT2, an example command of which is:
Figure BDA0002498930780000072
where hg38 represents the reference genome index that has been constructed; after alignment, the results are sorted using SAMtools. samtools view is the view command of SAMtools, used here to further filter the results.
Transcripts are assembled using StringTie, an example command of which is:
Figure BDA0002498930780000073
wherein gencode. Section. Gtf is a human genome annotation file.
Transcripts containing repetitive sequences are removed using RepeatMasker, an example command of which is:
Figure BDA0002498930780000074
where the assembled transcript sequences are first extracted using the software bedtools, and the repetitive sequences in them are then marked using RepeatMasker.
The coding potential of transcripts is predicted using CPAT, an example command of which is:
Figure BDA0002498930780000075
where the -d and -x parameters correspond to the models built for the software, and -o is the prediction result file.
The transcripts are translated using in-house software, an example command of which is:
Figure BDA0002498930780000081
In-house software is used to find the differential protein sequences, an example command of which is:
Figure BDA0002498930780000082
where -t is the protein sequences of the tumor sample, -n is the protein sequences of the normal sample, -out1 is the protein sequences expressed only in the tumor sample, and -out2 is the differing partial sequences of proteins that are expressed in both the normal sample and the tumor sample but with different sequences.
HLA genotyping is performed using HLA-LA, an example command of which is:
Figure BDA0002498930780000083
where the graph PRG_MHC_GRCh38_withIMGT indicates the population reference graph index file, which can be built by the HLA-LA program itself or downloaded from the download page provided by the program.
Peptide affinity prediction is performed using NetMHCpan 4.0, an example command of which is:
Figure BDA0002498930780000084
where -BA indicates that binding affinity prediction is to be made, -l indicates the peptide length, -a indicates the HLA genotype, -inptype indicates the type of input, and -xls and -xlsfile together specify the output file.
Mass spectrometry validation is performed using MaxQuant: after the mass spectrometry data of the sample are imported, the digestion mode is set to No digest and the Global Fasta File is set to the candidate neoantigen FASTA file.
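The candidate neoantigen FASTA file used as the search library can be produced with a few lines of Python. The header naming scheme below is an assumption made for illustration; the text does not say how the entries are labelled.

```python
def peptides_to_fasta(peptides, prefix: str = "neoantigen") -> str:
    """Format candidate peptides as FASTA text, one record per peptide."""
    lines = []
    for index, peptide in enumerate(peptides, start=1):
        lines.append(f">{prefix}_{index}")  # hypothetical header scheme
        lines.append(peptide)
    return "\n".join(lines) + "\n" if lines else ""


# Toy candidates standing in for the peptides selected in S11
fasta_text = peptides_to_fasta(["ACDEFGHIK", "WYACDEFGH"])
```

Writing `fasta_text` to a file gives a library in which every entry is already a complete peptide, which is consistent with disabling enzymatic digestion in the search settings.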
While the preferred embodiments and examples of the present invention have been described in detail, the present invention is not limited to the above-described embodiments and examples, and various changes may be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (11)

1. A method for extracting immunotherapeutic neoantigens from tumor-specific transcribed regions assembled from novel transcripts, comprising the steps of:
S01, alignment of deep transcriptome sequencing data;
S02, transcript assembly;
S03, transcript filtering;
S04, prediction of translation initiation codons;
S05, transcript translation;
S06, obtaining tumor-specific full-length novel transcript protein sequences;
S07, obtaining novel transcript protein sequences with tumor-specific partial sequence differences;
S08, merging protein fragments;
S09, splitting protein fragments;
S10, human leukocyte antigen genotyping;
S11, peptide affinity prediction;
wherein S01 comprises the following steps:
S101, acquiring deep whole-transcriptome sequencing data, containing coding RNA and non-coding RNA, of a tumor sample and a normal control sample;
S102, filtering the deep whole-transcriptome sequencing data of the tumor sample and the normal control sample;
S103, constructing an index for a reference genome;
S104, aligning the filtered data obtained in S102 to the reference genome index obtained in S103;
in S02, the alignment results of the deep whole-transcriptome sequencing data, in which the short reads have been located on the reference genome, are assembled into transcripts;
in S06, the protein sequences translated from the tumor sample are compared with those translated from the normal control sample, and the tumor sample protein sequences are traversed to obtain the tumor-specific protein sequences that cannot be found in the normal control;
S07 comprises the following steps:
S701, filtering out the tumor-sample-specific proteins;
S702, comparing all the novel transcript proteins remaining after filtering with all the transcript protein sequences of the normal control sample, wherein sequences in the comparison result that are inconsistent with the normal control sample are defined as novel transcript protein sequences with tumor-specific partial sequence differences;
in S08, the tumor-specific full-length novel transcript protein sequences obtained in S06 and the novel transcript protein sequences with tumor-specific partial sequence differences obtained in S07 are merged, and sequences shorter than 9 residues are filtered out.
2. The method of claim 1, further comprising S12, mass spectrometry validation.
3. The method of claim 1, wherein in S101, the library is sequenced using a ribosome strand-specific library construction method and a small fragment enrichment screening library construction method.
4. The method of claim 1, wherein in S101, the sample data comprises a plurality of overlapping or partially overlapping short read sequences, and the tumor sample and the normal control sample have no less than 30G of sequencing data.
5. The method of claim 1, wherein in S102 short read sequences are removed wherein the average base mass is below 20 or comprise sequencing primer adaptors.
6. The method according to any one of claims 1 to 5, wherein in S03, known human full-length transcripts and repetitive sequences present in the assembled transcripts are removed.
7. The method according to any one of claims 1-5, wherein S04 comprises the following steps:
S401, calculating the coding capacity of the new transcripts of the tumor sample and the normal control sample, and dividing the new transcripts into protein-coding transcripts and non-protein-coding transcripts according to the strength of the coding capacity;
S402, predicting the translation initiation codons of the protein-coding transcripts in the tumor sample and the normal control sample.
8. The method according to any one of claims 1 to 5, wherein in S05, the new transcripts with coding capacity in the tumor sample and the normal control sample are translated from the predicted translation initiation codons to obtain protein sequences.
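Translation from a start codon, as in S05, can be illustrated with the standard genetic code. The sketch below is an assumption-laden stand-in: the claims do not say how the initiation codon is predicted, so the first ATG in the transcript is used as a placeholder for that prediction, and the transcript is assumed to be given as a DNA-alphabet string. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_atg(transcript):
    """Placeholder start-codon predictor: index of the first ATG, or -1."""
    return transcript.upper().find("ATG")


def translate_from(transcript, start):
    """Translate codon by codon from start until a stop codon or the end."""
    seq = transcript.upper()
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_transcript(transcript):
    """Return the protein from the placeholder start codon, or '' if none."""
    start = first_atg(transcript)
    return translate_from(transcript, start) if start >= 0 else ""
```

In practice `first_atg` would be replaced by whatever initiation-codon predictor step S402 relies on; only `translate_from` reflects the translation step itself.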
9. The method according to any one of claims 1 to 5, wherein in S09 the protein sequence obtained in S08 is split;
and/or, in S11, predicting the affinity between the k-mer residue peptide fragments obtained by the S09 splitting and HLA molecules, and selecting the k-mer residue peptide fragments whose affinity is greater than a threshold value as candidate neoantigens;
and/or, in S12, performing mass spectrometry analysis on the tumor sample, importing the generated data into MaxQuant software with the candidate neoantigens added as a search library, and taking the successfully identified peptide fragments as the neoantigens.
10. The method according to claim 9, wherein in S09 the protein sequence obtained in S08 is split into k-mer residue peptide fragments of 9 to 12 amino acids in length.
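The splitting in S09 and the threshold selection in S11 can be sketched as a sliding window followed by a filter. This is a minimal illustration under stated assumptions: the affinity predictor is not named in the claims, so `score` is a hypothetical callable supplied by the caller, and the function names are introduced here only for the example.

```python
def split_kmers(protein, k_min=9, k_max=12):
    """Return all unique peptides of length k_min..k_max, in first-seen order."""
    seen = set()
    peptides = []
    for k in range(k_min, k_max + 1):
        for i in range(len(protein) - k + 1):
            peptide = protein[i:i + k]
            if peptide not in seen:
                seen.add(peptide)
                peptides.append(peptide)
    return peptides


def select_candidates(peptides, score, threshold):
    """Keep peptides whose predicted affinity score exceeds the threshold.

    score: hypothetical callable mapping a peptide to a number where a
    larger value means stronger predicted binding to the HLA molecule.
    """
    return [p for p in peptides if score(p) > threshold]
```

If the chosen predictor reports a value where lower means stronger binding (for example an IC50 in nM), the comparison in `select_candidates` would need to be inverted.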
11. Use of a method according to any one of claims 1-10 for the preparation of a medicament or medical device for extracting immunotherapeutic neoantigens.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426721.4A CN111627497B (en) 2020-05-19 2020-05-19 Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010426721.4A CN111627497B (en) 2020-05-19 2020-05-19 Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application

Publications (2)

Publication Number Publication Date
CN111627497A (en) 2020-09-04
CN111627497B (en) 2023-06-13

Family

ID=72259860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426721.4A Active CN111627497B (en) 2020-05-19 2020-05-19 Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application

Country Status (1)

Country Link
CN (1) CN111627497B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284556A (en) * 2021-04-29 2021-08-20 安徽农业大学 Method for mining endogenous microbiome information from animal and plant transcriptome data
CN113362896A (en) * 2021-06-23 2021-09-07 深圳市新合生物医疗科技有限公司 Tumor neoantigen prediction method based on HPV integration
CN115240773B (en) * 2022-09-06 2023-07-28 深圳新合睿恩生物医疗科技有限公司 New antigen identification method and device, equipment and medium of tumor specific circular RNA
CN117198409A (en) * 2023-09-15 2023-12-08 云南省农业科学院农业环境资源研究所 microRNA prediction method and system based on transcriptome data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491689A (en) * 2018-02-01 2018-09-04 杭州纽安津生物科技有限公司 Tumor neoantigen identification method based on transcriptome
WO2018183544A1 (en) * 2017-03-31 2018-10-04 Dana-Farber Cancer Institute, Inc. Method for identification of retained intron tumor neoantigens from patient transcriptome
CN109801678A (en) * 2019-01-25 2019-05-24 上海鲸舟基因科技有限公司 Tumor antigen prediction method based on complete transcriptome and application thereof
CN110534156A (en) * 2019-09-02 2019-12-03 深圳市新合生物医疗科技有限公司 Method and system for extracting immunotherapy new antigen

Also Published As

Publication number Publication date
CN111627497A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111627497B (en) Method for extracting immunotherapeutic new antigen based on tumor specific transcription region assembled by new transcripts and application
CN109801678B (en) Tumor antigen prediction method based on complete transcriptome and application thereof
US20200243164A1 (en) Systems and methods for patient-specific identification of neoantigens by de novo peptide sequencing for personalized immunotherapy
El-Metwally et al. Next generation sequencing technologies and challenges in sequence assembly
CN110600077B (en) Prediction method of tumor neoantigen and application thereof
EP3323070A1 (en) Neoantigen analysis
CN106599614B (en) High-throughput sequencing data processing and analysis flow control method and system
WO2012034251A2 (en) Methods and systems for detecting genomic structure variations
CN111415707B (en) Prediction method of clinical individuation tumor neoantigen
US20200176076A1 (en) Scansoft: a method for the detection of genomic deletions and duplications in massive parallel sequencing data
CN115747327A (en) Novel antigen prediction methods involving frameshift mutations
CN110534156B (en) Method and system for extracting immunotherapy new antigen
CN116864007B (en) Analysis method and system for gene detection high-throughput sequencing data
Kumar et al. FusionNeoAntigen: a resource of fusion gene-specific neoantigens
WO2024051097A1 (en) Neoantigen identification method and device for tumor-specific circular rnas, apparatus and medium
CN116779028A (en) Method, device and computer readable storage medium for predicting neoepitope based on structural variation detection
CN114882951B (en) Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data
JP2015228819A (en) Dna typing method for hla gene, and computer program used for data analysis of the same method
CN114464256A (en) Method, computing device and computer storage medium for detecting tumor neoantigen burden
KR20200125549A (en) A Method for automatic analysis of Chromatin-immunoprecipitation-Sequencing data
CN111599410B (en) Method for extracting microsatellite unstable immunotherapy new antigen by integrating multi-omics data and application
CN113362896A (en) Tumor neoantigen prediction method based on HPV integration
KR101977976B1 (en) Method for increasing read data analysis accuracy in amplicon based NGS by using primer remover
CN116083587B (en) Method and device for predicting tumor neoantigen based on aberrant alternative splicing
Hung et al. Genetic diversity and structural complexity of the killer-cell immunoglobulin-like receptor gene complex: A comprehensive analysis using human pangenome assemblies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant