CN111192632A - Method and device for extracting gene fusion immunotherapy novel antigen by integrating deep sequencing data of DNA and RNA - Google Patents

Publication number
CN111192632A
Authority
CN
China
Prior art keywords
gene fusion
sample
gene
sequence
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911293011.2A
Other languages
Chinese (zh)
Other versions
CN111192632B (en)
Inventor
万季
潘有东
汪健
徐韵婉
宋麒
刘鹏
夏迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Neocura Biotechnology Corp
Original Assignee
Shenzhen Neocura Biotechnology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Neocura Biotechnology Corp filed Critical Shenzhen Neocura Biotechnology Corp
Priority to CN201911293011.2A priority Critical patent/CN111192632B/en
Publication of CN111192632A publication Critical patent/CN111192632A/en
Application granted granted Critical
Publication of CN111192632B publication Critical patent/CN111192632B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Abstract

The invention discloses a method and a device for extracting gene fusion neoantigens for immunotherapy by integrating deep sequencing data of DNA and RNA. The method comprises the following steps: S10, obtaining a genomic gene fusion sequence of the sample; S20, obtaining a transcriptome gene fusion sequence of the sample; S30, constructing a gene fusion proteome; and S40, obtaining the neoantigens of the sample. The tumor-specific neoantigens discovered by this scheme all derive from gene fusion, which expands the screening range of neoantigens and enriches the arsenal of neoantigen-based immunotherapy. By analyzing and integrating the whole exome sequencing data and the transcriptome sequencing data of the tumor sample, gene fusion events in the tumor tissue are comprehensively detected, the false positive rate of fusion-derived neoantigens is reduced, and the effectiveness of neoantigen vaccines is improved, which is of great significance for improving the effect of clinical immunotherapy.

Description

Method and device for extracting gene fusion immunotherapy novel antigen by integrating deep sequencing data of DNA and RNA
Technical Field
The invention relates to the field of tumor immunotherapy, and in particular to a method and a device for extracting gene fusion neoantigens for immunotherapy by integrating deep sequencing data of DNA and RNA.
Background
Therapeutic concepts and methods for malignant tumors have developed extensively over the past few decades. Traditional tumor treatments include surgery, radiotherapy, and targeted therapy based on mutation typing, but these treatments are limited by toxic side effects, drug resistance, and similar problems. In recent years, immunotherapy, which activates the immune system to inhibit and kill tumor cells, has been a new breakthrough. Existing immunotherapeutic approaches can be divided into three categories according to their mechanism of action: (1) immune checkpoint inhibitors, which activate the immune system by blocking its suppressive pathways; (2) adoptive cellular immunotherapy, in which T lymphocytes are modified so that they recognize tumor antigens; and (3) neoantigen vaccines, in which tumor tissue-specific antigens are identified and polypeptide or mRNA vaccines prepared from the predicted antigens are administered back to the patient. Compared with the other two approaches, neoantigen vaccine immunotherapy is not limited to specific cancer types and has few toxic side effects. Prediction of neoantigens relies on whole exome sequencing of DNA and transcriptome sequencing of RNA from tissue samples to predict mutant polypeptides. Existing procedures generally consider only mutant polypeptides resulting from point mutations and small insertions and deletions (indels) in DNA. Gene fusions are also an important source of mutant polypeptides. However, gene fusion identification based on a single data source (DNA or RNA) usually has a high false positive rate, so prediction of fusion-derived neoantigens requires more comprehensive data and a rigorous screening procedure to ensure the efficacy of the neoantigen vaccine.
Therefore, integrating multiple types of data to extract the neoantigens generated by gene fusion is of great significance for expanding the screening range of neoantigens and improving the effect of clinical application.
Disclosure of Invention
In view of the above problems, the invention takes into account the possibility that tumor-specific gene fusions are transcribed and translated into mutant polypeptides, and provides a bioinformatics method for obtaining tumor-specific neoantigens.
The first aspect of the invention provides a method for extracting gene fusion neoantigens for immunotherapy by integrating deep sequencing data of DNA and RNA, which comprises the following steps:
S10, obtaining a genomic gene fusion sequence of the sample;
S20, obtaining a transcriptome gene fusion sequence of the sample;
S30, constructing a gene fusion proteome;
and S40, obtaining the neoantigens of the sample.
In some embodiments of the invention, the genomic gene fusion sequence of the obtained sample is based on whole exome sequencing.
In some embodiments of the invention, the transcriptome gene fusion sequence of the obtained sample is based on transcriptome sequencing.
In some embodiments of the invention, obtaining the genomic gene fusion sequence of the sample comprises the steps of:
S101, detecting the structural variation of the genome of the tumor sample;
S102, screening gene fusion events;
S103, obtaining a genomic gene fusion sequence.
In some embodiments of the invention, obtaining the transcriptome gene fusion sequence of the sample comprises the steps of:
S201, detecting tumor sample transcriptome gene fusion events;
S202, obtaining a transcriptome gene fusion sequence.
In some embodiments of the present invention, step S30 includes performing reading frame translation of the obtained genomic gene fusion sequence and transcriptome gene fusion sequence, respectively, to obtain gene fusion protein sequences, i.e., the gene fusion proteome;
preferably, when the reading frame is translated to the breakpoint position where the fusion occurs, it is determined whether a frameshift occurs; if a frameshift occurs, the entire protein sequence after the breakpoint position is a source of potential neoantigen peptides, and if no frameshift occurs, only the sequence near the breakpoint can generate neoantigen peptides.
In some embodiments of the present invention, step S30 generates peptide fragment sequences of a specified length as required;
preferably, the default peptide fragment length is 9 to 12 amino acids.
In some embodiments of the present invention, step S40 includes the steps of:
S401, identifying the human leukocyte antigen (HLA) typing;
S402, predicting the peptide fragment affinity;
and S403, screening the neoantigens of the sample based on the integrated peptide fragment information.
In some embodiments of the invention, the sample is a tumor tissue, preferably a human tumor tissue.
In some embodiments of the present invention, in the step S102, a gene fusion event in which the breakpoint position is located inside a gene rather than in an intergenic region is selected.
In some embodiments of the present invention, in the step S103, gene sequences are extracted and spliced according to positions of upstream and downstream gene breakpoints involved in gene fusion;
preferably, the method comprises the following steps:
S1031, determining the breakpoint positions of the upstream and downstream genes;
S1032, judging whether each breakpoint occurs in an exon region or an intron region;
S1033, judging which transcripts of the gene are affected by the breakpoint;
S1034, pairing each affected upstream gene transcript with each affected downstream gene transcript, and obtaining a complete gene fusion transcript sequence according to the conventional transcription rule.
The second aspect of the present invention provides an apparatus for extracting gene fusion immunotherapy neoantigen by integrating deep sequencing data of DNA and RNA, comprising:
(1) a first unit for obtaining a genomic gene fusion sequence of a sample;
(2) a second unit for obtaining a transcriptome gene fusion sequence of the sample;
(3) a third unit for constructing a specific gene fusion proteome;
(4) a fourth unit for obtaining the neoantigens of the sample.
The beneficial effects of the invention are as follows:
Compared with the prior art, the scheme of the invention has the following advantages:
1. In terms of source, the tumor-specific neoantigens discovered by the scheme of the invention all come from gene fusion, and gene fusion events exist widely in different types of tumors, whereas current methods mainly obtain neoantigens by identifying somatic variation. The invention therefore expands the screening range of neoantigens and enriches the arsenal of neoantigen-based immunotherapy.
2. By analyzing and integrating the whole exome sequencing data and the transcriptome sequencing data of the tumor sample, the invention comprehensively detects gene fusion events in the tumor tissue, reduces the false positive rate of fusion-derived neoantigens, and thereby improves the effectiveness of neoantigen vaccines, which is of great significance for improving the effect of clinical immunotherapy.
Drawings
FIG. 1 is a flow chart of a method for extracting gene fusion immunotherapy neoantigens by integrating DNA and RNA deep sequencing data according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a gene fusion sequence in a method for extracting a gene fusion immunotherapy neoantigen by integrating deep sequencing data of DNA and RNA according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention.
In order that those skilled in the art will better understand the present invention, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The expressions "first," "second," "again," "then," "next," and the like as used in the specific embodiments herein are not intended to limit the order of precedence.
The left part of FIG. 1 is a flow chart of the computer-implemented method for obtaining the genomic gene fusion sequence of tumor tissue based on whole exome sequencing. As shown in FIG. 1, the method comprises the following steps performed by a processor:
S101, detecting the genome structural variation of the tumor sample.
First, the raw genome sequencing data are aligned to the human reference genome using the genome alignment software bwa; then, with the BAM file generated in the previous step as input, structural variations of the genome are detected using the variant detection software lumpy-sv.
S102, screening gene fusion events.
Specifically, the SVTyper software is used to genotype the structural variations, and the gene fusion events among them are screened out; the fusion genes are then annotated using the package pyGeno, and gene fusion events whose breakpoints lie within a gene rather than in an intergenic region are selected.
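The breakpoint-location filter of step S102 can be illustrated with a minimal sketch. It does not use pyGeno; the `GeneIndex` structure, the function names `gene_at` and `keep_fusion`, and the gene names are hypothetical, and genes on one chromosome are assumed to be non-overlapping and sorted by start position.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical annotation: chromosome -> [(start, end, gene_name)],
# 1-based inclusive coordinates, sorted by start, assumed non-overlapping.
GeneIndex = Dict[str, List[Tuple[int, int, str]]]


def gene_at(index: GeneIndex, chrom: str, pos: int) -> Optional[str]:
    """Return the gene containing pos, or None if pos is intergenic."""
    genes = index.get(chrom, [])
    starts = [g[0] for g in genes]
    i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
    if i >= 0 and genes[i][0] <= pos <= genes[i][1]:
        return genes[i][2]
    return None


def keep_fusion(index: GeneIndex,
                bp1: Tuple[str, int],
                bp2: Tuple[str, int]) -> Optional[Tuple[str, str]]:
    """Keep an event only if both breakpoints lie inside a gene."""
    g1 = gene_at(index, *bp1)
    g2 = gene_at(index, *bp2)
    if g1 is None or g2 is None:
        return None  # at least one breakpoint is intergenic: discard
    return (g1, g2)
```

A real annotation contains overlapping genes and multiple transcripts per gene, so an interval tree or the annotation package itself would be the more robust choice; the sketch only shows the in-gene test.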
S103, obtaining a genome gene fusion sequence.
Gene sequences are extracted according to the positions of the upstream and downstream gene breakpoints involved in the gene fusion, and are then spliced together. Specifically, after the breakpoint positions of the upstream and downstream genes are determined, it is first determined whether each breakpoint occurs in an exon region or an intron region. Since many genes have multiple transcripts, it is also necessary to determine which transcripts of the gene are affected by the breakpoint. Each affected upstream gene transcript is then paired with each affected downstream gene transcript, and a complete gene fusion transcript sequence is obtained according to the conventional transcription rule, to facilitate the subsequent reading frame translation. These gene fusion transcript sequences are collectively referred to herein as genomic gene fusion sequences.
The middle part of FIG. 1 is a flow chart of the computer-implemented method for obtaining the transcriptome gene fusion sequence of tumor tissue based on transcriptome sequencing according to an embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps performed by a processor:
S201, detecting the tumor sample transcriptome gene fusion events.
First, the raw transcriptome sequencing data are aligned to the human reference genome using the sequence alignment software STAR; gene fusions are then detected using the arriba software.
S202, obtaining a transcriptome gene fusion sequence.
And respectively extracting gene sequences according to the positions of upstream and downstream gene breakpoints participating in gene fusion and splicing. It should be noted that since the data sequenced in RNAseq is the transcript sequence, not the genomic sequence, the position of the gene fusion breakpoint determined is not the position in the genome where gene fusion actually occurs, but the boundary of the mature mRNA sequence after transcription occurs. Therefore, when generating the gene fusion sequence, the sequences at the upstream and downstream gene breakpoint are directly spliced together, and the transcription rule is not required to be considered. The rest of the processing is similar to the step S103, and the transcripts influenced by the breakpoint positions are determined to obtain complete gene fusion transcript sequences. These gene fusion transcript sequences are collectively referred to herein as transcriptome gene fusion sequences.
The lower left part of FIG. 1 is a computer-implemented process diagram for constructing a tumor-specific gene fusion proteome according to an embodiment of the present invention. As shown in fig. 1, the method comprises the following steps performed by a processor:
S301, constructing a tumor-specific gene fusion proteome.
Reading frame translation is performed on the genomic gene fusion sequences and the transcriptome gene fusion sequences obtained in steps S103 and S202, respectively, to obtain gene fusion protein sequences, i.e., the tumor-specific gene fusion proteome. To obtain peptide fragment sequences that do not exist in normal human cells, it is necessary to determine whether a frameshift occurs when the reading frame is translated to the breakpoint position where the fusion occurs. If a frameshift occurs, the entire protein sequence after the breakpoint position is a source of potential neoantigen peptides; if no frameshift occurs, only the sequence near the breakpoint can generate neoantigen peptides. Finally, peptide fragment sequences of a specified length are generated as required. In the present invention, the default peptide fragment length is 9 to 12 amino acids.
The remaining part of FIG. 1 is a schematic diagram of the process for obtaining the neoantigens generated by tumor-specific gene fusion according to another embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps performed by a processor:
S401, human leukocyte antigen (HLA) molecular typing.
The HLA molecular typing is calculated using the HLA typing software HLA-LA.
S402, peptide fragment affinity prediction.
Affinity prediction is performed on the tumor-specific gene fusion proteome using the software netMHCpan-4.0 together with the human leukocyte antigen (HLA) molecular typing result.
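The frameshift test and peptide generation described above can be sketched as follows. This is a simplified illustration, not the patented script: the function names, the `downstream_phase` parameter (the codon position, 0 to 2, of the first downstream base in the downstream gene's native reading frame), and the assumption that `upstream_cds` starts at the start codon are all hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate from the first base until a stop codon or the end."""
    seq = seq.upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def fusion_peptides(upstream_cds: str, downstream_seq: str,
                    downstream_phase: int, lengths=range(9, 13)):
    """Return (frameshift?, sorted candidate peptides of 9 to 12 residues)."""
    protein = translate(upstream_cds + downstream_seq)
    n_up = len(upstream_cds)
    shifted = n_up % 3 != downstream_phase
    first_novel = n_up // 3          # first residue using a downstream base
    first_native = (n_up + 2) // 3   # first residue made only of downstream bases
    peptides = set()
    for k in lengths:
        for start in range(len(protein) - k + 1):
            end = start + k - 1
            if end < first_novel:
                continue  # window lies entirely in the upstream protein
            if not shifted and start >= first_native:
                continue  # in frame: window is native downstream protein
            peptides.add(protein[start:start + k])
    return shifted, sorted(peptides)
```

With a frameshift, every window reaching past the breakpoint is kept; without one, only windows that straddle the junction are kept, which mirrors the two cases in the paragraph above.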
S403, integrating and screening the tumor specific gene fusion new antigen.
The peptide fragment information is integrated. Specifically, the source of each candidate peptide fragment is defined, including the upstream and downstream genes involved in the fusion and the corresponding transcript numbers, and whether the gene fusion event comes from DNA sequencing or RNA sequencing. Each peptide fragment is also annotated with its affinity for the HLA molecular types identified in step S401, the expression level of the fusion gene in RNA sequencing, the allele frequency of the gene fusion event in DNA sequencing, the specific position of the peptide fragment in the fusion protein sequence, and the like. In the screening stage, the candidate peptide fragments are compared with the normal human proteome, and sequences that exist in the normal proteome are filtered out; the candidate neoantigens are then ranked and screened using different indexes with corresponding weights, to obtain the final tumor-specific gene fusion neoantigens. The specific indexes comprise the affinity of the peptide fragment for HLA, the expression level of the fusion gene, the allele frequency of the gene fusion, and the physicochemical properties of the peptide fragment.
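The screening stage can be sketched as a filter against the normal proteome followed by a weighted ranking. The source does not disclose the weights or the scoring formula, so the `Candidate` fields, the weights, the 500 nM affinity cap, and the normalizations below are all illustrative assumptions; the physicochemical index is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    peptide: str
    affinity_nm: float       # predicted HLA binding affinity, lower is stronger
    expression: float        # fusion expression level from RNA sequencing
    allele_frequency: float  # fusion allele frequency from DNA sequencing, 0..1


def remove_normal(cands: Iterable[Candidate],
                  normal_proteins: Iterable[str]) -> List[Candidate]:
    """Drop peptides that occur anywhere in a normal protein sequence."""
    normal = list(normal_proteins)
    return [c for c in cands if not any(c.peptide in p for p in normal)]


def rank(cands: List[Candidate], w_aff: float = 0.5, w_expr: float = 0.3,
         w_af: float = 0.2, affinity_cap: float = 500.0) -> List[Candidate]:
    """Sort candidates by a weighted score (hypothetical weights)."""
    max_expr = max((c.expression for c in cands), default=0.0) or 1.0

    def score(c: Candidate) -> float:
        aff = 1.0 - min(c.affinity_nm, affinity_cap) / affinity_cap
        return (w_aff * aff
                + w_expr * c.expression / max_expr
                + w_af * c.allele_frequency)

    return sorted(cands, key=score, reverse=True)
```

For a full proteome, scanning every protein per peptide is slow; indexing all normal k-mers of the relevant lengths in a set would be the usual optimization.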
In some embodiments, the specific parameters of the software used in the present invention are as follows:
The genome sequencing data are aligned using bwa. An example command is:
bwa mem \
-R '@RG\tID:sample\tLB:library\tSM:sample' \
-t 20 \
-M bwa_index \
sample_1.DNA.fq.gz sample_2.DNA.fq.gz
where -R specifies the read group header line, -t specifies the number of threads, -M marks shorter split hits as secondary alignments, bwa_index is the prefix of the index files used, and sample_1.DNA.fq.gz and sample_2.DNA.fq.gz are the input raw sequencing data.
Structural variations are detected using lumpy-sv. Before the software is run, some intermediate files must be generated with the commands it provides. Example commands are as follows:
First, the discordant read alignments (sample.disc.bam) and the soft-clipped read alignments (sample.splt.bam) in the BAM file are extracted using the lumpy_filter command.
lumpy_filter sample.bam sample.splt.bam sample.disc.bam
Then, the mean and standard deviation of insert length of the sequencing data were calculated.
samtools view sample.bam | python paired_distro.py \
-r readlen \
-X 4 \
-N 10000 \
-o sample.lib1.histo
where samtools view sample.bam reads the BAM file and passes it as standard input to the paired_distro.py script, a script provided with the lumpy-sv software; -r specifies the sequencing read length, -X specifies a threshold in standard deviations, -N specifies the number of lines the program reads from standard input, and -o specifies the output file name.
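For intuition about what the insert length statistics represent, the following is a simplified stand-in, not the script shipped with lumpy-sv. It assumes SAM text lines as input (for example from samtools view), uses the TLEN column defined by the SAM format, and counts each properly paired fragment once by keeping only positive TLEN values; the function names are hypothetical.

```python
import statistics
from typing import Iterable, List, Tuple


def insert_sizes(sam_lines: Iterable[str], max_reads: int = 10000) -> List[int]:
    """Collect template lengths from properly paired reads."""
    sizes: List[int] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])  # FLAG and TLEN columns
        if flag & 0x2 and tlen > 0:  # proper pair, leftmost mate only
            sizes.append(tlen)
            if len(sizes) >= max_reads:
                break
    return sizes


def mean_and_stdev(sizes: List[int]) -> Tuple[float, float]:
    """Mean and population standard deviation of the insert sizes."""
    return statistics.mean(sizes), statistics.pstdev(sizes)
```

The resulting mean and standard deviation correspond to the mean and stdev fields passed to the -pe option of lumpy; the real script additionally trims outliers and writes a histogram file.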
Finally, the lumpy command is used to detect structural variations.
lumpy \
-mw 4 \
-tt 0 \
-pe \
id:sample,bam_file:sample.disc.bam,histo_file:sample.lib1.histo,mean:500,stdev:50,read_length:readlen,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
-sr \
id:sample,bam_file:sample.splt.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
> sample.vcf
where -mw specifies the minimum weight for each structural variation event, -tt specifies a trimming threshold, -pe specifies a series of parameters for processing the discordant reads BAM file, -sr specifies a series of parameters for processing the soft-clipped reads BAM file, and sample.vcf is the output file. Specifically, id is the sample name, bam_file is the corresponding BAM file, histo_file is the file recording the insert length distribution, mean is the mean insert length, stdev is the standard deviation of the insert length, read_length is the sequencing read length, discordant_z is the standard score (z-score) threshold, back_distance is the number of bases by which a structural variation site is extended, weight is the sample weight, and min_mapping_threshold is the minimum mapping quality.
Structural variations are genotyped using SVTyper. An example command is:
svtyper \
-i sample.vcf \
-B sample.bam \
-o sample.svtyper.vcf
where -i specifies the structural variation results generated by lumpy-sv in the previous step, -B specifies the bwa alignment file, and -o specifies the name of the output file.
A script is written to process the SVTyper output and screen out the gene fusion events, which are then annotated with the package pyGeno to obtain the positions of the gene fusions in the genome. Specifically, the script infers which two genes the upstream and downstream breakpoints of each gene fusion are located in. Fusions with a breakpoint in an intergenic region are not considered, because the specific transcription process cannot be accurately inferred for them. The classes imported in the script are:
from pyGeno.Transcript import Transcript
from pyGeno.Genome import Genome
from pyGeno.Gene import Gene
from pyGeno.Exon import Exon
from pyGeno.Chromosome import Chromosome
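Before annotation, the breakpoint pairs have to be read from the SVTyper output. The sketch below shows one way to do this, assuming candidate fusions are reported as breakend (BND) records in standard VCF breakend notation; the function name `parse_bnd` is hypothetical, and contig names containing colons are not handled.

```python
import re
from typing import Optional, Tuple

# Mate position inside a VCF breakend ALT allele, e.g. "N[chr2:2000[".
MATE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")

Breakpoint = Tuple[str, int]


def parse_bnd(vcf_line: str) -> Optional[Tuple[Breakpoint, Breakpoint]]:
    """Return ((chrom, pos), (mate_chrom, mate_pos)) for a BND record."""
    if vcf_line.startswith("#"):
        return None  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, alt, info = fields[0], int(fields[1]), fields[4], fields[7]
    if "SVTYPE=BND" not in info.split(";"):
        return None  # deletions, duplications, inversions are skipped here
    match = MATE.search(alt)
    if match is None:
        return None
    return (chrom, pos), (match.group(1), int(match.group(2)))
```

Each returned pair of breakpoints can then be passed to the annotation step to decide which genes, if any, the two positions fall in.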
A gene fusion sequence consists of two parts: the former part, i.e., the 5' end sequence, is a partial sequence derived from one gene (the upstream gene), and the latter part, i.e., the 3' end sequence, is a partial sequence derived from another gene (the downstream gene). To obtain a genomic gene fusion sequence, a script is written to extract the partial sequences of the upstream and downstream genes and splice them together. Specifically, it is first determined whether the upstream and downstream breakpoints are located in exon regions or intron regions of the genes, and the fusion is then handled as one of four cases according to the regions in which the breakpoints lie: exon-exon, intron-intron, exon-intron, and intron-exon, as shown in FIG. 2. Exon-exon means that the upstream and downstream breakpoints are both located in exon regions; the gene fusion sequence is formed by joining the 5' end exon sequence before the upstream gene breakpoint to the 3' end exon sequence after the downstream gene breakpoint. Intron-intron means that the upstream and downstream breakpoints are both located in intron regions; since intron sequences do not appear in the mature mRNA sequence, the gene fusion sequence is formed by joining the 5' end exon sequence before the upstream gene breakpoint to the 3' end exon sequence after the downstream gene breakpoint, and does not contain the intron sequence at the breakpoint. The exon-intron and intron-exon cases are slightly more complicated: two transcript sequences can be inferred according to the transcription rule, and so that neoantigens are selected as comprehensively as possible, both gene fusion sequences are output (type1 and type2 in FIG. 2). The type1 sequence does not contain the remaining part of the exon in which the exonic breakpoint lies, and the type2 sequence contains the remaining part of the intron in which the intronic breakpoint lies.
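The four breakpoint cases can be sketched for a single transcript pair as follows. This is one reading of the text above, since FIG. 2 is not reproduced here: both genes are assumed to lie on the forward strand, exons are given as 1-based inclusive (start, end) pairs in transcript order, and all function and parameter names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, forward strand


def sub(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] using 1-based inclusive coordinates."""
    return seq[start - 1:end]


def in_exon(exons: List[Exon], bp: int) -> bool:
    return any(s <= bp <= e for s, e in exons)


def upstream_part(seq, exons, bp, keep_partial_exon=True,
                  keep_intron_residue=False) -> str:
    """Transcribed 5' part of the upstream gene, up to breakpoint bp."""
    parts, last_end = [], None
    for s, e in exons:
        if e <= bp:                      # whole exon before the breakpoint
            parts.append(sub(seq, s, e))
            last_end = e
        elif s <= bp:                    # breakpoint inside this exon
            if keep_partial_exon:
                parts.append(sub(seq, s, bp))
            return "".join(parts)
        else:
            break
    if keep_intron_residue and last_end is not None and bp > last_end:
        parts.append(sub(seq, last_end + 1, bp))  # intron bases before bp
    return "".join(parts)


def downstream_part(seq, exons, bp, keep_partial_exon=True,
                    keep_intron_residue=False) -> str:
    """Transcribed 3' part of the downstream gene, from breakpoint bp on."""
    parts, first = [], True
    for s, e in exons:
        if e < bp:
            continue
        if s < bp:                       # breakpoint inside this exon
            if keep_partial_exon:
                parts.append(sub(seq, bp, e))
        else:
            if first and s > bp and keep_intron_residue:
                parts.append(sub(seq, bp, s - 1))  # intron bases after bp
            parts.append(sub(seq, s, e))
        first = False
    return "".join(parts)


def fusion_sequences(up_seq, up_exons, up_bp,
                     down_seq, down_exons, down_bp) -> List[str]:
    """One sequence for exon-exon or intron-intron, [type1, type2] otherwise."""
    if in_exon(up_exons, up_bp) == in_exon(down_exons, down_bp):
        return [upstream_part(up_seq, up_exons, up_bp)
                + downstream_part(down_seq, down_exons, down_bp)]
    type1 = (upstream_part(up_seq, up_exons, up_bp, keep_partial_exon=False)
             + downstream_part(down_seq, down_exons, down_bp,
                               keep_partial_exon=False))
    type2 = (upstream_part(up_seq, up_exons, up_bp, keep_intron_residue=True)
             + downstream_part(down_seq, down_exons, down_bp,
                               keep_intron_residue=True))
    return [type1, type2]
```

Reverse-strand genes would need their sequence reverse-complemented and their exon order flipped before these functions are applied, and the whole procedure would be repeated for every pair of affected transcripts.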
The transcriptome sequencing data are aligned using STAR. An example command is:
STAR \
--runThreadN 20 \
--genomeDir star_index \
--readFilesIn sample_1.RNA.fq.gz sample_2.RNA.fq.gz \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outFilterMultimapNmax 1 \
--outFilterMismatchNmax 3 \
--chimSegmentMin 10 \
--chimOutType WithinBAM SoftClip \
--chimJunctionOverhangMin 10 \
--chimScoreMin 1 \
--chimScoreDropMax 30 \
--chimScoreJunctionNonGTAG 0 \
--chimScoreSeparation 1 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--chimSegmentReadGapMax 3
where --runThreadN specifies the number of threads; --genomeDir specifies the index file path; --readFilesIn specifies the raw sequencing data to read; --readFilesCommand specifies the command used to read the files; --outSAMtype BAM SortedByCoordinate sets the output format to BAM, sorted by coordinate; --outSAMunmapped Within specifies that unaligned reads are also written to the result file; --outFilterMultimapNmax specifies the maximum number of multiple alignments allowed for a read; --outFilterMismatchNmax specifies the maximum number of mismatches allowed; --chimSegmentMin enables the output of fusion transcripts, with 10 being the minimum number of aligned bases in a chimeric segment; --chimOutType WithinBAM SoftClip specifies the output format of the chimeric alignments; --chimJunctionOverhangMin specifies the minimum number of overhanging bases at a chimeric junction; --chimScoreMin specifies the minimum score of the chimeric segments; --chimScoreDropMax specifies the maximum score difference among all chimeric segments; --chimScoreJunctionNonGTAG specifies the penalty applied when the chimeric junction does not have the "GT/AG" motif; --chimScoreSeparation specifies the minimum difference between the best chimeric score and the next best chimeric score; --alignSJstitchMismatchNmax specifies the maximum number of mismatches allowed when stitching splice junctions; and --chimSegmentReadGapMax specifies the maximum number of bases in the gap between chimeric segments within a read.
Gene fusions are detected using the arriba software. An example command is:
arriba \
-x Aligned.out.bam -o fusions.tsv \
-a reference.fa -g annotation.gtf \
-b blacklist.tsv
where -x specifies the input BAM file; -o specifies the output file; -a specifies the reference genome sequence; -g specifies the GTF annotation file; and -b specifies a blacklist file used to reduce false positives.
A script is written to extract the transcriptome gene fusion sequences from the gene fusion detection results. This process is generally similar to the process described above for obtaining genomic gene fusion sequences, except that the transcriptome sequences are mature mRNA, so the sequences at the breakpoints of the upstream and downstream fusion genes are joined directly without inferring the transcription process.
Finally, the obtained genomic gene fusion sequences and transcriptome gene fusion sequences are translated in reading frame into fusion protein sequences, the potential neoantigen peptide sequences are extracted, and the tumor-specific gene fusion proteome is constructed.
Human leukocyte antigen molecular typing is calculated using the HLA typing software HLA-LA. An example command is:
HLA-LA.pl --BAM sample.bam \
--graph PRG_MHC_GRCh38_withIMGT \
--sampleID sample --maxThreads threads \
--workingDir out_dir --picard_sam2fastq_bin SamToFastq.jar
where --BAM specifies the input BAM file; --graph specifies the population reference graph; --sampleID specifies the unique identifier of the sample; --maxThreads specifies the maximum number of threads; --workingDir specifies the output path; and --picard_sam2fastq_bin specifies the tool used to convert the BAM file into FASTQ files.
Affinity prediction is performed on the tumor-specific gene fusion proteome using the software netMHCpan-4.0 together with the human leukocyte antigen (HLA) molecular typing result. An example command is:
netMHCpan -BA -l 9 -a HLA_type \
-f filename -inptype 1 -xls -xlsfile peptide.xls
where -BA specifies that binding affinity prediction is performed; -l specifies the peptide fragment length; -a specifies the human leukocyte antigen (HLA) molecular type; -f specifies the input file; -inptype specifies the type of the input file, 0 for a FASTA file and 1 for a list of peptide sequences; -xls specifies that the output is written as an xls file; and -xlsfile specifies the name of the output file.
A script is written to integrate the peptide fragment information, compare the candidates with the normal human proteome, filter out the peptide fragments that exist in the normal proteome, and rank and screen the candidate neoantigens using different indexes with corresponding weights, to obtain the final tumor-specific gene fusion neoantigens.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive.
While the preferred embodiments and examples of the present invention have been described in detail, the present invention is not limited to the embodiments and examples, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (10)

1. A method for extracting gene fusion immunotherapy neoantigens by integrating deep sequencing data of DNA and RNA, characterized by comprising the following steps:
S10, obtaining a genomic gene fusion sequence of a sample;
S20, obtaining a transcriptome gene fusion sequence of the sample;
S30, constructing a gene fusion proteome;
S40, obtaining neoantigens of the sample;
preferably, the genomic gene fusion sequence of the sample is obtained based on whole exome sequencing;
preferably, the transcriptome gene fusion sequence of the sample is obtained based on transcriptome sequencing.
2. The method of claim 1, wherein obtaining the genomic gene fusion sequence of the sample comprises the following steps:
S101, detecting structural variation in the genome of the tumor sample;
S102, screening gene fusion events;
S103, obtaining the genomic gene fusion sequence.
3. The method of claim 1 or 2, wherein obtaining the transcriptome gene fusion sequence of the sample comprises the following steps:
S201, detecting transcriptome gene fusion events in the tumor sample;
S202, obtaining the transcriptome gene fusion sequence.
4. The method according to any one of claims 1 to 3, wherein step S30 comprises performing in-frame translation of the obtained genomic gene fusion sequence and the obtained transcriptome gene fusion sequence, respectively, to obtain gene fusion protein sequences, i.e., a gene fusion proteome;
preferably, when translation reaches the breakpoint position where the fusion occurs, it is determined whether a frame shift occurs; if a frame shift occurs, the entire protein sequence after the breakpoint position is a potential source of neoantigen peptides; if no frame shift occurs, only the sequence near the breakpoint can generate neoantigen peptides.
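The frame-shift rule can be illustrated with a short sketch. It assumes the fusion protein has already been translated, that the junction is given as the 0-based index of the first residue encoded after the breakpoint, and that the frame is checked by comparing nucleotide counts modulo 3. The function names, parameters, and toy sequence are hypothetical.

```python
def is_frameshift(upstream_cds_nt, downstream_cds_offset):
    """True if the downstream gene is read out of its original frame.

    upstream_cds_nt: coding nucleotides retained from the upstream gene.
    downstream_cds_offset: coding nucleotides of the downstream gene that
    lie before its breakpoint and are therefore lost.
    """
    return upstream_cds_nt % 3 != downstream_cds_offset % 3


def neoantigen_source_region(protein, junction_aa, frameshift, max_len=12):
    """Return the part of the fusion protein that can yield novel peptides."""
    # A peptide of max_len residues can start at most max_len - 1 before the junction.
    start = max(0, junction_aa - (max_len - 1))
    if frameshift:
        # Everything after the breakpoint is novel sequence.
        return protein[start:]
    # In frame: only peptides that span the junction are novel.
    end = min(len(protein), junction_aa + (max_len - 1))
    return protein[start:end]


# Toy fusion protein: 20 upstream residues followed by 20 downstream residues.
protein = "A" * 20 + "C" * 20
in_frame_region = neoantigen_source_region(protein, 20, frameshift=False, max_len=9)
shifted_region = neoantigen_source_region(protein, 20, frameshift=True, max_len=9)
print(in_frame_region, shifted_region)
```

The window bounds ensure that every peptide of up to `max_len` residues drawn from the in-frame region can contain at least one residue from each side of the junction.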
5. The method according to any one of claims 1 to 4, wherein in step S30, peptide fragment sequences of a specified length are generated as required;
preferably, the default peptide fragment length is 9 to 12 amino acids.
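Peptide generation at the default lengths can be sketched as a sliding window over a protein sequence. The function name and the toy sequence are assumptions; only the 9 to 12 residue range comes from the text.

```python
def generate_peptides(sequence, min_len=9, max_len=12):
    """Enumerate unique peptides of min_len..max_len residues, in order of first occurrence."""
    seen = set()
    peptides = []
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            peptide = sequence[start:start + length]
            if peptide not in seen:
                seen.add(peptide)
                peptides.append(peptide)
    return peptides


# Toy 12-residue sequence with distinct letters, so no duplicates arise.
peptides = generate_peptides("ACDEFGHIKLMN")
print(len(peptides), peptides[0], peptides[-1])
```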
6. The method according to any one of claims 1 to 5, wherein step S40 comprises the following steps:
S401, identifying the human leukocyte antigen (HLA) molecular typing;
S402, predicting the peptide fragment affinity;
S403, screening sample neoantigens based on the integrated peptide fragment information.
7. The method according to any one of claims 1 to 6, wherein the sample is a tumor tissue, preferably a human tumor tissue.
8. The method according to any one of claims 1 to 7, wherein in step S102, gene fusion events are selected whose breakpoints are located within a gene rather than in an intergenic region.
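The screening rule of step S102 could look like the sketch below. It assumes each structural-variation event carries two breakpoints and that both must fall inside an annotated gene. The record layout, gene names, and coordinates are hypothetical.

```python
def breakpoint_in_gene(chrom, pos, genes):
    """True if the position lies inside any annotated gene interval (inclusive)."""
    return any(
        g["chrom"] == chrom and g["start"] <= pos <= g["end"]
        for g in genes
    )


def screen_fusion_events(events, genes):
    """Keep events whose breakpoints all fall within genes, not intergenic regions."""
    return [
        e for e in events
        if all(breakpoint_in_gene(chrom, pos, genes) for chrom, pos in e["breakpoints"])
    ]


# Toy annotation and events; names and coordinates are made up.
genes = [
    {"name": "GENE_A", "chrom": "chr1", "start": 1000, "end": 5000},
    {"name": "GENE_B", "chrom": "chr2", "start": 20000, "end": 30000},
]
events = [
    {"id": "SV1", "breakpoints": [("chr1", 3000), ("chr2", 25000)]},
    {"id": "SV2", "breakpoints": [("chr1", 3000), ("chr2", 90000)]},  # intergenic partner
]
kept = screen_fusion_events(events, genes)
print([e["id"] for e in kept])
```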
9. The method according to any one of claims 1 to 8, wherein in step S103, gene sequences are extracted and spliced according to the breakpoint positions of the upstream and downstream genes involved in the gene fusion;
preferably, the method comprises the following steps:
S1031, determining the breakpoint positions of the upstream and downstream genes;
S1032, determining whether each breakpoint occurs in an exon region or an intron region;
S1033, determining which transcripts of the gene are affected by the breakpoint;
S1034, pairing each affected upstream gene transcript with each affected downstream gene transcript, and obtaining a complete gene fusion transcript sequence according to conventional transcription rules.
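Steps S1032 to S1034 can be sketched as follows, up to but not including sequence assembly. The sketch assumes transcripts are given as lists of inclusive exon intervals and that a transcript is affected when the breakpoint lies between its first and last exon. All names and coordinates are hypothetical.

```python
from itertools import product


def classify_breakpoint(pos, exons):
    """Return 'exon', 'intron', or None if pos lies outside the transcript."""
    exons = sorted(exons)
    if not exons or pos < exons[0][0] or pos > exons[-1][1]:
        return None
    if any(start <= pos <= end for start, end in exons):
        return "exon"
    return "intron"


def affected_transcripts(pos, transcripts):
    """Map each transcript hit by the breakpoint to the region type that was hit."""
    hits = {}
    for name, exons in transcripts.items():
        region = classify_breakpoint(pos, exons)
        if region is not None:
            hits[name] = region
    return hits


def pair_transcripts(upstream_hits, downstream_hits):
    """Pair every affected upstream transcript with every affected downstream one."""
    return list(product(sorted(upstream_hits), sorted(downstream_hits)))


# Toy transcript models with made-up coordinates.
upstream = {"UP-T1": [(100, 200), (400, 500)], "UP-T2": [(100, 200), (300, 350)]}
downstream = {"DN-T1": [(1000, 1100), (1300, 1400)]}

up_hits = affected_transcripts(320, upstream)
down_hits = affected_transcripts(1200, downstream)
pairs = pair_transcripts(up_hits, down_hits)
print(up_hits, down_hits, pairs)
```

Building the fusion transcript sequence for each pair would additionally require the reference sequence and strand information, which this sketch leaves out.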
10. A device for extracting gene fusion immunotherapy neoantigens by integrating deep sequencing data of DNA and RNA, characterized by comprising:
(1) a first unit for obtaining a genomic gene fusion sequence of a sample;
(2) a second unit for obtaining a transcriptome gene fusion sequence of the sample;
(3) a third unit for constructing a specific gene fusion proteome;
(4) a fourth unit for obtaining neoantigens of the sample.
CN201911293011.2A 2019-12-16 2019-12-16 Method and device for extracting gene fusion immunotherapy new antigen by integrating DNA and RNA deep sequencing data Active CN111192632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911293011.2A CN111192632B (en) 2019-12-16 2019-12-16 Method and device for extracting gene fusion immunotherapy new antigen by integrating DNA and RNA deep sequencing data

Publications (2)

Publication Number Publication Date
CN111192632A true CN111192632A (en) 2020-05-22
CN111192632B CN111192632B (en) 2023-06-13

Family

ID=70707362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911293011.2A Active CN111192632B (en) 2019-12-16 2019-12-16 Method and device for extracting gene fusion immunotherapy new antigen by integrating DNA and RNA deep sequencing data

Country Status (1)

Country Link
CN (1) CN111192632B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491689A (en) * 2018-02-01 2018-09-04 杭州纽安津生物科技有限公司 Tumour neoantigen identification method based on transcript profile
US20180341746A1 (en) * 2017-05-25 2018-11-29 Koninklijke Philips N.V. System and method for detecting gene fusion
CN109706065A (en) * 2018-12-29 2019-05-03 深圳裕策生物科技有限公司 Tumor neogenetic antigen load detection device and storage medium
CN109801678A (en) * 2019-01-25 2019-05-24 上海鲸舟基因科技有限公司 Based on the tumour antigen prediction technique of full transcript profile and its application

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035272A (en) * 2021-03-08 2021-06-25 深圳市新合生物医疗科技有限公司 Method and apparatus for obtaining new antigens for immunotherapy based on endosomal cell variation
CN113035272B (en) * 2021-03-08 2023-09-05 深圳市新合生物医疗科技有限公司 Method and device for obtaining immunotherapeutic new antigen based on intein cell variation
CN115240773A (en) * 2022-09-06 2022-10-25 深圳新合睿恩生物医疗科技有限公司 Method, device, equipment and medium for identifying novel antigen of tumor specific circular RNA
CN115240773B (en) * 2022-09-06 2023-07-28 深圳新合睿恩生物医疗科技有限公司 New antigen identification method and device, equipment and medium of tumor specific circular RNA
WO2024051097A1 (en) * 2022-09-06 2024-03-14 深圳新合睿恩生物医疗科技有限公司 Neoantigen identification method and device for tumor-specific circular rnas, apparatus and medium

Also Published As

Publication number Publication date
CN111192632B (en) 2023-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant