CN111718982A - Tumor tissue single sample somatic mutation detection method and device - Google Patents

Publication number
CN111718982A
Authority
CN
China
Prior art keywords
variation
tumor tissue
copy number
sequencing
single sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010658213.9A
Other languages
Chinese (zh)
Inventor
许明炎
陈亚如
周衍庆
王丹丹
汪周阳
陈实富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haplox Biotechnology Shenzhen Co ltd
Original Assignee
Haplox Biotechnology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haplox Biotechnology Shenzhen Co ltd
Priority to CN202010658213.9A
Publication of CN111718982A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6858: Allele-specific amplification
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50: Mutagenesis

Abstract

The invention belongs to the technical field of tumor gene detection, and provides a method and a device for detecting somatic mutations in a single tumor tissue sample based on high-throughput capture sequencing, and further provides a computer readable medium and detection equipment for running the method. Compared with the prior art, the method solves the problem of detecting mutations in tumor tissue when no control sample is available, detects multiple types of mutation (such as SNV, INDEL, CNV, Fusion and MSI), and can calculate the tumor mutation burden (TMB).

Description

Tumor tissue single sample somatic mutation detection method and device
Technical Field
The invention belongs to the technical field of tumor gene detection, and particularly relates to a tumor tissue single sample somatic mutation detection method and device based on high-throughput capture sequencing.
Background
With the rapid development of high-throughput sequencing technology, gene detection based on high-throughput capture sequencing is increasingly used to guide the clinical detection and treatment of tumors. Genetic mutations play an important role in the development and progression of tumors. Mutations are classified into somatic mutations, which occur in somatic cells and are not transmitted to offspring, and germline mutations, which are present in germ cells or the fertilized egg and are transmitted to offspring. Typically, tumors are the result of somatic mutations. Somatic mutations include single nucleotide variation (SNV), small insertion/deletion variation (INDEL), structural variation (SV), copy number variation (CNV) and microsatellite instability (MSI). Generally, detecting somatic mutations in a tumor sample requires testing a matched normal control sample at the same time: the mutations that remain after removing the germline mutations found in the normal control are the somatic mutations. However, normal control samples are often difficult to obtain from patients at advanced stages, so detecting somatic mutations in tumor tissue without a control sample is a major problem in the art.
Disclosure of Invention
The invention aims to provide a tumor tissue single sample somatic mutation detection method and a tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, so as to solve the technical problem that the somatic mutation in the tumor tissue is difficult to detect without a control sample in the field.
Accordingly, in a first aspect, the present invention provides a method for detecting somatic mutation in a single sample of tumor tissue based on high-throughput capture sequencing, the method comprising the steps of:
S1: extracting DNA from a single sample of tumor tissue;
S2: capturing tumor-associated genes using probes;
S3: sequencing the tumor-associated genes by a high-throughput sequencing method to obtain high-throughput sequencing raw data;
S4: filtering the high-throughput sequencing raw data to remove low-quality sequences and obtain high-throughput sequencing filtered data;
S5: aligning the high-throughput sequencing filtered data to the human reference genome hg19, and removing redundant sequences;
S6: reading single nucleotide variations and small insertion/deletion variations, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: previously tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated against that baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed from previously tested normal control samples, the screened microsatellite loci are counted, and the microsatellite stability of the single tumor tissue sample is judged based on the model.
In a second aspect, the present invention provides a tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, comprising the following modules:
M1: a DNA extraction module for extracting DNA from a single sample of tumor tissue;
M2: a capture module for capturing tumor-associated genes using probes;
M3: a sequencing module for sequencing the tumor-associated genes by a high-throughput sequencing method to obtain high-throughput sequencing raw data;
M4: a filtering module for filtering the high-throughput sequencing raw data to remove low-quality sequences and obtain high-throughput sequencing filtered data;
M5: an alignment module for aligning the high-throughput sequencing filtered data to the human reference genome hg19 and removing redundant sequences;
M6: an analysis module for reading single nucleotide variations and small insertion/deletion variations, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: previously tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated against that baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed from previously tested normal control samples, the screened microsatellite loci are counted, and the microsatellite stability of the single tumor tissue sample is judged based on the model.
Further, in the method and the device of the present invention, the high-throughput sequencing raw data are filtered with the fastp software, with the parameters set to -3 -W 4 -M 25.
Further, in the method and the device of the present invention, the high-throughput sequencing filtered data are aligned to the human reference genome hg19 with the bwa mem software, with the parameter set to -k 32.
Further, in the method and the device of the present invention, redundant sequences are removed with the gencore software, with the parameter set to -s 1.
Further, in the method and the device of the present invention, single nucleotide variations and small insertion/deletion variations are read with the varscan software, and a mutation site is required to meet the following conditions: a minimum coverage depth of 10; a minimum mutation frequency of 2%; a base quality higher than 25 for the bases supporting the mutation; an alignment quality higher than 30; no strand bias, that is, the reads supporting the mutation are evenly distributed between the forward and reverse strands; at least 3 molecules in total supporting the mutation; and a population frequency lower than 0.1% in databases such as 1000 Genomes.
Further, in the method and the device of the present invention, when analyzing gene copy number variation, the gene copy number variation is calculated according to the CBS (circular binary segmentation) algorithm.
Further, in the method and the device of the present invention, gene fusion variation is analyzed by two methods, the first using the genefuse software and the second using the factera software; the results of the two methods are each filtered and then combined.
Further, in the method and the device of the present invention, when analyzing microsatellite locus stability, if more than 20% of the counted loci are unstable, the single tumor tissue sample is judged to be microsatellite unstable; otherwise it is judged to be microsatellite stable.
In a third aspect, the present invention provides a computer readable medium storing computer program instructions, wherein when the computer program instructions are executed by a processor, the tumor tissue single sample somatic mutation detection method based on high-throughput capture sequencing of the first aspect of the present invention is executed.
In a fourth aspect, the present invention provides a tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, comprising:
a memory for storing the computer program instructions according to the third aspect of the invention, and
a processor for executing the computer program instructions of the third aspect of the invention,
wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the method for detecting somatic mutations in a single sample of tumor tissue based on high throughput capture sequencing according to the first aspect of the present invention.
The invention has the beneficial effects that:
Compared with the prior art, the method solves the problem of detecting mutations in tumor tissue when no control sample is available, detects multiple types of mutation (such as SNV, INDEL, CNV, Fusion and MSI), and can calculate the tumor mutation burden (TMB). The method has been tested on a large number of samples and has higher accuracy than other methods. For the TMB calculation, a method comparison was carried out: 45 tumor tissue samples and their normal control samples underwent whole exome sequencing (WES), and the tumor samples underwent tumor panel capture sequencing by the method of the invention. Taking the WES TMB result as the gold standard, the similarity between the TMB from the method of the invention and the WES TMB reaches 95%. Because the TMB obtained by the method without a control sample agrees closely with the TMB obtained with normal control samples, the method can detect somatic mutations in a single tumor tissue sample when no control sample is available.
Drawings
FIG. 1 shows a flow chart of tumor tissue single sample somatic mutation detection based on high throughput capture sequencing, as exemplified in example 1 of the present invention.
Detailed Description
In order to make the technical problems solved by the present invention, the technical solutions adopted and the advantages obtained by the present invention more clearly understood, the present invention is further described in detail below with reference to the accompanying drawings and the specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Scientific and technical terms used herein have meanings commonly understood by those skilled in the art unless otherwise defined. Specifically, some terms used in the present invention have the following definitions:
High-throughput sequencing: also called second-generation sequencing; compared with first-generation sequencing technology represented by Sanger sequencing, it is characterized by high throughput, high output, high accuracy and automated analysis.
High-throughput capture sequencing: densely synthesized probes enrich regions of interest on the genome by base complementarity, and the enriched regions are then sequenced with a high-throughput sequencing technology.
Fastq file: the sequence file obtained by converting the bcl data of a second-generation sequencer; it contains the sequence name, base sequence, base quality values and so on.
BAM file: the file generated when alignment software such as BWA aligns the off-machine sequences to the human reference genome; it records details such as each sequence's position on the reference genome and its alignment quality.
SNV: single nucleotide variation; relative to the corresponding position on the reference genome, the base at a position on the sample genome is replaced by another base.
INDEL: insertion/deletion variation; relative to the corresponding position on the reference genome, several bases are inserted or deleted at a position on the sample genome.
TMB: tumor mutation burden, the total number of mutations per megabase in tumor tissue, calculated here from SNVs and INDELs.
Fusion: gene fusion, a type of structural variation occurring on chromosomes, in which the coding regions of two or more genes are joined end to end to form a new chimeric gene.
CNV: copy number variation, an increase or decrease in the copy number of large sequence fragments on the genome. CNVs can be divided into deletions and duplications and are an important molecular mechanism.
MSI: microsatellite instability, any change in the length of a microsatellite in a tumor, compared with normal tissue, caused by the insertion or deletion of repeat units and resulting in a new microsatellite allele.
The invention provides a tumor tissue single sample somatic mutation detection method based on high-throughput capture sequencing, which comprises the following steps:
S1: extracting DNA from a single sample of tumor tissue;
S2: capturing tumor-associated genes using probes;
S3: sequencing the tumor-associated genes by a high-throughput sequencing method to obtain high-throughput sequencing raw data;
S4: filtering the high-throughput sequencing raw data to remove low-quality sequences and obtain high-throughput sequencing filtered data;
S5: aligning the high-throughput sequencing filtered data to the human reference genome hg19, and removing redundant sequences;
S6: reading single nucleotide variations and small insertion/deletion variations, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: previously tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated against that baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed from selected samples, the screened microsatellite loci are counted, and the microsatellite stability of the single tumor tissue sample is judged based on the model.
In a specific embodiment of the present invention, in S1, DNA can be extracted from a single sample of tumor tissue by referring to a technique that is conventional in the art.
In a specific embodiment of the present invention, in S2, a probe is used to capture a tumor-associated gene based on a target region capture principle, which is well known in the art.
In a specific embodiment of the present invention, the tumor associated genes are sequenced by high throughput sequencing methods, as is well known in the art, at S3.
In a specific embodiment of the present invention, in S4, the high-throughput sequencing raw data are filtered with the fastp software, with the parameters set to -3 -W 4 -M 25.
In a specific embodiment of the present invention, the low-quality sequences removed in S4 include sequences with a high proportion of N bases; after the high-throughput sequencing filtered data are obtained, the Q20 and Q30 ratios and the GC content are computed.
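The statistics named above can be illustrated with a minimal Python sketch. It is not the fastp implementation; it assumes reads are already available as (sequence, quality string) pairs with Phred+33 encoding, and the function name `read_quality_stats` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def read_quality_stats(records: Iterable[Tuple[str, str]],
                       phred_offset: int = 33) -> Dict[str, float]:
    """Summarise (sequence, quality string) pairs taken from filtered FASTQ data.

    Assumption: qualities use Phred+33 encoding, so score = ord(char) - 33.
    """
    total = q20 = q30 = gc = n_bases = 0
    for seq, qual in records:
        for base, q_char in zip(seq.upper(), qual):
            score = ord(q_char) - phred_offset
            total += 1
            q20 += score >= 20      # base counts towards the Q20 ratio
            q30 += score >= 30      # base counts towards the Q30 ratio
            gc += base in "GC"      # base counts towards GC content
            n_bases += base == "N"  # uncalled base
    if total == 0:
        raise ValueError("no bases to summarise")
    return {
        "q20_rate": q20 / total,
        "q30_rate": q30 / total,
        "gc_content": gc / total,
        "n_rate": n_bases / total,
    }
```

The N-base rate is reported alongside the other values because reads rich in N bases are the example of low-quality sequence given in the text.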
In a specific embodiment of the present invention, in S5, the high-throughput sequencing filtered data are aligned to the human reference genome hg19 with the bwa mem software, with the parameter set to -k 32.
In a specific embodiment of the present invention, in S5, redundant sequences are removed with the gencore software, with the parameter set to -s 1; the redundant sequences include PCR duplicate sequences and the like.
In a specific embodiment of the present invention, in S5, after redundant sequences such as PCR duplicates are removed, information such as the alignment rate, de-duplication rate and capture rate is computed.
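The tool settings given for S4 and S5 can be written out as command lines. The sketch below only builds argument lists and runs nothing. Only `-3 -W 4 -M 25`, `-k 32` and `-s 1` come from the text; the file names, the paired-end layout and the input/output/reference options are assumptions added for illustration.

```python
import shlex
from typing import List


def fastp_cmd(r1_in: str, r2_in: str, r1_out: str, r2_out: str) -> List[str]:
    # -3: trim low-quality bases from the 3' end with a sliding window;
    # -W 4: window size; -M 25: mean quality required within the window.
    return ["fastp", "-i", r1_in, "-I", r2_in, "-o", r1_out, "-O", r2_out,
            "-3", "-W", "4", "-M", "25"]


def bwa_mem_cmd(reference: str, r1: str, r2: str) -> List[str]:
    # -k 32: minimum seed length used by bwa mem.
    return ["bwa", "mem", "-k", "32", reference, r1, r2]


def gencore_cmd(bam_in: str, bam_out: str, reference: str) -> List[str]:
    # -s 1: the parameter value stated in the text; the -i/-o/-r options
    # are assumed here and should be checked against the installed version.
    return ["gencore", "-i", bam_in, "-o", bam_out, "-r", reference, "-s", "1"]


if __name__ == "__main__":
    # Hypothetical file names, printed as shell-ready command lines.
    print(shlex.join(fastp_cmd("raw_1.fq.gz", "raw_2.fq.gz",
                               "clean_1.fq.gz", "clean_2.fq.gz")))
    print(shlex.join(bwa_mem_cmd("hg19.fa", "clean_1.fq.gz", "clean_2.fq.gz")))
    print(shlex.join(gencore_cmd("sorted.bam", "dedup.bam", "hg19.fa")))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without shell quoting issues.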
In a specific embodiment of the present invention, in S6, single nucleotide variations and small insertion/deletion variations are read with the varscan software, and a mutation site is required to meet the following conditions: a minimum coverage depth of 10; a minimum mutation frequency of 2%; a base quality higher than 25 for the bases supporting the mutation; an alignment quality higher than 30; no strand bias, that is, the reads supporting the mutation are evenly distributed between the forward and reverse strands; at least 3 molecules in total supporting the mutation; and a population frequency lower than 0.1% in databases such as 1000 Genomes.
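The site-level requirements above can be expressed as a single predicate. This is a hedged Python sketch, not varscan's own logic: the `Candidate` record and its field names are hypothetical, supporting reads stand in for supporting molecules, and the strand check is simplified to requiring support on both strands.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-site summary; field names are illustrative only."""
    depth: int                   # total coverage at the site
    alt_forward: int             # supporting reads on the forward strand
    alt_reverse: int             # supporting reads on the reverse strand
    alt_base_quality: float      # quality of the bases supporting the mutation
    mapping_quality: float       # alignment quality of the supporting reads
    population_frequency: float  # e.g. from 1000 Genomes, as a fraction


def passes_call_thresholds(c: Candidate) -> bool:
    alt = c.alt_forward + c.alt_reverse
    if c.depth < 10:                 # minimum coverage depth of 10
        return False
    if alt < 3:                      # at least 3 supporting molecules (reads here)
        return False
    if alt / c.depth < 0.02:         # minimum mutation frequency of 2%
        return False
    if c.alt_base_quality <= 25:     # base quality must be higher than 25
        return False
    if c.mapping_quality <= 30:      # alignment quality must be higher than 30
        return False
    # Assumption: "no strand bias" is approximated as support on both strands.
    if c.alt_forward == 0 or c.alt_reverse == 0:
        return False
    return c.population_frequency < 0.001  # population frequency below 0.1%
```

A production pipeline would likely replace the both-strands check with a statistical test of strand balance; the text does not say which test is used.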
In a specific embodiment of the present invention, in S6, when analyzing gene copy number variation, because no matched control sample is available, previously tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation is then calculated according to the CBS algorithm. In one specific embodiment, several tens of previously tested normal control samples are selected to establish the normal copy number baseline.
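The baseline idea can be sketched in Python as follows. The sketch covers only the comparison of a tumor sample against a pooled-normal baseline as per-region log2 ratios; it does not implement CBS segmentation. It assumes per-region depths are already normalised for library size, and the use of the median across normals is an illustrative choice rather than something stated in the text.

```python
import math
from statistics import median
from typing import Dict, List


def build_baseline(normal_depths: List[Dict[str, float]]) -> Dict[str, float]:
    """Median normalised depth per target region across the normal samples."""
    if not normal_depths:
        raise ValueError("at least one normal sample is required")
    regions = normal_depths[0].keys()
    return {r: median(sample[r] for sample in normal_depths) for r in regions}


def log2_ratios(tumor_depths: Dict[str, float],
                baseline: Dict[str, float]) -> Dict[str, float]:
    """log2(tumor / baseline) per region; regions with zero depth are skipped."""
    ratios = {}
    for region, expected in baseline.items():
        observed = tumor_depths.get(region, 0.0)
        if expected > 0 and observed > 0:
            ratios[region] = math.log2(observed / expected)
    return ratios
```

With this convention a ratio near 0 means a normal copy number, positive values suggest a gain and negative values a loss; a segmentation step such as CBS would then group adjacent regions with similar ratios.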
In a specific embodiment of the present invention, in S6, gene fusion variation is analyzed by two methods, the first using the genefuse software and the second using the factera software; the results of the two methods are filtered separately and then combined.
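Combining the two fusion result sets could look like the following Python sketch. The text does not give the filtering criteria, so the filter is passed in by the caller; the `FusionCall` record and its fields are hypothetical and do not reflect the output format of either tool.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, NamedTuple


class FusionCall(NamedTuple):
    """Hypothetical, tool-independent fusion record."""
    gene_a: str
    gene_b: str
    supporting_reads: int
    caller: str


def merge_fusion_calls(
    first: Iterable[FusionCall],
    second: Iterable[FusionCall],
    keep: Callable[[FusionCall], bool],
) -> Dict[FrozenSet[str], List[str]]:
    """Filter each caller's results, then take the union keyed by gene pair.

    The key ignores partner order, so EML4-ALK and ALK-EML4 are one event.
    The value lists which callers reported the event.
    """
    merged: Dict[FrozenSet[str], List[str]] = {}
    for call in list(first) + list(second):
        if not keep(call):
            continue
        key = frozenset((call.gene_a, call.gene_b))
        callers = merged.setdefault(key, [])
        if call.caller not in callers:
            callers.append(call.caller)
    return merged
```

Recording the reporting callers for each gene pair makes it easy to see afterwards which fusions were supported by both methods.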
In a specific embodiment of the present invention, in S6, when analyzing microsatellite locus stability, because no matched control sample is available, a model capable of accurately screening the microsatellite locus stability of a single sample is constructed from selected samples, and the screened microsatellite loci are counted; if more than 20% of the loci are unstable, the sample is judged to be microsatellite unstable, otherwise it is judged to be stable. In a specific embodiment, nearly 200 samples are selected to construct the model.
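The final decision rule is simple enough to state as code. The Python sketch below assumes that a per-locus stable/unstable status has already been produced by the model, which is not described in detail in the text; the function name `classify_msi` is hypothetical.

```python
from typing import Sequence


def classify_msi(locus_unstable: Sequence[bool], threshold: float = 0.20) -> str:
    """Return "unstable" when more than `threshold` of the loci are unstable.

    `locus_unstable` holds one boolean per screened microsatellite locus.
    """
    if not locus_unstable:
        raise ValueError("no microsatellite loci were evaluated")
    fraction = sum(locus_unstable) / len(locus_unstable)
    return "unstable" if fraction > threshold else "stable"
```

The comparison is strict, so a sample with exactly 20% unstable loci is reported as stable, matching the "more than 20%" wording.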
In a second aspect, the present invention provides a tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, comprising the following modules:
M1: a DNA extraction module for extracting DNA from a single sample of tumor tissue;
M2: a capture module for capturing tumor-associated genes using probes;
M3: a sequencing module for sequencing the tumor-associated genes by a high-throughput sequencing method to obtain high-throughput sequencing raw data;
M4: a filtering module for filtering the high-throughput sequencing raw data to remove low-quality sequences and obtain high-throughput sequencing filtered data;
M5: an alignment module for aligning the high-throughput sequencing filtered data to the human reference genome hg19 and removing redundant sequences;
M6: an analysis module for reading single nucleotide variations and small insertion/deletion variations, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: previously tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated against that baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed from previously tested normal control samples, the screened microsatellite loci are counted, and the microsatellite stability of the single tumor tissue sample is judged based on the model.
It should be noted that, the tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing of the present invention substantially implements each step of the method of the present invention through each module to implement tumor tissue single sample somatic mutation detection, and therefore, further features of the device of the present invention can refer to the description of the first aspect of the present invention on the specific embodiment of the method of the present invention, which is not repeated herein.
It is understood that all or part of the functions of the tumor tissue single sample somatic mutation detection method based on high-throughput capture sequencing can be realized by a hardware mode, and also can be realized by a computer program mode. When implemented by way of a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, magnetic disks, optical disks, hard disks, etc., and the program is run by a computer to implement the method of the present invention. For example, the program is stored in a memory of the device, and the method of the present invention is implemented when the program stored in the memory is executed by a processor. When all or part of the functions in the method of the present invention are implemented by a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk or a mobile hard disk, and stored in a memory of a local device by downloading or copying, or performing version update on a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions of the method for detecting somatic mutation in a tumor tissue single sample based on high throughput capture sequencing of the present invention may be implemented.
Accordingly, in a third aspect, the present invention provides a computer readable medium having stored thereon computer program instructions, wherein the method for detecting somatic mutations in a single sample of tumor tissue based on high-throughput capture sequencing of the first aspect of the present invention is performed when the computer program instructions are executed by a processor.
In a fourth aspect, the present invention provides a tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, comprising:
a memory for storing the computer program instructions according to the third aspect of the invention, and
a processor for executing the computer program instructions of the third aspect of the invention,
wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the method for detecting somatic mutations in a single sample of tumor tissue based on high throughput capture sequencing according to the first aspect of the present invention.
The invention is illustrated by the following non-limiting examples in connection with the accompanying drawings.
Example 1
A. DNA was extracted from 45 formalin-fixed, paraffin-embedded (FFPE) tumor tissue samples and, for comparison, from the corresponding 45 normal control samples (see Table 1). Probes were then used to capture the tumor-associated genes, and the captured genes were sequenced by high-throughput sequencing on a Novaseq6000 sequencer to obtain the high-throughput sequencing raw data, that is, the raw off-machine fastq data. See FIG. 1 (A).
B. The raw off-machine fastq data were filtered with the fastp software: low-quality sequences, sequences with a high N-base content and adapter sequences were removed to obtain the filtered sequence files, and information such as the Q20 ratio, Q30 ratio and GC content was computed. See FIG. 1 (B).
C. The filtered clean fastq data were aligned to the human reference genome hg19 with the bwa mem software to obtain the raw alignment bam file; information such as the alignment rate was computed and quality control was performed. See FIG. 1 (C).
D. The raw alignment bam file was deduplicated with the gencore software, removing duplicate sequences such as PCR duplicates, to obtain the deduplicated bam file; information such as the duplication rate, capture rate and effective sequencing depth was computed and quality control was performed. See FIG. 1 (D).
E. The deduplicated bam file was processed with the samtools software to generate an mpileup file. Based on the mpileup file, mutations were read with the varscan software to obtain SNV and INDEL vcf files. The SNV and INDEL vcf files were annotated with the annovar software to obtain annotated vcf files. The deduplicated bam file and the annotated vcf files were processed with the MrBam software to count the sequences supporting each variant. The resulting file was annotated against a baseline database using an in-house script. The annotated result file was then filtered with an in-house script to remove mutations with fewer than 7 supporting sequences, mutations with an abundance lower than 2%, mutations with a population frequency higher than 0.1% in databases such as 1000 Genomes, and mutations occurring more than 10 times in the baseline database. The sample TMB was calculated from the filtered results. See FIG. 1 (E).
F. Copy number was analyzed with the cnvkit software based on the bam file generated in step D; the reference baseline used as the control was built from the data of several tens of healthy individuals. See FIG. 1 (F).
G. MSI was analyzed with in-house software based on the bam file generated in step D. A model capable of accurately screening the microsatellite locus stability of a single sample was constructed from nearly 200 samples. The screened microsatellite loci were counted; if more than 20% of the loci were unstable, the sample was judged to be unstable, otherwise stable. See FIG. 1 (G).
H. Gene fusions were analyzed with the genefuse software based on the filtered fastq data obtained in step B, and analyzed again with the factera software based on the bam file generated in step D; the two sets of fusion results were filtered to obtain the final fusion result. See FIG. 1 (H).
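The final filtering and TMB calculation described in step E can be sketched in Python as follows. This is an illustration under stated assumptions, not the in-house scripts: the `AnnotatedVariant` record and its field names are hypothetical, and the panel size in base pairs is a required input whose value is not given in the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AnnotatedVariant:
    """Hypothetical record for one annotated SNV or INDEL."""
    supporting_reads: int        # number of sequences supporting the variant
    abundance: float             # mutation abundance as a fraction (0.02 = 2%)
    population_frequency: float  # e.g. from 1000 Genomes, as a fraction
    baseline_occurrences: int    # times the variant occurs in the baseline database


def keep_for_tmb(v: AnnotatedVariant) -> bool:
    """Apply the four filters from step E (variants failing any one are removed)."""
    return (
        v.supporting_reads >= 7               # remove support below 7
        and v.abundance >= 0.02               # remove abundance below 2%
        and v.population_frequency <= 0.001   # remove population frequency above 0.1%
        and v.baseline_occurrences <= 10      # remove more than 10 baseline occurrences
    )


def tumor_mutation_burden(variants: Iterable[AnnotatedVariant],
                          panel_size_bp: int) -> float:
    """Mutations per megabase over the captured region (size is an assumption)."""
    if panel_size_bp <= 0:
        raise ValueError("panel size must be positive")
    kept = sum(1 for v in variants if keep_for_tmb(v))
    return kept / (panel_size_bp / 1_000_000)
```

Dividing by the size of the captured region in megabases is what makes panel TMB comparable with WES TMB, since both are expressed as mutations per megabase.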
For the calculation of the sample TMB (total number of mutations per megabase of tumor tissue), a method comparison was performed: whole exome sequencing (WES) was performed on 45 tumor tissue samples and the corresponding normal control samples, and the WES TMB was calculated; tumor panel capture sequencing was performed on the 45 tumor tissue samples by the method of the invention, and the tumor panel TMB was calculated. The results are shown in Table 1.
Table 1: TMB results from whole exome sequencing (WES) of 45 tumor tissue samples with normal tissue samples, and from tumor panel capture sequencing of the tumor tissue samples by the method of the invention
[Table 1 is provided as images in the original publication: Figure BDA0002577528300000081 and Figure BDA0002577528300000091]
Taking the WES TMB result as the gold standard, the accuracy of the method was assessed by calculating the Pearson correlation coefficient; the similarity between the TMB calculated by the method and the WES TMB reaches 95%. Because the TMB from the method agrees closely with that of the gold-standard WES, the method can detect somatic mutations in a single tumor tissue sample when no control sample is available, and solves the problem in the art of detecting tumor tissue mutations without a control sample.
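The Pearson correlation used for this comparison can be computed as in the Python sketch below. It is a plain textbook implementation; the per-sample TMB values of Table 1 are not reproduced here, so the two input sequences (panel TMB and WES TMB, in the same sample order) must be supplied by the reader.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centred values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to 1 indicates that the panel TMB rises and falls with the WES TMB across samples; it does not by itself show that the two values are numerically equal.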
The present invention has been described above using specific examples, which are only for the purpose of facilitating understanding of the present invention, and are not intended to limit the present invention. Numerous simple deductions, modifications or substitutions may be made by those skilled in the art in light of the teachings of the present invention. Such deductions, modifications or alternatives also fall within the scope of the claims of the present invention.

Claims (10)

1. A method for detecting somatic mutations in a single tumor tissue sample based on high-throughput capture sequencing, characterized by comprising the following steps:
S1: extracting DNA from a single tumor tissue sample;
S2: capturing tumor-associated genes using probes;
S3: sequencing the tumor-associated genes by a high-throughput sequencing method to obtain raw high-throughput sequencing data;
S4: filtering the raw high-throughput sequencing data to remove low-quality sequences and obtain filtered high-throughput sequencing data;
S5: aligning the filtered high-throughput sequencing data to the human reference genome hg19 and removing redundant sequences;
S6: calling single-nucleotide variants and small insertion/deletion variants, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: already-tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated based on the normal copy number baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed using already-tested normal control samples, the screened microsatellite loci are counted, and the microsatellite locus stability of the single tumor tissue sample is determined based on the model.
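The baseline-based copy number calculation in claim 1 can be sketched as follows. This is an illustrative assumption rather than the claimed implementation: per-gene depths are scaled by each sample's median depth, the baseline is the per-gene median across normal controls, and the tumor signal is a log2 ratio against that baseline. All names and depth values are hypothetical, and segmentation is not shown.

```python
import math
from statistics import median


def normalize(depths):
    """Scale per-gene depths by the sample's own median depth."""
    m = median(depths.values())
    if m <= 0:
        raise ValueError("median depth must be positive")
    return {gene: d / m for gene, d in depths.items()}


def build_baseline(normal_samples):
    """Per-gene median of normalized depth across normal controls."""
    normalized = [normalize(s) for s in normal_samples]
    genes = normalized[0].keys()
    return {g: median(s[g] for s in normalized) for g in genes}


def log2_ratios(tumor_depths, baseline):
    """log2(tumor / baseline) per gene; genes with zero depth are skipped."""
    tumor = normalize(tumor_depths)
    return {
        g: math.log2(tumor[g] / baseline[g])
        for g in baseline
        if tumor[g] > 0 and baseline[g] > 0
    }


# Hypothetical mean depths per gene; not data from the patent.
normals = [
    {"EGFR": 100, "MET": 100, "TP53": 100},
    {"EGFR": 50, "MET": 50, "TP53": 50},
]
tumor = {"EGFR": 400, "MET": 100, "TP53": 100}

baseline = build_baseline(normals)
ratios = log2_ratios(tumor, baseline)
print(ratios)  # a log2 ratio near 0 means no change relative to the baseline
```

In this toy input the tumor has four times the baseline depth at EGFR, so its log2 ratio is 2, while the other genes stay at 0.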
2. The method of claim 1, wherein in S4, the raw high-throughput sequencing data are filtered using fastp software with the parameters set to -3 -W 4 -M 25.
3. The method of claim 1, wherein in S5, the filtered high-throughput sequencing data are aligned to the human reference genome hg19 using bwa mem software with the parameter set to -k 32, and redundant sequences are removed using gencore software with the parameter set to -s 1.
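The filtering, alignment, and deduplication settings named in claims 2 and 3 can be written out as command lines. The sketch below only builds and prints the argument lists; it assumes paired-end input, assumes the deduplication tool is the open-source gencore, and uses hypothetical file names. Steps the claims do not mention (for example converting and sorting the alignment output before deduplication) are omitted.

```python
import shlex


def fastp_cmd(r1_in, r2_in, r1_out, r2_out):
    """fastp with tail trimming (-3), window size 4 (-W) and mean quality 25 (-M)."""
    return [
        "fastp",
        "-i", r1_in, "-I", r2_in,
        "-o", r1_out, "-O", r2_out,
        "-3", "-W", "4", "-M", "25",
    ]


def bwa_mem_cmd(reference, r1, r2):
    """bwa mem with minimum seed length 32 (-k); SAM is written to stdout."""
    return ["bwa", "mem", "-k", "32", reference, r1, r2]


def gencore_cmd(reference, bam_in, bam_out):
    """gencore with at least 1 supporting read per consensus (-s)."""
    return ["gencore", "-i", bam_in, "-o", bam_out, "-r", reference, "-s", "1"]


# Hypothetical file names, for illustration only.
commands = [
    fastp_cmd("tumor_R1.fq.gz", "tumor_R2.fq.gz", "clean_R1.fq.gz", "clean_R2.fq.gz"),
    bwa_mem_cmd("hg19.fa", "clean_R1.fq.gz", "clean_R2.fq.gz"),
    gencore_cmd("hg19.fa", "tumor.sorted.bam", "tumor.dedup.bam"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list rather than a single string avoids shell quoting problems if the lists are later passed to `subprocess.run`.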
4. The method according to claim 1, wherein, in S6,
when calling single-nucleotide variants and small insertion/deletion variants, varscan software is used, and a called mutation site is required to meet the following: the minimum coverage depth at the mutation site is 10; the minimum mutation frequency is 2%; the quality value of bases supporting the mutation is higher than 25; the sequence alignment quality value is higher than 30; there is no strand bias, that is, reads supporting the mutation are evenly distributed on the forward and reverse strands; the total number of molecules supporting the mutation is greater than or equal to 3; and the population frequency in 1000 Genomes and other population databases is lower than 0.1%;
when analyzing gene copy number variation, the gene copy number variation is calculated according to the CBS (circular binary segmentation) algorithm;
when analyzing gene fusion variation, two methods are used, the first using genefuse software and the second using factera software; the results of the two methods are then filtered separately and their union is taken;
when analyzing microsatellite locus stability, according to the statistical result, if more than 20% of the loci are unstable, the single tumor tissue sample is determined to be microsatellite unstable; otherwise, it is determined to be microsatellite stable.
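The thresholds in claim 4 can be expressed as a simple filter and a simple decision rule. The sketch below is an illustration under stated assumptions: the `Candidate` record and its field names are hypothetical and do not reflect varscan's output format, supporting reads after deduplication are treated as supporting molecules, and "no strand bias" is reduced to requiring support on both strands.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    depth: int              # total coverage at the site
    alt_forward: int        # supporting reads on the forward strand
    alt_reverse: int        # supporting reads on the reverse strand
    base_quality: float     # quality of bases supporting the mutation
    mapping_quality: float  # alignment quality of supporting reads
    population_af: float    # frequency in population databases

    @property
    def alt_reads(self):
        return self.alt_forward + self.alt_reverse

    @property
    def vaf(self):
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(c):
    """Apply the claim 4 thresholds to one candidate site."""
    return (
        c.depth >= 10
        and c.vaf >= 0.02
        and c.base_quality > 25
        and c.mapping_quality > 30
        # Simplified stand-in for "no strand bias": support on both strands.
        and c.alt_forward > 0
        and c.alt_reverse > 0
        and c.alt_reads >= 3
        and c.population_af < 0.001
    )


def msi_status(unstable_loci, total_loci):
    """Microsatellite unstable if more than 20% of counted loci are unstable."""
    if total_loci <= 0:
        raise ValueError("total_loci must be positive")
    return "MSI" if unstable_loci / total_loci > 0.20 else "MSS"


good = Candidate(200, 5, 4, 35.0, 60.0, 0.0)
one_strand = Candidate(200, 9, 0, 35.0, 60.0, 0.0)
print(passes_filters(good), passes_filters(one_strand))
print(msi_status(3, 10), msi_status(2, 10))
```

Note that exactly 20% unstable loci is classified as stable here, because the claim requires "more than 20%".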
5. A device for detecting somatic mutations in a single tumor tissue sample based on high-throughput capture sequencing, characterized by comprising the following modules:
M1: a DNA extraction module for extracting DNA from a single tumor tissue sample;
M2: a capture module for capturing tumor-associated genes using probes;
M3: a sequencing module for sequencing the tumor-associated genes by a high-throughput sequencing method to obtain raw high-throughput sequencing data;
M4: a filtering module for filtering the raw high-throughput sequencing data to remove low-quality sequences and obtain filtered high-throughput sequencing data;
M5: an alignment module for aligning the filtered high-throughput sequencing data to the human reference genome hg19 and removing redundant sequences;
M6: an analysis module for calling single-nucleotide variants and small insertion/deletion variants, analyzing gene copy number variation, analyzing gene fusion variation, and analyzing microsatellite locus stability, wherein:
the gene copy number variation is analyzed as follows: already-tested normal control samples are selected to establish a normal copy number baseline, and the gene copy number variation of the single tumor tissue sample is calculated based on the normal copy number baseline;
the microsatellite locus stability is analyzed as follows: a model capable of accurately screening the microsatellite locus stability of a single sample is constructed using already-tested normal control samples, the screened microsatellite loci are counted, and the microsatellite locus stability of the single tumor tissue sample is determined based on the model.
6. The device of claim 5, wherein in M4, the raw high-throughput sequencing data are filtered using fastp software with the parameters set to -3 -W 4 -M 25.
7. The device of claim 5, wherein in M5, the filtered high-throughput sequencing data are aligned to the human reference genome hg19 using bwa mem software with the parameter set to -k 32, and redundant sequences are removed using gencore software with the parameter set to -s 1.
8. The device according to claim 5, wherein, in M6,
when calling single-nucleotide variants and small insertion/deletion variants, varscan software is used, and a called mutation site is required to meet the following: the minimum coverage depth at the mutation site is 10; the minimum mutation frequency is 2%; the quality value of bases supporting the mutation is higher than 25; the sequence alignment quality value is higher than 30; there is no strand bias, that is, reads supporting the mutation are evenly distributed on the forward and reverse strands; the total number of molecules supporting the mutation is greater than or equal to 3; and the population frequency in 1000 Genomes and other population databases is lower than 0.1%;
when analyzing gene copy number variation, the gene copy number variation is calculated according to the CBS (circular binary segmentation) algorithm;
when analyzing gene fusion variation, two methods are used, the first using genefuse software and the second using factera software; the results of the two methods are then filtered separately and their union is taken;
when analyzing microsatellite locus stability, according to the statistical result, if more than 20% of the loci are unstable, the single tumor tissue sample is determined to be microsatellite unstable; otherwise, it is determined to be microsatellite stable.
9. A computer readable medium storing computer program instructions, wherein when the computer program instructions are executed by a processor, the method according to any of claims 1-4 is performed.
10. A tumor tissue single sample somatic mutation detection device based on high-throughput capture sequencing, which is characterized by comprising:
a memory for storing the computer program instructions according to claim 9, and
a processor for executing the computer program instructions according to claim 9,
wherein the device performs the method according to any one of claims 1-4 when the computer program instructions are executed by the processor.
CN202010658213.9A 2020-07-09 2020-07-09 Tumor tissue single sample somatic mutation detection method and device Pending CN111718982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010658213.9A CN111718982A (en) 2020-07-09 2020-07-09 Tumor tissue single sample somatic mutation detection method and device

Publications (1)

Publication Number Publication Date
CN111718982A true CN111718982A (en) 2020-09-29

Family

ID=72572219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010658213.9A Pending CN111718982A (en) 2020-07-09 2020-07-09 Tumor tissue single sample somatic mutation detection method and device

Country Status (1)

Country Link
CN (1) CN111718982A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination