CN111968701A - Method and device for detecting somatic copy number variation of designated genome region
- Publication number
- CN111968701A (application CN202010880479.8A)
- Authority
- CN
- China
- Prior art keywords
- value
- region
- target capture
- significance
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention relates to the field of genes, in particular to a method and a device for detecting somatic copy number variation of a specified genome region. The invention combines multiple indices, namely the log2(copy Ratio) value, the significance of the difference in B allele frequency (BAF), the significance of the difference in normalized read length coverage within the target capture region, and the significance of the difference in normalized read length coverage within the non-target capture region, and applies multivariate analysis to integrate the results of these indices, thereby achieving efficient and accurate copy number variation detection for a specified gene or interval.
Description
Technical Field
The invention relates to the field of genes, in particular to a method and a device for detecting somatic copy number variation of a specified genome region.
Background
Copy number variations (CNVs) are a form of structural variation of DNA sequences, comprising duplications and deletions of specific DNA fragments (>1 kb), and are a significant source of both normal and pathogenic variation in the human genome. The development of next-generation sequencing (NGS) technology has greatly improved the ability to detect all types of genomic variation, from single nucleotide variations and small indels to CNVs and other forms of structural variation (SV). Whole genome sequencing (WGS) data offer the strongest ability to detect CNVs, but because of the high cost of WGS, most tools and methods for detecting CNVs use whole exome sequencing (WES) data. However, WES introduces more bias and noise than WGS, which makes CNV detection very challenging. Furthermore, the complexity of tumors makes the detection of cancer-specific CNVs even more difficult.
The final output of the Control-FREEC software is an integer copy number, so it performs poorly on samples whose tumor cell fraction is unknown or low, and the algorithm cannot detect CNVs with a copy number below 2.5.
At present, CNVs are calculated from two kinds of data: log2(copy Ratio) and B allele frequency (BAF). The log2(copy Ratio) is used to call CNV segments, and the BAF is used to assess loss of heterozygosity (LOH) and allelic imbalance.
log2(copy Ratio) values are calculated from the read depths of the control and tumor samples. The read depth method identifies amplifications or deletions from the distribution density of reads along a chromosome; the basic principle is that an amplified region has a higher read density than its surrounding regions, while a deleted region has a lower read density. The read depth method generally divides the genome into windows and then computes the density in each, so the resolution of a calculated breakpoint cannot be finer than the window size.
The CNVkit software gives only an estimate of the relative copy number change, log2(copy Ratio); it does not test whether a segment is a CNV, so the user must set a threshold to make the call. The output carries no indication of statistical significance and is difficult to interpret. For long genes, a single gene is often split into different CNV segments with different estimated copy numbers.
The BAF takes values from 0 to 1 and represents the proportion of one allele of a SNP relative to the total copy number. A BAF of 0.5 represents a heterozygote (AB), while 0 and 1 represent homozygotes (AA and BB, respectively). If a region carries a deletion, its SNPs appear homozygous and the BAF is either 0 or 1. If a region carries a single-copy duplication, heterozygous SNPs have BAF values of 0.33 (AAB) or 0.67 (ABB), while homozygous SNPs still have a BAF of 0 or 1 (AAA or BBB).
These software tools and statistics are basically applied to sample-wide copy number variation detection; in the prior art, the copy number variation of a particular gene or interval can be determined only after a variant detection process followed by an annotation process. It would therefore be of great significance to develop a copy number variation detection method that is efficient, accurate, and low-cost for a designated gene or region.
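The BAF values quoted above follow directly from counting B alleles in each genotype. The short sketch below reproduces them; the function name `expected_baf` and the genotype-string encoding are illustrative assumptions, not part of the original text, and the calculation assumes a pure sample with no normal-cell admixture.

```python
def expected_baf(genotype: str) -> float:
    """Fraction of B alleles in a genotype string such as 'AB' or 'AAB'.

    Hypothetical helper for illustration only; assumes a pure sample.
    """
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError("genotype must be a non-empty string of 'A' and 'B'")
    return genotype.count("B") / len(genotype)


# Diploid states, a single-copy deletion (one allele left), and a single-copy gain.
for g in ("AA", "AB", "BB", "A", "B", "AAB", "ABB"):
    print(g, round(expected_baf(g), 2))
```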
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defect in the prior art that copy number variation analysis of a specific gene can be achieved only through a variant detection process followed by an annotation process, and to provide a method and a device that detect somatic copy number variation of a designated gene or region in a single step, with the advantages of low cost, high efficiency, and high accuracy.
The invention provides a method for detecting somatic copy number variation of a designated genome region, which comprises the following steps:
obtaining a tumor sample whose copy number variation status in the designated genome region is known, and taking its matched sample as a control sample; aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files;
based on the alignment result files, dividing the target capture region or the non-target capture region of the designated genome region into windows, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the P value for the significance of the difference in normalized read length coverage between the tumor sample and the control sample, giving a difference significance P1 value for the target capture region and a difference significance P2 value for the non-target capture region;
calculating the log2(copy Ratio) value of the tumor sample relative to the control sample in the target capture region of the designated genome region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region;
respectively calculating the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region, and then calculating the difference significance P3 value of the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region;
constructing a machine learning model from the log2(copy Ratio) value, P1 value, P2 value, and P3 value obtained above together with the known copy number variation status of the designated genome region, and obtaining a judgment threshold; or constructing a machine learning model from the log2(copy Ratio) value, P1 value, and P3 value obtained above together with the known copy number variation status of the designated genome region, and obtaining a judgment threshold;
calculating, according to the above steps, the log2(copy Ratio) value, P1 value, P2 value, and P3 value, or the log2(copy Ratio) value, P1 value, and P3 value, of the tumor sample to be tested and its matched sample; analyzing them with the machine learning model; and comparing the analysis result with the judgment threshold to determine whether the tumor sample to be tested has somatic copy number variation in the designated analysis region.
For whole exome sequencing data and targeted capture sequencing data, the target capture region is the probe capture region of the sequencing chip, and the non-target capture region is the rest of the whole genome outside the target capture region, with blacklist regions (such as N regions and centromere regions) removed.
There is no non-target capture region data for whole genome sequencing.
Further, in the step of calculating the difference significance P value of the normalized read length coverage, a T test and a KS test are each used to calculate the difference significance, giving difference significance P1_T and P1_KS values for the target capture region and difference significance P2_T and P2_KS values for the non-target capture region.
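As an illustration of how the two tests could be applied to per-window coverage values, the following is a minimal standard-library sketch. It rests on assumptions not stated in the text: the T test is taken to be Welch's unequal-variance form, its P value is approximated with a normal distribution (reasonable only for many windows), and the KS P value uses the asymptotic Kolmogorov series. The function names are hypothetical; in practice a statistics library (for example `scipy.stats.ttest_ind` and `scipy.stats.ks_2samp`) would normally be used instead.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_pvalue(a, b):
    """Welch t statistic with a two-sided P value (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return t, 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def ks_pvalue(a, b):
    """Two-sample KS statistic D with an asymptotic two-sided P value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the two empirical CDFs
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:  # the series converges poorly here; the P value is ~1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy per-window normalized coverage values (made-up numbers).
tumor = [1.5, 1.6, 1.4, 1.55, 1.45] * 6
control = [1.0, 1.1, 0.9, 1.05, 0.95] * 6
print(welch_t_pvalue(tumor, control))
print(ks_pvalue(tumor, control))
```

Using both tests is complementary: the T test is sensitive to a shift in mean coverage, while the KS test responds to any difference in the coverage distributions.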
Further, when the target capture region of the designated genome region is divided into windows, each window contains at least 200 reads; or, when the non-target capture region of the designated genome region is divided into windows, each window contains at least 200 reads.
The non-target capture region is limited in range: specifically, it consists of the 30 non-target capture region windows nearest to the designated genome region on each side (upstream and downstream).
Further, in the step of calculating the difference significance P3 value of the B allele frequency, a T test and a KS test are each used to calculate the difference significance, giving the difference significance P3_T and P3_KS values of the B allele frequency.
Furthermore, germline variant sites with a mutation frequency of ≥0.05 are obtained from a known database as candidate sites; based on the alignment result files, the candidate germline variant sites with a sequencing depth of ≥10X are then screened, and those that are heterozygous in the control sample are used as the BAF check site set.
Further, when the significance of the B allele frequency difference is analyzed, at least 30 BAF check sites are required in the target capture region of the designated genome region for the tumor sample and the control sample; when there are fewer than 30, the designated genome region is extended on both sides to supplement sites until the target number of BAF check sites is reached.
Further, in the step of calculating the log2(copy Ratio) value, the target capture region of the designated genome region is divided into windows such that each window contains at least 200 reads, the log2(copy Ratio) value of each window is calculated, and the median of the log2(copy Ratio) values over all windows is then taken.
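One way to satisfy the 200-read minimum is to start from fixed-length bins and merge neighbours until each window holds enough reads. The sketch below shows that idea under stated assumptions: the text does not specify how windows are formed, and the greedy left-to-right merge, the function name `merge_bins`, and the handling of a trailing short bin are all illustrative choices.

```python
def merge_bins(bin_counts, min_reads=200):
    """Greedily merge adjacent fixed-length bins into windows of >= min_reads.

    Returns a list of (first_bin_index, last_bin_index, read_count) tuples.
    Trailing bins that never reach min_reads are folded into the last window.
    Returns an empty list if the whole region holds fewer than min_reads reads.
    """
    windows = []
    start, total = 0, 0
    for idx, count in enumerate(bin_counts):
        total += count
        if total >= min_reads:
            windows.append((start, idx, total))
            start, total = idx + 1, 0
    if start < len(bin_counts) and windows:
        first, _, reads = windows.pop()
        windows.append((first, len(bin_counts) - 1, reads + total))
    return windows


# Made-up read counts for seven consecutive fixed-length bins.
print(merge_bins([120, 90, 250, 50, 60, 100, 30]))
```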
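The site screening described above can be sketched as a simple filter. The record layout (`Site`), the function name, and in particular the BAF range used to call a site heterozygous in the control sample (0.3 to 0.7) are assumptions made for illustration; the text gives only the frequency threshold (≥0.05), the depth threshold (≥10X), and the minimum of 30 sites.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int            # genomic position
    pop_freq: float     # frequency of the variant in the reference database
    depth: int          # sequencing depth at the site
    control_baf: float  # B allele frequency observed in the control sample


def select_baf_sites(sites, min_freq=0.05, min_depth=10, het_range=(0.3, 0.7)):
    """Keep sites that pass the frequency and depth filters and look
    heterozygous in the control sample (het_range is an assumed cut-off)."""
    lo, hi = het_range
    return [s for s in sites
            if s.pop_freq >= min_freq
            and s.depth >= min_depth
            and lo <= s.control_baf <= hi]


def needs_extension(selected, min_sites=30):
    """True when the region must be extended on both sides to find more sites."""
    return len(selected) < min_sites


candidates = [
    Site(100, 0.20, 45, 0.48),  # kept
    Site(200, 0.01, 60, 0.51),  # too rare in the database
    Site(300, 0.30, 8, 0.50),   # depth below 10X
    Site(400, 0.15, 30, 0.99),  # homozygous in the control
]
kept = select_baf_sites(candidates)
print([s.pos for s in kept], needs_extension(kept))
```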
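The per-window calculation followed by a median can be sketched as below. The sketch assumes that the normalized coverage of a window is its read count divided by the total number of reads in the sample, which matches the quantities the description defines; the function name and the toy numbers are hypothetical.

```python
import math
from statistics import median


def region_log2_copy_ratio(tumor_counts, control_counts, n_tumor, n_control):
    """Median over windows of log2((r_t / N_t) / (r_n / N_n)).

    tumor_counts / control_counts: reads per window in each sample.
    n_tumor / n_control: total reads in each sample.
    Windows are assumed non-empty (the method requires >= 200 reads each).
    """
    ratios = []
    for r_t, r_n in zip(tumor_counts, control_counts):
        cov_t = r_t / n_tumor    # normalized coverage, tumor
        cov_n = r_n / n_control  # normalized coverage, control
        ratios.append(math.log2(cov_t / cov_n))
    return median(ratios)


# Tumor windows carry about twice the control coverage: log2 ratio near 1.
print(region_log2_copy_ratio([400, 420, 380], [200, 210, 190],
                             1_000_000, 1_000_000))
```

Taking the median rather than the mean keeps a single outlying window from dominating the regional estimate.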
Further, the machine learning model is selected from a gradient boosting decision tree, a support vector machine, naive Bayes, Adaboost algorithm, logistic regression, or random forest.
Further, an ROC curve is drawn from the analysis results of the machine learning model, and the analysis result obtained under conditions of high AUC, high sensitivity, and high specificity is selected as the judgment threshold.
Further, in the invention, high AUC, high sensitivity, and high specificity mean AUC ≥ 95%, sensitivity ≥ 95%, and specificity ≥ 90%, respectively.
Further, the sequencing data is whole genome sequencing data, whole exome sequencing data, or targeted capture sequencing data.
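A threshold meeting these sensitivity and specificity limits could be chosen by scanning candidate cut-offs over the model scores, as in the sketch below. The scan over observed scores, the tie-break by the largest sensitivity plus specificity, and the function names are assumptions for illustration; the text states only the target ranges. A score strictly above the threshold is treated as a positive call, and both classes are assumed to be present in the labels.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score > threshold means 'CNV present'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s <= threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s <= threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(scores, labels, min_sens=0.95, min_spec=0.90):
    """Return the cut-off meeting both limits with the largest
    sensitivity + specificity, or None when no cut-off qualifies."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if sens >= min_sens and spec >= min_spec:
            if best is None or sens + spec > best[0]:
                best = (sens + spec, t)
    return None if best is None else best[1]


# Made-up model scores: four regions with CNV (1) and four without (0).
scores = [0.90, 0.80, 0.75, 0.95, 0.10, 0.20, 0.30, 0.60]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(pick_threshold(scores, labels))
```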
The invention provides a device for detecting somatic copy number variation of a designated genome region, which comprises:
an alignment unit for aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files; the copy number variation status of the designated genome region of the tumor sample is known, and the matched sample of the tumor sample is used as the control sample;
a unit for calculating the difference significance P value of the normalized read length coverage, which, based on the alignment result files, divides the target capture region or the non-target capture region of the designated genome region into windows, calculates the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculates the P value for the significance of the difference in normalized read length coverage between the tumor sample and the control sample, giving a difference significance P1 value for the target capture region and a difference significance P2 value for the non-target capture region;
a log2(copy Ratio) value calculation unit for calculating the log2(copy Ratio) value of the tumor sample relative to the control sample in the target capture region of the designated genome region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region;
a calculation unit of the difference significance P3 value of the B allele frequencies, which is used for respectively calculating the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region and then calculating the difference significance P3 value of the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region;
a machine learning model construction unit for constructing a machine learning model from the log2(copy Ratio) value, P1 value, P2 value, and P3 value obtained above together with the known copy number variation status of the designated genome region, and obtaining a judgment threshold; or for constructing a machine learning model from the log2(copy Ratio) value, P1 value, and P3 value obtained above together with the known copy number variation status of the designated genome region, and obtaining a judgment threshold;
a detection unit for calculating, by means of the above units, the log2(copy Ratio) value, P1 value, P2 value, and P3 value, or the log2(copy Ratio) value, P1 value, and P3 value, of the tumor sample to be tested and its matched sample, analyzing them with the machine learning model construction unit, and comparing the resulting analysis result with the judgment threshold to determine whether the tumor sample to be tested has somatic copy number variation in the designated analysis region.
Further, the unit for calculating the difference significance P value of the normalized read length coverage further comprises a T test unit and a KS test unit, which respectively calculate the difference significance of the normalized read length coverage, giving difference significance P1_T and P1_KS values for the target capture region and difference significance P2_T and P2_KS values for the non-target capture region.
Furthermore, the unit for calculating the difference significance P3 value of the B allele frequencies further comprises a T test unit and a KS test unit, which respectively calculate the difference significance of the B allele frequencies, giving the difference significance P3_T and P3_KS values of the B allele frequencies.
Furthermore, the calculating unit for the difference significance P3 value of the B allele frequency further comprises a BAF check site set obtaining unit for obtaining the germline variant site with the mutation frequency of more than or equal to 0.05 from the known database for later use, then screening out the germline variant site with the sequencing depth of more than or equal to 10X based on the comparison result file, and using the spare germline variant site which is heterozygous in the control sample as the BAF check site set.
Further, the unit for constructing the machine learning model further includes a gradient boosting decision tree unit.
Further, the device also comprises:
an ROC curve drawing unit for drawing an ROC curve from the analysis results of the machine learning model and selecting, as the threshold, the analysis result obtained under conditions of high AUC, high sensitivity, and high specificity.
The technical scheme of the invention has the following advantages:
1. The invention provides a method for detecting somatic copy number variation of a specified genome region. The method integrates multiple indices, namely the log2(copy Ratio) value, the significance of the B allele frequency (BAF) difference, the significance of the difference in normalized read length coverage in the target capture region, and the significance of the difference in normalized read length coverage in the non-target capture region, together with the known copy number variation status of the designated genome region, to build a machine learning model and obtain a judgment threshold. The sample to be tested is analyzed with the machine learning model and the analysis result is compared with the judgment threshold to determine whether somatic copy number variation has occurred in the tumor sample to be tested, so that detection is both more accurate and more efficient.
2. The method for detecting somatic copy number variation of a designated genome region can detect copy number variation of a designated gene or segment; even if the analysis region is not captured by the chip, a simplified analysis can still be carried out using data from the non-target capture region outside the chip's capture region.
3. In the method for detecting somatic copy number variation of a designated genome region, a T test and a KS test are each used to calculate the difference significance in the step of calculating the difference significance P value of the normalized read length coverage, giving difference significance P1_T and P1_KS values for the target capture region and difference significance P2_T and P2_KS values for the non-target capture region. Using both the T test and the KS test takes different sample conditions into account, improves analysis accuracy, and reduces false positives.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a method for detecting variations in the somatic copy number of a given genomic region according to example 1 of the present invention;
FIG. 2 is the ROC curve of the machine learning model built with a gradient boosting decision tree for determining somatic copy number variation in a specified genomic region in example 1 of the present invention;
FIG. 3 is a flowchart of a method for detecting variations in the somatic copy number of a given genomic region in example 2 of the present invention.
Detailed Description
The following examples are provided for a better understanding of the present invention. They are not limited to the best mode described and do not limit the content or the protection scope of the present invention; any product identical or similar to the present invention that is obtained by combining the present invention with features of other prior art falls within the protection scope of the present invention.
Where specific experimental steps or conditions are not indicated in the examples, they can be performed according to the conventional experimental steps described in the literature in the field. Reagents or instruments for which no manufacturer is indicated are all conventional products that are commercially available.
Example 1
The present embodiment provides a method for detecting somatic copy number variation in a specified genomic region, the flowchart of which is shown in fig. 1, comprising:
(1) obtaining a tumor sample whose copy number variation status in the designated genome region is known, and taking its matched sample as a control sample; aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files;
In this example, 8 tumor tissue samples (non-small cell lung cancer in this example) were selected; their matched samples (control samples) were leukocytes from the peripheral blood of the same patients. Whole exome sequencing was performed on each using the Gene + Seq 2000 sequencing platform. The sequencing data were aligned to the human reference genome (GRCh37) with BWA-MEM software to obtain alignment result files. The copy number variation status in the tumor tissue samples is known for the designated genomic regions (each sample has genes to be analyzed, 240 genes in total across the 8 samples): of these 240 genes, copy number variation occurs in 100 and does not occur in 140.
The target capture region is the probe capture region of the sequencing chip (the gene detection product of the oncogene-coded D-C1021 tumor drug gene; all 240 genes fall within the range of genes that the chip can capture). The non-target capture region is the rest of the whole genome (excluding the 1021 genes), with blacklist regions (such as N regions and centromere regions) removed.
(2) Based on the alignment result files, dividing the target capture region or the non-target capture region of the designated genome region into windows, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the P value for the significance of the difference in normalized read length coverage between the tumor sample and the control sample, giving a difference significance P1 value for the target capture region and a difference significance P2 value for the non-target capture region;
Further, the tumor sample and the control sample are each divided into windows of a specified length in the target capture region of the designated genome region, ensuring that each window contains at least 200 reads, and the normalized read length coverage of the tumor sample and of the control sample in each window of the target capture region is calculated as the input for the significance analysis of the control sample versus the tumor sample. The normalized read length coverage of each window of the tumor sample is

\[ C_t = \frac{r_t}{N_t}, \]

and the normalized read length coverage of each window of the control sample is

\[ C_n = \frac{r_n}{N_n}, \]

where $r_t$ and $r_n$ represent the number of reads covering the genome segmentation window in the tumor sample and the control sample, respectively, and $N_t$ and $N_n$ are the total numbers of reads of the tumor sample and the control sample, respectively. A T test and a KS (Kolmogorov-Smirnov) test are each used to calculate the difference significance: the target capture region read length coverage T test and KS test give the difference significance P1_T and P1_KS values.
Further, the tumor sample and the control sample are each divided into windows of a specified length in the non-target capture region of the designated genome region, ensuring that each window contains at least 200 reads, and the normalized read length coverage in each window is calculated as the input for the significance analysis of the control sample versus the tumor sample. The non-target capture region is limited in range: specifically, it consists of the 30 non-target capture region windows nearest to the designated analysis region on each side. A T test and a KS (Kolmogorov-Smirnov) test are each used to calculate the difference significance: the non-target capture region read length coverage T test and KS test give the difference significance P2_T and P2_KS values.
(3) Calculating the log2(copy Ratio) value of the tumor sample relative to the control sample in the target capture region of the designated genome region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region;
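The range limit on the non-target capture region can be illustrated as picking the 30 nearest windows on each side of the analysis region. In the sketch below, windows are hypothetical `(start, end)` tuples on a single chromosome, and the function name and coordinates are made up for illustration.

```python
def flanking_windows(windows, region_start, region_end, n_flank=30):
    """Return up to n_flank non-target windows on each side of the region,
    nearest first considered. Assumes n_flank > 0 and one chromosome."""
    upstream = sorted((w for w in windows if w[1] < region_start),
                      key=lambda w: w[1])
    downstream = sorted((w for w in windows if w[0] > region_end),
                        key=lambda w: w[0])
    return upstream[-n_flank:] + downstream[:n_flank]


# One hundred consecutive 100-bp windows; the analysis region spans 5000-5199.
all_windows = [(i * 100, i * 100 + 99) for i in range(100)]
selected = flanking_windows(all_windows, 5000, 5199)
print(len(selected), selected[0], selected[-1])
```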
Further, the tumor sample and the control sample are each divided into windows in the target capture region of the designated genome region, ensuring that each window contains at least 200 reads; the log2(copy Ratio) value of each window is calculated, and the median over all windows is taken as the log2(copy Ratio) value of the target capture region. The log2(copy Ratio) value is calculated as follows:

\[ \log_2(\text{copy Ratio}) = \log_2\!\left(\frac{r_t / N_t}{r_n / N_n}\right) \]

where $r_t$ and $r_n$ represent the number of reads covering the genome segmentation window in the tumor sample and the control sample, respectively, and $N_t$ and $N_n$ are the total numbers of reads of the tumor sample and the control sample, respectively.
(4) Respectively calculating the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region, and then calculating the difference significance P3 value of the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region;
Further, all germline variant sites in the gnomAD database with a mutation frequency of ≥0.05 are taken as candidates; candidate germline variant sites with a sequencing depth of ≥10X (screened using the alignment results of the tumor sample) that are stably heterozygous in the control sample are used as the BAF check site set. At least 30 BAF check sites are required in the target capture region of the designated genome region for the tumor sample and the control sample; when there are fewer than 30, the designated analysis region is extended on both sides to supplement sites until the target number of BAF check sites is reached.
The BAF check sites in the target capture region of the designated genome region are counted, and the B allele frequencies of the tumor sample and the control sample are calculated respectively.
Further, a T test and a KS (Kolmogorov-Smirnov) test are used to perform the target capture region B allele frequency T test and KS test, respectively, giving the difference significance P3_T and P3_KS values of the B allele frequencies; a significance P value of ≤0.05 indicates a significant difference. If the difference is significant, the tumor sample and the control sample are considered to differ in this region, and copy number variation may be present.
(5) Constructing a machine learning model from the log2(copy Ratio) value, P1 value, P2 value, and P3 value obtained above together with the known copy number variation status of the designated genome region; or constructing a machine learning model from the log2(copy Ratio) value, P1 value, and P3 value obtained above together with the known copy number variation status of the designated genome region.
In this example, the model was constructed with a gradient boosting decision tree (R package gbm). Multivariate analysis was used to integrate multiple indices: the median log2(copy Ratio) of the target capture region of the designated genome region, the B allele frequency (BAF) significance P values (P3_T and P3_KS), the significance P values of the normalized read length coverage of the control and tumor samples within the target capture region of the designated genome region (P1_T and P1_KS), and the significance P values of the normalized read length coverage within the specified range of non-target capture regions around the designated genome region (P2_T and P2_KS). A machine learning model was built from these indices, and an ROC curve was drawn with R software from the model's analysis results (see FIG. 2). The AUC of the ROC curve of the gradient boosting decision tree model for determining somatic copy number variation of the designated genome region reaches 97.7%. The judgment threshold can be selected under conditions of high AUC, high sensitivity, and high specificity; in this example, the judgment threshold is 0.7, at a specificity of 92.6% and a sensitivity of 95%.
(6) The log2(copy Ratio) value, P1 value, P2 value, and P3 value, or the log2(copy Ratio) value, P1 value, and P3 value, of the tumor sample to be tested and its matched sample are calculated according to the above steps and analyzed with the constructed machine learning model; the analysis result is compared with the judgment threshold to determine whether the tumor sample to be tested has somatic copy number variation in the designated analysis region. That is, a sample whose analysis result is higher than 0.7 is a sample with somatic copy number variation.
Example 2
In this example, a tumor sample with known copy number variation status in the designated genomic regions and its matched sample are used as a validation set. The detection process is shown in fig. 3 and follows the method of example 1, comprising:
(1) aligning the sequencing data of the tumor sample to be tested and of the control sample to the reference genome, respectively, to obtain alignment result files;
The tumor tissue sample is number 190031660FD (provided in Gein plus), and its matched control is peripheral blood leukocyte sample 190031660BD from the same patient (provided in Gein plus); whole exome sequencing was performed on each using the Gene + Seq 2000 sequencing platform. The sequencing data were aligned to the human reference genome (GRCh37) with BWA-MEM software to obtain alignment result files. The aim is to determine whether copy number variation occurs in tumor tissue sample 190031660FD for 4 genes, namely the CDKN2A gene (chromosome 9, position information on the genome is 21968174-. It is known that copy number variation of the above 4 genes occurs in tumor tissue sample 190031660FD.
The target capture region is the probe capture region of the sequencing chip (the Gigen plus sequencing chip Panel1021 gene chip, a gene detection product for tumor drug genes; the 4 genes fall within the range of genes that the chip can capture), and the non-target capture region is the rest of the whole genome outside the target capture region (i.e., excluding the 1021 genes), with blacklist regions (such as N regions and centromere regions) removed.
(2) Based on the alignment result files, dividing the target capture region or the non-target capture region of the designated genome region into windows, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the P value for the significance of the difference in normalized read length coverage between the tumor sample and the control sample, giving a difference significance P1 value for the target capture region and a difference significance P2 value for the non-target capture region;
Further, windows are divided in the target capture regions of the 4 genes for the tumor sample to be detected and for the control sample, ensuring that each window has at least 200 reads, and the normalized read length coverage in each window of the 4 gene target capture regions is calculated and used as the input for the significance analysis of the control sample versus the tumor sample. The normalized read length coverage per window is defined separately for the tumor sample and for the control sample [formula images not reproduced],
where the two window-level symbols in those formulas represent the number of reads covering each genomic window of the tumor sample and of the control sample, respectively, and N_t and N_n are the total numbers of reads of the tumor sample and of the control sample, respectively.
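The normalization described above can be sketched as follows. This is a minimal illustration, not the patent's formula (which is not reproduced in this text): it assumes that normalized coverage is simply the read count of a window divided by the total read count of the sample. The function name and the example counts are hypothetical.

```python
def normalized_coverage(window_read_counts, total_reads):
    """Divide each window's read count by the sample's total read count.

    Assumption: the normalization is a plain ratio; the source formula
    may include additional scaling that is not visible in this text.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [count / total_reads for count in window_read_counts]


# Hypothetical read counts for three windows of one gene.
tumor_counts = [420, 380, 450]
control_counts = [400, 410, 390]
N_t = 1_000_000  # total reads of the tumor sample
N_n = 800_000    # total reads of the control sample

tumor_cov = normalized_coverage(tumor_counts, N_t)
control_cov = normalized_coverage(control_counts, N_n)
```

Dividing by the total read count makes windows comparable between samples sequenced to different depths, which is what the later significance tests rely on.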
The T test and the KS (Kolmogorov-Smirnov) test are each used to calculate the difference significance, giving the target capture region read length coverage T test P value and the target capture region read length coverage KS test P value, i.e. the difference significance P1_T value and P1_KS value. The specific values are shown in Table 1.
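The two tests can be sketched with SciPy as below. The source does not say which T test variant is used, so the unpaired Welch form (`equal_var=False`) is an assumption here, as are the function name and the example coverage values.

```python
from scipy import stats


def coverage_difference_pvalues(tumor_cov, control_cov):
    """Return (T test P value, KS test P value) for two sets of window coverages.

    Assumption: an unpaired Welch T test; the source only says "T test".
    """
    t_result = stats.ttest_ind(tumor_cov, control_cov, equal_var=False)
    ks_result = stats.ks_2samp(tumor_cov, control_cov)
    return t_result.pvalue, ks_result.pvalue


# Hypothetical normalized coverages for eight windows per sample.
tumor_cov = [1.00, 1.10, 0.90, 1.05, 0.95, 1.02, 0.98, 1.00]
control_cov = [2.00, 2.10, 1.90, 2.05, 1.95, 2.02, 1.98, 2.00]

p1_t, p1_ks = coverage_difference_pvalues(tumor_cov, control_cov)
```

The T test compares the means of the two coverage distributions, while the KS test compares their overall shapes, so the two P values can disagree when the distributions differ in spread rather than in location.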
Further, the non-target capture region is divided into windows of a specified length, ensuring that each window has at least 200 reads, and the normalized read length coverage in each window is calculated and used as the input for the significance analysis of the control sample versus the tumor sample. The non-target capture region is limited in range, specifically to the 30 non-target capture region windows before and after, and nearest to, the designated analysis region. The length of the non-target capture region corresponding to each gene is shown in Table 1 (extension length of the non-target capture region). The T test and the KS (Kolmogorov-Smirnov) test are each used to calculate the difference significance, giving the non-target capture region read length coverage T test P value and the non-target capture region read length coverage KS test P value, i.e. the difference significance P2_T value and P2_KS value. The specific values are shown in Table 1.
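The range limit on the non-target capture region can be sketched as follows. The sketch reads "30 windows before and after" as 30 windows on each side of the designated region, which is an interpretation; the window representation as `(start, end)` tuples and the function name are hypothetical.

```python
def flanking_windows(windows, region_start, region_end, n_flank=30):
    """Pick the n_flank nearest windows on each side of the designated region.

    `windows` holds (start, end) tuples on the same chromosome as the region.
    Assumption: 30 windows upstream plus 30 windows downstream.
    """
    upstream = sorted((w for w in windows if w[1] <= region_start),
                      key=lambda w: w[1])
    downstream = sorted((w for w in windows if w[0] >= region_end),
                        key=lambda w: w[0])
    # Nearest upstream windows are at the end of the sorted list.
    nearest_up = upstream[max(0, len(upstream) - n_flank):]
    nearest_down = downstream[:n_flank]
    return nearest_up + nearest_down


# Hypothetical 1 kb non-target windows and a 2 kb designated region.
windows = [(i * 1000, i * 1000 + 1000) for i in range(100)]
selected = flanking_windows(windows, region_start=50_000, region_end=52_000)
```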
(3) Based on the normalized read length coverage of the tumor sample to be detected and of the control sample in each window (from step (2) of this example), calculating the log2(copy Ratio) value of the tumor sample to be detected and the control sample in the target capture region of the designated genomic region;
Further, the log2(copy Ratio) value of each window is calculated separately, and the median over all windows is then taken; see the target capture region log2(copy Ratio) value in Table 1. The formula for calculating the log2(copy Ratio) value is as follows: [formula image not reproduced]
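A minimal sketch of the per-window calculation and the median step is given below. Because the source formula is not reproduced in this text, the sketch assumes the usual definition, the base-2 logarithm of tumor coverage divided by control coverage in the same window. The function name and example values are hypothetical.

```python
import math
import statistics


def median_log2_copy_ratio(tumor_cov, control_cov):
    """Median over windows of log2(tumor coverage / control coverage).

    Assumption: copy ratio is the plain coverage ratio per window.
    Windows with zero coverage in either sample are skipped, since the
    logarithm is undefined there.
    """
    if len(tumor_cov) != len(control_cov):
        raise ValueError("both samples must have the same number of windows")
    values = [math.log2(t / n)
              for t, n in zip(tumor_cov, control_cov)
              if t > 0 and n > 0]
    if not values:
        raise ValueError("no window has coverage in both samples")
    return statistics.median(values)


# Hypothetical coverages: the tumor has 2x, 4x and 8x the control coverage.
gain = median_log2_copy_ratio([2.0, 4.0, 8.0], [1.0, 1.0, 1.0])
# Hypothetical single-window loss: half the control coverage.
loss = median_log2_copy_ratio([0.5], [1.0])
```

Taking the median rather than the mean keeps a few outlier windows from dominating the region-level value.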
(4) respectively calculating the B allele frequencies of the tumor sample to be detected and the control sample in the target capture region of the designated genome region, and then calculating the difference significance P3 value of the B allele frequencies of the tumor sample and the control sample in the target capture region of the designated genome region;
Further, using the BAF check site set obtained in step (4) of Example 1, the number of B alleles in the capture regions where the four genes are located is counted; see the number of B alleles in the target capture region in Table 1. The B allele frequencies of the tumor sample to be detected and of the control sample are then calculated, respectively.
Further, the T test and the KS (Kolmogorov-Smirnov) test are each applied to the B allele frequencies in the target capture region (target capture region B allele frequency T test and target capture region B allele frequency KS test). A significance P value (P3_T value and P3_KS value) of 0.05 or less indicates a significant difference. If the difference is significant, the tumor sample and the control sample are considered to differ in this region, and copy number variation may exist. See Table 1 for the specific P values.
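The B allele frequency step and the 0.05 cut-off can be sketched as follows. The sketch assumes that the B allele frequency at a site is the number of reads supporting the B allele divided by the total depth at that site; the function names and the read counts are hypothetical.

```python
def b_allele_frequency(b_reads, depth):
    """Fraction of reads at a site that carry the B allele (assumed definition)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return b_reads / depth


def is_significant(p_value, alpha=0.05):
    """A P value of alpha or less counts as a significant difference."""
    return p_value <= alpha


# Hypothetical (B allele reads, depth) pairs at three check sites.
tumor_sites = [(30, 100), (72, 90), (15, 60)]
control_sites = [(48, 100), (41, 80), (33, 66)]

tumor_baf = [b_allele_frequency(b, d) for b, d in tumor_sites]
control_baf = [b_allele_frequency(b, d) for b, d in control_sites]
```

The two lists of frequencies are what the T test and KS test in this step would compare to obtain the P3_T and P3_KS values.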
(5) Constructing a machine learning model for the tumor sample to be detected using the log2(copy Ratio) value, P1 value, P2 value and P3 value obtained above together with the known copy number variation status of the designated genomic region; or constructing the machine learning model using the log2(copy Ratio) value, P1 value and P3 value obtained above together with the known copy number variation status of the designated genomic region. The analysis result of the machine learning model is compared with the judgment threshold obtained in step (5) of Example 1 to determine whether somatic copy number variation occurs in the designated analysis region of the tumor sample to be detected; that is, a sample whose analysis result is higher than 0.7 is taken as a sample with somatic copy number variation;
Further, the model is constructed with a gradient boosting decision tree (R package gbm). Multivariate analysis is applied to integrate, for the 4 genes, the median log2(copy Ratio) value of the target capture region, the B allele frequency (BAF) significance P values (P3_T value and P3_KS value), the significance P values of the normalized read length coverage of the control sample and the tumor sample to be detected within the target capture region (P1_T value and P1_KS value), and the significance P values of the normalized read length coverage within the non-target capture region of the specified range around the 4 genes (P2_T value and P2_KS value). The model analysis values are all greater than the threshold of 0.7, so somatic copy number variation is judged to occur in all four genes. This result is consistent with the whole-exome variation analysis of sample 190031660 using the CNVkit software with annotation by the VEP annotation software, thereby achieving efficient and accurate copy number variation detection for the designated genes or intervals.
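How the indices are assembled and how the threshold is applied can be sketched as below. The example uses the R package gbm for the model itself; this Python sketch does not train a model and only illustrates the feature vector and the decision rule. The class name, field names, and numeric values are hypothetical.

```python
from typing import List, NamedTuple, Optional


class RegionFeatures(NamedTuple):
    """Indices for one designated region (field names are illustrative)."""
    log2_copy_ratio: float
    p1_t: float
    p1_ks: float
    p3_t: float
    p3_ks: float
    p2_t: Optional[float] = None   # P2 values are optional in the method
    p2_ks: Optional[float] = None


def feature_vector(features: RegionFeatures) -> List[float]:
    """Flatten the indices, appending P2 values only when both are present."""
    values = [features.log2_copy_ratio, features.p1_t, features.p1_ks,
              features.p3_t, features.p3_ks]
    if features.p2_t is not None and features.p2_ks is not None:
        values += [features.p2_t, features.p2_ks]
    return values


def has_somatic_cnv(model_score: float, threshold: float = 0.7) -> bool:
    """A score strictly higher than the threshold is called as variation."""
    return model_score > threshold


# Hypothetical indices for one gene, with and without the P2 values.
with_p2 = RegionFeatures(-1.2, 0.001, 0.002, 0.01, 0.03, 0.2, 0.4)
without_p2 = RegionFeatures(-1.2, 0.001, 0.002, 0.01, 0.03)
```

In practice the flattened vector would be passed to the trained model, and the model's output score would be passed to `has_somatic_cnv`.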
Table 1. Statistics of copy number variation related indices for the 4 designated genes in paired samples 190031660
Example 3
The present invention provides an apparatus for detecting somatic copy number variation in a specified genomic region, comprising:
an alignment unit, used for aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files; wherein the copy number variation status of the designated genomic region of the tumor sample is known, and the matched sample of the tumor sample is used as the control sample;
a unit for calculating the difference significance P value of the normalized read length coverage, used for dividing windows in the target capture region or the non-target capture region of the designated genomic region based on the alignment result files, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the difference significance P value of the normalized read length coverage between the tumor sample and the control sample, obtaining the difference significance P1 value for the target capture region and the difference significance P2 value for the non-target capture region;
a log2(copy Ratio) value calculation unit, used for calculating the log2(copy Ratio) value of the tumor sample and the control sample in the target capture region of the designated genomic region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region of the designated genomic region;
a unit for calculating the difference significance P3 value of the B allele frequency, used for calculating the B allele frequencies of the tumor sample and of the control sample in the target capture region of the designated genomic region, respectively, and then calculating the difference significance P3 value of the B allele frequencies between the tumor sample and the control sample in the target capture region of the designated genomic region;
a machine learning model construction unit, used for constructing a machine learning model using the obtained log2(copy Ratio) value, P1 value, P2 value and P3 value together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold; or constructing a machine learning model using the obtained log2(copy Ratio) value, P1 value and P3 value together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold;
a detection unit, used for calculating, by means of the above units, the log2(copy Ratio) value, P1 value and P3 value, and/or P2 value, of the tumor sample to be detected and its matched sample, then analyzing them with the machine learning model construction unit, and comparing the obtained analysis result with the judgment threshold to determine whether somatic copy number variation occurs in the designated analysis region of the tumor sample to be detected.
Further, the unit for calculating the difference significance P value of the normalized read length coverage further comprises a T test unit and a KS test unit, each used for calculating the difference significance of the normalized read length coverage, obtaining the difference significance P1_T value and P1_KS value for the target capture region and the difference significance P2_T value and P2_KS value for the non-target capture region.
Furthermore, the unit for calculating the difference significance P3 value of the B allele frequency also comprises a T test unit and a KS test unit, each used for calculating the difference significance of the B allele frequency, obtaining the difference significance P3_T value and P3_KS value of the B allele frequency.
Furthermore, the unit for calculating the difference significance P3 value of the B allele frequency further comprises a BAF check site set acquisition unit, used for obtaining germline variant sites with a mutation frequency of 0.05 or more from a known database as candidates, and then, based on the alignment result files, selecting the candidate germline variant sites that have a sequencing depth of 10X or more and are stably heterozygous in the control sample as the BAF check site set.
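The site selection performed by the BAF check site set acquisition unit can be sketched as follows. The frequency and depth cut-offs (0.05 and 10X) come from the text; the test for heterozygosity is not specified there, so the sketch assumes a control B allele frequency between 0.3 and 0.7. The dictionary keys and the function name are hypothetical.

```python
def select_baf_sites(candidates, min_pop_freq=0.05, min_depth=10,
                     het_low=0.3, het_high=0.7):
    """Return positions of candidate germline sites usable as BAF check sites.

    Assumption: a site counts as heterozygous in the control sample when
    its control B allele frequency lies in [het_low, het_high].
    """
    selected = []
    for site in candidates:
        depth = site["control_depth"]
        if site["pop_freq"] < min_pop_freq:
            continue  # too rare in the known database
        if depth <= 0 or depth < min_depth:
            continue  # sequencing depth below 10X
        control_baf = site["control_b_reads"] / depth
        if het_low <= control_baf <= het_high:
            selected.append(site["pos"])
    return selected


# Hypothetical candidate sites.
candidates = [
    {"pos": 100, "pop_freq": 0.20, "control_depth": 50, "control_b_reads": 24},
    {"pos": 200, "pop_freq": 0.01, "control_depth": 50, "control_b_reads": 25},
    {"pos": 300, "pop_freq": 0.30, "control_depth": 8, "control_b_reads": 4},
    {"pos": 400, "pop_freq": 0.10, "control_depth": 40, "control_b_reads": 40},
]
baf_check_sites = select_baf_sites(candidates)
```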
Further, the unit for constructing the machine learning model further includes a gradient boosting decision tree unit.
Further, the apparatus also comprises:
a ROC curve drawing unit, used for drawing a ROC curve from the machine learning model analysis results and selecting as the threshold the analysis result at which the AUC value, sensitivity and specificity are high.
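One way to pick a threshold from ROC data is sketched below. The text only asks for high sensitivity and high specificity; maximizing Youden's J (sensitivity + specificity - 1) is a technique substituted here for illustration, not something the source names. The function name and the example scores are hypothetical.

```python
def choose_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when its score is strictly higher than
    the threshold. Labels are 1 (variation present) or 0 (absent).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative samples")
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= thr)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, thr, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical model scores with known copy number variation status.
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 1, 1]
threshold, sensitivity, specificity = choose_threshold(scores, labels)
```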
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of the invention.
Claims (16)
1. A method for detecting somatic copy number variation for a given genomic region, comprising:
obtaining a tumor sample for which the copy number variation status of a designated genomic region is known, and taking its matched sample as a control sample; aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files;
based on the alignment result files, dividing windows in the target capture region or the non-target capture region of the designated genomic region, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the difference significance P value of the normalized read length coverage between the tumor sample and the control sample, obtaining the difference significance P1 value for the target capture region and the difference significance P2 value for the non-target capture region;
calculating the log2(copy Ratio) value of the tumor sample and the control sample in the target capture region of the designated genomic region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region of the designated genomic region;
calculating the B allele frequencies of the tumor sample and of the control sample in the target capture region of the designated genomic region, respectively, and then calculating the difference significance P3 value of the B allele frequencies between the tumor sample and the control sample in the target capture region of the designated genomic region;
constructing a machine learning model using the log2(copy Ratio) value, P1 value, P2 value and P3 value obtained above together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold; or constructing a machine learning model using the log2(copy Ratio) value, P1 value and P3 value obtained above together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold;
calculating, according to the above steps, the log2(copy Ratio) value, P1 value, P2 value and P3 value, or the log2(copy Ratio) value, P1 value and P3 value, of the tumor sample to be detected and its matched sample, analyzing them with the machine learning model, and comparing the analysis result with the judgment threshold to determine whether somatic copy number variation occurs in the designated analysis region of the tumor sample to be detected.
2. The method of claim 1, wherein in the step of calculating the difference significance P value of the normalized read length coverage, the T test and the KS test are each used to calculate the difference significance, obtaining the difference significance P1_T value and P1_KS value for the target capture region and the difference significance P2_T value and P2_KS value for the non-target capture region.
3. The method of claim 1 or 2, wherein when the target capture region of the specified genomic region is windowed, there are at least 200 reads per window; or when the non-target capture region of a given genomic region is windowed, there are at least 200 reads per window.
4. The method for detecting somatic copy number variation of a designated genomic region as claimed in claim 1 or 2, wherein in the step of calculating the difference significance P3 value of the B allele frequency, the T test and the KS test are each used to calculate the difference significance, obtaining the difference significance P3_T value and P3_KS value of the B allele frequency.
5. The method for detecting somatic copy number variation of a designated genomic region according to claim 1 or 2, characterized in that germline variant sites with a mutation frequency of 0.05 or more are obtained from a known database as candidates, and then, based on the alignment result files, the candidate germline variant sites that have a sequencing depth of 10X or more and are heterozygous in the control sample are selected as the BAF check site set.
6. The method of claim 4 or 5, wherein, when the significance of the difference in B allele frequency is analyzed, at least 30 BAF check sites are used for the tumor sample and the control sample in the target capture region of the designated genomic region, and when there are fewer than 30, the designated genomic region is extended to both sides to supplement BAF check sites until the target number is obtained.
7. The method of claim 1 or 2, wherein in the step of calculating the log2(copy Ratio) value, the target capture region of the designated genomic region is divided into windows, ensuring that each window has at least 200 reads, the log2(copy Ratio) value of each window is calculated separately, and the median of the log2(copy Ratio) values of all windows is then taken.
8. The method for detecting somatic copy number variation of a designated genomic region according to claim 1 or 2, wherein the machine learning model is selected from a gradient boosting decision tree, a support vector machine, naive Bayes, the AdaBoost algorithm, logistic regression, or random forest.
9. The method of claim 1 or 2, wherein a ROC curve is plotted from the analysis results of the machine learning model, and the analysis result at which the AUC value, sensitivity and specificity are high is selected as the judgment threshold.
10. The method for detecting somatic copy number variation of a designated genomic region as claimed in claim 1 or 2, wherein the sequencing data is whole-genome sequencing data, whole-exome sequencing data or targeted capture sequencing data.
11. An apparatus for detecting somatic copy number variation for a given genomic region, comprising:
an alignment unit, used for aligning the sequencing data of the tumor sample and of the control sample to the reference genome, respectively, to obtain alignment result files; wherein the copy number variation status of the designated genomic region of the tumor sample is known, and the matched sample of the tumor sample is used as the control sample;
a unit for calculating the difference significance P value of the normalized read length coverage, used for dividing windows in the target capture region or the non-target capture region of the designated genomic region based on the alignment result files, calculating the normalized read length coverage of the tumor sample and of the control sample in each window, and then calculating the difference significance P value of the normalized read length coverage between the tumor sample and the control sample, obtaining the difference significance P1 value for the target capture region and the difference significance P2 value for the non-target capture region;
a log2(copy Ratio) value calculation unit, used for calculating the log2(copy Ratio) value of the tumor sample and the control sample in the target capture region of the designated genomic region, based on the normalized read length coverage of the tumor sample and the control sample in each window of the target capture region of the designated genomic region;
a unit for calculating the difference significance P3 value of the B allele frequency, used for calculating the B allele frequencies of the tumor sample and of the control sample in the target capture region of the designated genomic region, respectively, and then calculating the difference significance P3 value of the B allele frequencies between the tumor sample and the control sample in the target capture region of the designated genomic region;
a machine learning model construction unit, used for constructing a machine learning model using the obtained log2(copy Ratio) value, P1 value, P2 value and P3 value together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold; or constructing a machine learning model using the obtained log2(copy Ratio) value, P1 value and P3 value together with the known copy number variation status of the designated genomic region, and obtaining a judgment threshold;
a detection unit, used for calculating, by means of the above units, the log2(copy Ratio) value, P1 value, P2 value and P3 value, or the log2(copy Ratio) value, P1 value and P3 value, of the tumor sample to be detected and its matched sample, then analyzing them with the machine learning model construction unit, and comparing the obtained analysis result with the judgment threshold to determine whether somatic copy number variation occurs in the designated analysis region of the tumor sample to be detected.
12. The apparatus according to claim 11, wherein the unit for calculating the difference significance P value of the normalized read length coverage further comprises a T test unit and a KS test unit, each used for calculating the difference significance of the normalized read length coverage, obtaining the difference significance P1_T value and P1_KS value for the target capture region and the difference significance P2_T value and P2_KS value for the non-target capture region.
13. The apparatus for detecting somatic copy number variation of a designated genomic region as claimed in claim 11 or 12, wherein the unit for calculating the difference significance P3 value of the B allele frequency further comprises a T test unit and a KS test unit, each used for calculating the difference significance of the B allele frequency, obtaining the difference significance P3_T value and P3_KS value of the B allele frequency.
14. The apparatus for detecting somatic copy number variation of a designated genomic region according to claim 11 or 12, wherein the unit for calculating the difference significance P3 value of the B allele frequency further comprises a BAF check site set acquisition unit, used for obtaining germline variant sites with a mutation frequency of 0.05 or more from a known database as candidates, and then, based on the alignment result files, selecting the candidate germline variant sites that have a sequencing depth of 10X or more and are heterozygous in the control sample as the BAF check site set.
15. The apparatus for detecting somatic copy number variation of a specified genomic region according to claim 11 or 12, wherein the means for constructing a machine learning model further comprises a gradient boosting decision tree unit.
16. The apparatus for detecting somatic copy number variation for a specified genomic region as claimed in claim 11 or 12, further comprising:
a ROC curve drawing unit, used for drawing a ROC curve from the machine learning model analysis results and selecting as the threshold the analysis result at which the AUC value, sensitivity and specificity are high.
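The supplement rule of claim 6, extending the designated genomic region to both sides until at least 30 BAF check sites are available, can be sketched as follows. The claim does not state how far or in what increments the region is extended, so the step size and the maximum extension below are hypothetical parameters, as is the function name.

```python
def collect_baf_sites(site_positions, region_start, region_end,
                      min_sites=30, step=10_000, max_extension=1_000_000):
    """Widen the region symmetrically until it holds at least min_sites sites.

    Returns (positions inside the final region, extension applied per side).
    Assumption: fixed-size extension steps with an upper bound; the claim
    specifies only that the region is extended to both sides.
    """
    extension = 0
    while True:
        low = region_start - extension
        high = region_end + extension
        selected = [p for p in site_positions if low <= p <= high]
        if len(selected) >= min_sites or extension >= max_extension:
            return selected, extension
        extension += step


# Hypothetical check sites every 5 kb and a 20 kb designated region.
site_positions = list(range(0, 1_000_000, 5_000))
sites, extension = collect_baf_sites(site_positions, 500_000, 520_000)
```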
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010880479.8A CN111968701B (en) | 2020-08-27 | 2020-08-27 | Method and device for detecting somatic copy number variation of designated genome region |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111968701A (en) | 2020-11-20 |
CN111968701B (en) | 2022-10-04 |
Family
ID=73399693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010880479.8A Active CN111968701B (en) | 2020-08-27 | 2020-08-27 | Method and device for detecting somatic copy number variation of designated genome region |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111968701B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||