CN106845154B - Device for detecting copy number variation in FFPE samples - Google Patents

Publication number
CN106845154B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710067086.3A
Other languages
Chinese (zh)
Other versions
CN106845154A (en)
Inventor
荆瑞琳
张萌萌
董永芳
王旺
李雪峰
玄兆伶
李大为
梁峻彬
陈重建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Annoroad Genetic Technology (Beijing) Co., Ltd.
Annuo uni-data (Yiwu) Medical Inspection Co. Ltd.
Zhejiang Annuo uni-data Biotechnology Co. Ltd.
Original Assignee
Annoroad Gene Technology Beijing Co ltd
Annoroad Yiwu Medical Inspection Co ltd
Zhejiang Annoroad Bio Technology Co ltd

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Abstract

The invention relates to a device for detecting copy number variation of an FFPE sample, which has high detection sensitivity. The FFPE sample copy number variation detection device comprises a sequencing data acquisition module, a sequence comparison module, a preliminary data processing module, a normalization module, a background library screening module, a data fluctuation elimination module, a GC correction module and an output module.

Description

Device for detecting copy number variation in FFPE samples
Technical Field
The invention belongs to the field of molecular biology detection, and particularly relates to a device and a method for detecting copy number variation of an FFPE sample.
Background
Tissue specimens prepared by the formalin-fixed, paraffin-embedded (FFPE) method are called formalin-fixed paraffin-embedded tissue samples, or FFPE samples for short. FFPE samples can be preserved for a long time, and a large number of tumor tissue sections in particular are preserved in this form. FFPE samples are commonly used in clinical pathological examination, tumor gene testing and medical research, and are a valuable resource for elucidating disease mechanisms, discovering therapeutic targets, indicating prognosis and the like.
Copy number variation (CNV) of genes is a clinically important structural variation that is associated with the prognosis of various tumors and with sensitivity to targeted drugs. Reliable CNV detection results can provide an important basis for clinical medication, disease assessment and the like. The CNV detection techniques used in clinical practice are mostly based on PCR or immunohistochemical experimental methods (e.g. FISH, IHC, etc.). These methods can cover only one gene per assay, and their detection sensitivity is low.
CNV detection based on a Next-Generation Sequencing (NGS) platform can provide CNV detection results of a plurality of genes at one time on the premise of ensuring detection performance. Most of the traditional NGS platform CNV detection technologies are researched and developed based on a whole genome sequencing technology platform, and with the continuous progress of the NGS technology, the high-depth sequencing technology based on target region capture gradually shows advantages in the application scene of clinical detection.
However, because whole genome sequencing data and target region capture sequencing data differ fundamentally, the conventional NGS-based CNV detection methods are not suited to target region capture sequencing data: their accuracy is difficult to guarantee and their detection sensitivity needs improvement. The problem is particularly pronounced for FFPE samples. DNA in FFPE samples is severely fragmented, which affects target gene DNA capture, NGS sequencing and other steps, and ultimately affects key technical indicators such as the effective depth of the target region. Making use of the low-depth sequencing data generated from low-quality FFPE samples has therefore become a major technical challenge.
Disclosure of Invention
In view of the above-described drawbacks of the prior art, an object of the present invention is to provide a detection apparatus and a detection method with higher detection sensitivity for CNV in an FFPE sample.
The inventors of the present invention made intensive studies to solve the above technical problem and found that, in CNV detection for FFPE samples, whether the data undergo reasonable noise reduction and whether an appropriate background library is used directly affect the detection result, and that this effect is especially significant in capture sequencing. More reasonable and comprehensive noise reduction, combined with a dynamic background library, improves the sensitivity of CNV detection in FFPE samples; the invention was completed on this basis.
Namely, the present invention comprises:
an apparatus for detecting copy number variation (which may occur in a genetic region or in a non-genetic region) in an FFPE sample, comprising:
a sequencing data acquisition module for acquiring capture sequencing data from the FFPE sample to be tested and sequencing data from a healthy population sample, wherein the healthy population sample comprises samples from a plurality of healthy (normal) individuals;
a sequence comparison module, connected to the sequencing data acquisition module, for comparing the sequencing data acquired by the sequencing data acquisition module with a reference genome sequence to obtain a comparison result (including, for example, the chromosome and coordinates of each short sequence that can be aligned to the reference genome and how well the short sequence matches the reference genome), and for calculating the depth value of each site according to the comparison result (each site on the genome is meant, although in capture sequencing the depth value of some sites may be 0);
an early-stage data processing module, connected to the sequence comparison module, for dividing the target region (100 kb to 100 Mb; the whole genome or a region of particular interest) into windows of a given length (50-1000 bp) with a given overlap (10-70%), removing the extreme depth values (maximum and minimum) of the sites within each window, calculating the mean or median depth, and calculating the GC content of the reference genome sequence within each window;
a normalization module, connected to the early-stage data processing module, for normalizing the mean or median depth in each window obtained by the early-stage data processing module and calculating the Z value in each window for the FFPE sample to be tested and for the healthy population sample;
a background library screening module, connected to the normalization module, for screening n healthy samples (each healthy sample corresponding to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population sample to obtain a background library sample set of n healthy samples, and then constructing a matrix $X_{m \times n}$ with m rows and n columns from the Z values of the n healthy samples in m windows;
a data fluctuation elimination module, connected to the background library screening module, for eliminating the inherent data fluctuation caused by capture sequencing;
a GC correction module, connected to the data fluctuation elimination module, for performing GC correction according to the GC content in each window;
and an output module, connected to the GC correction module, for outputting the CNV detection results (including, for example, a graph showing the CNV detection results, a negative/positive determination for CNV, etc.).
The sequencing data acquisition module of the device for detecting copy number variation in FFPE samples acquires sequencing data obtained by sequencing the DNA in the FFPE sample to be tested using a second-generation sequencing method. Mainstream second-generation sequencing platforms generally use Sequencing By Synthesis (SBS) technology for nucleic acid sequencing. Before sequencing, a sequencing library must be constructed from the nucleic acid (DNA or RNA) sample. The basic workflow is as follows: the ends of the fragmented DNA are first repaired, an 'A' base is added to the 3' end of each repaired fragment, the DNA fragment is then ligated to a DNA adapter (Adapter) containing a sequencing primer binding site, and the library is finally amplified by PCR to complete construction. The specific second-generation sequencing method is not particularly limited, and any second-generation sequencing method known to those skilled in the art may be used.
Preferably, the sequencing data is sequencing data obtained using a capture sequencing method.
The target genes for capture sequencing may vary with the target disease. The target disease may be, for example, a solid cancer (e.g., gastric cancer, breast cancer, colorectal cancer, lung cancer, etc.).
For example, where the target disease is breast cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, KIT, PIK3CA and/or PTEN genes; where the target disease is colorectal cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR2, KRAS, MET and/or PTEN genes; where the target disease is gastric cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, FGFR2, KRAS, MET, PIK3CA and/or PTEN genes; where the target disease is lung cancer, the target genes may be, for example, the ALK, BRAF, EGFR, ERBB2, FGFR1, KRAS, MET, PIK3CA and/or PTEN genes.
Preferably, the early-stage data processing module divides the target region into windows using a sliding-window method.
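As a minimal sketch of the windowing described above, the following Python snippet tiles a target region with fixed-length overlapping windows, trims the extreme depth values inside a window, and computes GC content. The function names, the 100 bp window length and the 50% overlap are illustrative assumptions chosen from within the stated ranges, and dropping exactly one maximum and one minimum value is one possible reading of "removing depth extreme values".

```python
from statistics import mean

def sliding_windows(start, end, length=100, overlap=0.5):
    """Yield half-open (w_start, w_end) windows tiling [start, end)."""
    # Step between window starts; overlap is a fraction of the window length.
    step = max(1, int(round(length * (1 - overlap))))
    pos = start
    while pos + length <= end:
        yield pos, pos + length
        pos += step

def trimmed_mean_depth(depths):
    """Mean depth after dropping one maximum and one minimum value (assumed reading)."""
    if len(depths) <= 2:
        return mean(depths)
    ordered = sorted(depths)
    return mean(ordered[1:-1])

def gc_content(seq):
    """Fraction of G/C bases in a reference sequence window."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

windows = list(sliding_windows(0, 300, length=100, overlap=0.5))
```

A median could be used in place of the trimmed mean, since the text allows either statistic.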
Preferably, the normalization module calculates the Z value in each window of the sample to be tested according to formula (1) below, where $Z_i$ denotes the Z value of the i-th window:
$Z_i = \mathrm{trimScale}(Z_i, Z_i)$ …… (1).
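The trimScale operation is described step by step in the Definitions section of this description (trim a percentage of values from both ends of a data set, then take the mean and standard deviation of what remains). A minimal Python sketch under stated assumptions follows: the final step is taken to be the usual standard score (w − μ)/σ, the trim fraction of 5% per tail is hypothetical, and the population standard deviation is used because the text does not say which estimator applies.

```python
from statistics import mean, pstdev

def trim_scale(w, v, trim=0.05):
    """Z value of w relative to data set v after trimming both tails.

    trim is the fraction of values removed from EACH tail (assumed value).
    """
    ordered = sorted(v)
    k = int(len(ordered) * trim)              # values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    mu = mean(kept)
    sigma = pstdev(kept)                      # population SD (assumption)
    if sigma == 0:
        raise ValueError("trimmed data set has zero variance")
    return (w - mu) / sigma
```

Trimming before estimating μ and σ keeps a few extreme windows, such as strongly amplified regions, from inflating the standard deviation and masking real signal.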
Preferably, formula (2) is defined as follows:
[Formula (2), which defines the d value, appears as image BDA0001221588410000041 in the original.]
where chr denotes a chromosome, $S_T$ denotes the FFPE sample to be tested, and $S_N$ denotes a healthy population sample;
the background library screening module selects, according to the Z values of the FFPE sample to be tested and of the healthy population samples, the n healthy samples with the smallest d values to obtain the screened background library sample set $S_1, S_2, S_3, \ldots, S_n$ (n and N are both natural numbers and n < N).
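The selection of the n closest healthy samples can be sketched as below. Formula (2) is only available as an image, so the distance used here (Euclidean distance between per-window Z values) is an assumption standing in for the patent's d value, and the sample names and numbers are made up for illustration.

```python
import math

def distance(z_test, z_healthy):
    """ASSUMED stand-in for d: Euclidean distance between per-window Z values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_test, z_healthy)))

def select_background(z_test, healthy, n):
    """Return the names of the n healthy samples with the smallest distance."""
    ranked = sorted(healthy, key=lambda name: distance(z_test, healthy[name]))
    return ranked[:n]

# Hypothetical Z values over three windows.
z_test = [0.1, -0.2, 0.3]
healthy = {
    "H1": [0.1, -0.1, 0.2],
    "H2": [2.0, 2.0, 2.0],
    "H3": [0.0, -0.2, 0.3],
}
background = select_background(z_test, healthy, n=2)
```

Because the ranking depends on the test sample, each FFPE sample gets its own background set, which is what makes the background library dynamic.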
Preferably, the data fluctuation elimination module performs singular value decomposition on the background library matrix $X_{m \times n}$ to obtain a factor matrix $U_{m \times r}$ with m rows and r columns, where r is the number of factors; the k factors with the largest contribution rates (i.e. the top-ranked k factors, with k generally 4 to 10) are then used for LOESS regression to obtain the residual $Z_p$.
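A minimal NumPy sketch of this factor-removal idea is given below. It deliberately swaps in ordinary least squares for the LOESS regression named in the text, so it illustrates the structure (SVD of the background matrix, keep the top k factors, take the residual) and is not the regression the patent describes. The function name and the default k are assumptions.

```python
import numpy as np

def remove_background_factors(X, z, k=4):
    """Regress the top-k left singular vectors of X out of z.

    X : (m, n) background matrix of Z values (m windows, n healthy samples)
    z : (m,)   Z values of the sample under test
    Returns the residual Z_p. Ordinary least squares is used here as a
    stand-in for the LOESS regression named in the text.
    """
    # Singular values come back in descending order, so the first k columns
    # of U are the factors with the largest contribution.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, U.shape[1])
    Uk = U[:, :k]
    coef, *_ = np.linalg.lstsq(Uk, z, rcond=None)
    return z - Uk @ coef
```

The intuition is that fluctuation shared by all healthy samples (probe efficiency, capture bias) lives in the leading factors, while a true CNV in the test sample does not and therefore survives in the residual.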
Preferably, the GC correction module performs LOESS-regression-based GC correction on $Z_p$ according to the GC content within each window to obtain the residual $Z_{pg}$.
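The GC correction can be sketched as fitting a smooth trend of the Z values against GC content and keeping the residual. The snippet below uses a small hand-written LOESS (local linear fit with tricube weights); the span `frac`, the local polynomial degree and the function names are assumptions, since the text only names LOESS regression.

```python
import numpy as np

def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = min(n, max(3, int(np.ceil(frac * n))))   # points per neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[r - 1]                 # bandwidth: r-th nearest distance
        if h > 0:
            w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        else:
            w = (dist == 0).astype(float)        # all neighbours share one x value
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]                      # intercept = fitted value at x[i]
    return fitted

def gc_correct(z_p, gc, frac=0.5):
    """Return Z_pg: Z_p minus its LOESS trend in GC content."""
    return np.asarray(z_p, dtype=float) - loess_fit(gc, z_p, frac)
```

This loop costs one small least-squares solve per window, which is fine for a sketch; a library LOESS implementation would be the practical choice for large panels.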
Preferably, the FFPE sample copy number variation detection device further comprises:
a data quality inspection module, connected to the sequencing data acquisition module and the sequence comparison module, for performing quality inspection on the sequencing data obtained by the sequencing data acquisition module. Quality control includes, but is not limited to, removing low-quality short sequences, removing short sequences with a high N content, removing adapter-related short sequences, and finally compiling the relevant quality control statistics.
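The three filters named above can be sketched in plain Python as follows. The thresholds (mean Phred quality of 20, at most 10% N), the Phred+33 encoding and the adapter string are illustrative assumptions; the text does not specify any of them.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def passes_qc(seq, qual, adapter, min_q=20, max_n_frac=0.1):
    """True if the read survives the three filters described in the text."""
    if not seq or len(seq) != len(qual):
        return False
    if mean_quality(qual) < min_q:                       # low-quality read
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:   # too many N bases
        return False
    if adapter and adapter in seq.upper():               # adapter contamination
        return False
    return True

ADAPTER = "AGATCGGAAGAGC"   # example adapter prefix used only for this sketch
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),       # good: Q40 throughout
    ("ACGTACGTAC", "##########"),       # low quality: Q2 throughout
    ("ANNNACGTAC", "IIIIIIIIII"),       # 30% N
    ("ACAGATCGGAAGAGCGT", "I" * 17),    # contains the adapter
]
kept = [r for r in reads if passes_qc(*r, adapter=ADAPTER)]
```

Counting how many reads each filter rejects would give the per-item quality control statistics mentioned at the end of the paragraph.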
In addition, the present invention further comprises:
a method for detecting copy number variation (which may occur in a genetic region or in a non-genetic region) in an FFPE sample, comprising:
a sequencing data acquisition step of acquiring capture sequencing data from the FFPE sample to be tested and sequencing data from a healthy population sample, wherein the healthy population sample comprises samples from a plurality of healthy individuals;
a sequence comparison step of comparing the sequencing data obtained in the sequencing data acquisition step with a reference genome sequence to obtain a comparison result (including, for example, the chromosome and coordinates of each short sequence that can be aligned to the reference genome and how well the short sequence matches the reference genome), and calculating the depth value of each site according to the comparison result (each site on the genome is meant, although in capture sequencing the depth value of some sites may be 0);
an early-stage data processing step of dividing the target region (100 kb to 100 Mb; the whole genome or a region of particular interest) into windows of a given length (50-1000 bp) with a given overlap (10-70%), removing the extreme depth values (maximum and minimum) of the sites within each window, calculating the mean or median depth, and calculating the GC content of the reference genome sequence within each window;
a normalization step of normalizing the mean or median depth in each window obtained in the early-stage data processing step and calculating the Z value in each window for the FFPE sample to be tested and for the healthy population sample;
a background library screening step of screening n healthy samples (each background library sample corresponding to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population sample to obtain a background library sample set, and then constructing a matrix $X_{m \times n}$ with m rows and n columns from the Z values of the n healthy samples in m windows;
a data fluctuation elimination step of eliminating the inherent data fluctuation caused by capture sequencing;
a GC correction step of performing GC correction according to the GC content in each window; and
an output step of outputting the CNV detection results (including, for example, a graph showing the CNV detection results, a negative/positive determination for CNV, etc.).
The sequencing data acquisition step of the method for detecting copy number variation in FFPE samples acquires sequencing data obtained by sequencing the DNA in the FFPE sample to be tested using a second-generation sequencing method. Mainstream second-generation sequencing platforms generally use Sequencing By Synthesis (SBS) technology for nucleic acid sequencing. Before sequencing, a sequencing library must be constructed from the nucleic acid (DNA or RNA) sample. The basic workflow is as follows: the ends of the fragmented DNA are first repaired, an 'A' base is added to the 3' end of each repaired fragment, the DNA fragment is then ligated to a DNA adapter (Adapter) containing a sequencing primer binding site, and the library is finally amplified by PCR to complete construction. The specific second-generation sequencing method is not particularly limited, and any second-generation sequencing method known to those skilled in the art may be used.
Preferably, the sequencing data is sequencing data obtained using a capture sequencing method.
The target genes for capture sequencing may vary with the target disease. The target disease may be, for example, a solid cancer (e.g., gastric cancer, breast cancer, colorectal cancer, lung cancer, etc.).
For example, where the target disease is breast cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, KIT, PIK3CA and/or PTEN genes; where the target disease is colorectal cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR2, KRAS, MET and/or PTEN genes; where the target disease is gastric cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, FGFR2, KRAS, MET, PIK3CA and/or PTEN genes; where the target disease is lung cancer, the target genes may be, for example, the ALK, BRAF, EGFR, ERBB2, FGFR1, KRAS, MET, PIK3CA and/or PTEN genes.
Preferably, the early-stage data processing step divides the target region into windows using a sliding-window method.
Preferably, the normalization step calculates the Z value in each window of the sample to be tested according to formula (1) below, where $Z_i$ denotes the Z value of the i-th window:
$Z_i = \mathrm{trimScale}(Z_i, Z_i)$ …… (1).
Preferably, formula (2) is defined as follows:
[Formula (2), which defines the d value, appears as image BDA0001221588410000071 in the original.]
where chr denotes a chromosome, $S_T$ denotes the FFPE sample to be tested, and $S_N$ denotes a healthy population sample;
the background library screening step selects, according to the Z values of the FFPE sample to be tested and of the healthy population samples, the n healthy samples with the smallest d values to obtain the screened background library sample set $S_1, S_2, S_3, \ldots, S_n$ (n and N are both natural numbers and n < N).
Preferably, the data fluctuation elimination step performs singular value decomposition on the background library matrix $X_{m \times n}$ to obtain a factor matrix $U_{m \times r}$ with m rows and r columns, where r is the number of factors; the k factors with the largest contribution rates (i.e. the top-ranked k factors, with k generally 4 to 10) are then used for LOESS regression to obtain the residual $Z_p$.
Preferably, the GC correction step performs LOESS-regression-based GC correction on $Z_p$ according to the GC content within each window to obtain the residual $Z_{pg}$.
Preferably, the copy number variation detection method further comprises:
a data quality inspection step of performing quality inspection on the sequencing data obtained in the sequencing data acquisition step. Quality control includes, but is not limited to, removing low-quality short sequences, removing short sequences with a high N content, removing adapter-related short sequences, and finally compiling the relevant quality control statistics.
For the preferred embodiments of the respective steps, refer to the corresponding descriptions of the modules above.
According to the present invention, a detection apparatus and a detection method with higher detection sensitivity for the FFPE sample CNV are provided.
Drawings
FIG. 1 is a schematic diagram of an apparatus for detecting copy number variation of an FFPE sample according to the present invention.
FIG. 2 is a graph showing the results of CNV detection of multiple genes of breast cancer in example 1.
Detailed description of the invention
Technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art, and in case of conflict, the definitions in this specification shall control.
Definitions
Reference genome: a complete set of haploid sequences carried by a cell or organism, including the complete set of genes and spacer sequences.
Comparison (alignment): generally refers to sequence alignment, i.e. the process of aligning two or more sequences according to a certain rule in order to determine their similarity or homology.
Depth value: for a certain site on the genome, according to the comparison result, the number of short sequences covering the site is the depth value of the site.
Window (sliding window): generally refers to a fixed length region on the genome.
Background library: a sample library composed of samples from a plurality of healthy individuals (generally ≥ 20).
Capture sequencing: the process of capturing DNA fragments from a specific region (region of interest) of the genome using pre-designed probes and then performing NGS sequencing on the captured DNA fragments.
NGS (high throughput sequencing): high-throughput sequencing, also known as "Next-generation" sequencing technology, is marked by the ability to sequence hundreds of thousands to millions of DNA molecules in parallel at one time, and by the short read length.
Normalization (Z value): $Z = \dfrac{x - \mu}{\sigma}$
trimScale(w, v): let w be a value to be normalized and v a data set.
a. Remove a certain percentage of the highest and lowest values of v to obtain the trimmed data set $v'$.
b. Compute the mean $\mu$ and standard deviation $\sigma$ of $v'$.
c. Compute $\dfrac{w - \mu}{\sigma}$ as the final result.
SVD (singular value decomposition): SVD is an important matrix decomposition in linear algebra, and is a generalization of unitary diagonalization of a normal matrix in matrix analysis. The method has important application in the fields of signal processing, statistics and the like. The effect is to map the data set into a low dimensional space. The eigenvalues of the data set (characterized by singular values in SVD) are arranged according to importance, the dimension reduction process is a process of discarding unimportant eigenvectors, and the space formed by the remaining eigenvectors is the space after dimension reduction.
Examples
The present invention will be described in more detail with reference to examples. It should be understood that the embodiments described herein are intended to illustrate, but not limit the invention.
Example 1
The CNV condition of the FFPE sample of the tissue of a female breast cancer patient is detected by adopting the device for detecting the copy number variation of the FFPE sample.
1.1 extraction of DNA from FFPE samples
The FFPE sample DNA was obtained by performing extraction procedures using the GeneRead DNA FFPE Kit (QIAGEN Co.) according to the manual.
1.2 sample disruption
The FFPE sample DNA was sheared with a Bioruptor sonicator (30 cycles, 30 s ON/30 s OFF) into fragments of about 200 bp to obtain fragmented DNA.
1.3 End Repair (End Repair)
(1) The required reagents were taken out in advance from the kit stored at -20 ℃; the amounts for a single sample are shown in Table 1.
Table 1 (image BDA0001221588410000091 in the original)
(2) End repair reaction: after the DNA sample was added, the 1.5 mL centrifuge tube was placed in a Thermomixer and incubated at 20 ℃ for 30 minutes. After the reaction, the DNA in the reaction system was purified and recovered using 1.8× nucleic acid purification beads and dissolved in 32 μL of EB.
1.4 A-tailing (A-Tailing)
(1) The required reagents were taken out in advance from the kit stored at -20 ℃; the amounts for a single sample are shown in Table 2:
Table 2 (image BDA0001221588410000101 in the original)
(2) A-tailing reaction: after 32 μL of the DNA recovered from the previous purification step was added, the 1.5 mL centrifuge tube was placed in a Thermomixer and incubated at 37 ℃ for 30 minutes. The DNA in the reaction system was purified and recovered using 1.8× nucleic acid purification magnetic beads and dissolved in 18 μL of EB.
1.5 Adapter ligation (Adapter Ligation)
(1) The required reagents were taken out in advance from the kit stored at -20 ℃; the amounts for a single sample are shown in Table 3:
Table 3 (image BDA0001221588410000102 in the original)
(2) Adapter ligation reaction: after 18 μL of the DNA recovered from the previous purification step was added, the sample tube was incubated in a Thermomixer at 20 ℃ for 15 minutes. The DNA in the reaction system was purified and recovered using 1.8× nucleic acid purification magnetic beads and dissolved in 30 μL of EB.
1.6 PCR reaction
(1) The required reagents were taken out of the kit stored at -20 ℃, and the PCR reaction system was prepared in a 2 mL PCR tube:
Table 4 (image BDA0001221588410000111 in the original)
(2) The PCR program was set as follows:
[PCR program: image BDA0001221588410000112 in the original]
After the reaction, the sample was removed promptly and stored in a refrigerator at 4 ℃, and the instrument was logged out of or shut down as required.
(3) The DNA in the reaction system was purified and recovered using 0.9× nucleic acid purification magnetic beads, and the purified library was dissolved in 20 μL of ddH2O. The library was quantified by Qubit and submitted for Agilent 2100 analysis.
1.7 Breast cancer target region Capture chip library hybridization
(1) The buffers used in this experiment to provide the ionic environment for the hybridization capture reaction, and the wash and rinse solutions used to remove physically adsorbed or non-specifically hybridized material, were all commercially available.
(2) Preparing the hybridization library: the DNA library to be hybridized was thawed on ice and a total mass of 1 μg was taken (this DNA library is referred to as the sample library in the subsequent steps).
(3) Preparing the Ann primer pool: 1000 pmol each of the tag primer In1 (100 μM) corresponding to the sample library index and of the common primer (1000 μM) were mixed (this mixture is referred to as the Ann primer pool in the subsequent steps).
(4) Preparing the hybridization sample: 5 μL of COT DNA (Human COT-1 DNA, Life Technologies, 1 mg/mL), 1 μg of the sample library and the Ann primer pool were added to a 1.5 mL EP tube. The EP tube containing the sample library pool/COT DNA/Ann primer pool was sealed with sealing film and placed in a vacuum apparatus until completely dry.
(5) Dissolving the hybridization sample: the following were added to the dry powder of sample library pool/COT DNA/Ann primer pool:
7.5 μL of 2× hybridization buffer
3 μL of hybridization component A
(6) After thorough mixing, the mixture was denatured for 10 minutes on a heating block preheated to 95 ℃.
(7) The mixture was transferred to a 0.2 mL flat-cap PCR tube containing 4.5 μL of the capture chip and vortexed thoroughly for 3 seconds, and the hybridization sample mixture was placed on a 47 ℃ heating block for 16 hours. The heated lid of the heating block must be set to 57 ℃. The hybridized product is then subjected to the subsequent elution and recovery steps.
(8) The 10× wash solutions (I, II and III), the 10× rinse solution and the 2.5× magnetic bead wash solution were diluted to 1× working solutions.
Table 5 (image BDA0001221588410000121 in the original)
(9) The following reagents were preheated on a 47 ℃ heating block:
400 μL of 1× rinse solution
100 μL of 1× wash solution I
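Diluting the concentrated stocks in step (8) to 1× working solutions is a C1·V1 = C2·V2 calculation. The helper below is a small illustrative sketch; the function name is hypothetical and the final volumes in the test values are examples, since the actual preparation volumes are in Table 5, which is only available as an image.

```python
def dilution_volumes(stock_fold, final_volume_ul, final_fold=1.0):
    """Return (stock_ul, diluent_ul) from C1 * V1 = C2 * V2.

    stock_fold      concentration of the stock, e.g. 10 for a 10x solution
    final_volume_ul volume of working solution wanted, in microlitres
    final_fold      concentration of the working solution (1x by default)
    """
    if stock_fold <= final_fold:
        raise ValueError("stock must be more concentrated than the target")
    stock_ul = final_volume_ul * final_fold / stock_fold
    return stock_ul, final_volume_ul - stock_ul
```

For instance, 400 μL of 1× solution from a 10× stock needs 40 μL of stock plus 360 μL of diluent, and 200 μL of 1× solution from a 2.5× stock needs 80 μL of stock plus 120 μL of diluent.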
1.8 preparation of affinity adsorption magnetic beads
(1) Streptavidin magnetic beads (Dynabeads M-280 Streptavidin, hereinafter referred to as magnetic beads) were equilibrated at room temperature for 30 minutes and then vortexed thoroughly for 15 seconds.
(2) 100 μL of magnetic beads was dispensed into a 1.5 mL centrifuge tube, and the tube was placed on a magnetic rack. After about 5 minutes the supernatant was carefully discarded, 1× magnetic bead wash solution at twice the initial bead volume was added, and the mixture was vortexed for 10 seconds. The tube was returned to the magnetic rack to collect the beads; once the solution was clear, the supernatant was aspirated and discarded. The wash was performed twice in total.
(3) After washing, the magnetic bead wash solution was aspirated off, and the beads were resuspended by vortexing in 1× magnetic bead wash solution equal to the initial bead volume and transferred to a 0.2 mL PCR tube. The PCR tube was placed on a magnetic rack until the solution cleared, and the supernatant was aspirated and discarded.
1.9 binding and rinsing of DNA and affinity adsorption magnetic beads
(1) The hybridized sample library was transferred into the 0.2 mL PCR tube containing the affinity adsorption magnetic beads and mixed by vortexing.
(2) The 0.2 mL PCR tube was placed on a 47 ℃ heating block for 45 minutes and vortexed once every 15 minutes to bind the DNA to the beads.
(3) After the 45-minute incubation, 100 μL of 1× wash solution I preheated to 47 ℃ was added to the 15 μL of captured DNA sample and vortexed for 10 seconds. The entire contents of the 0.2 mL PCR tube were transferred to a 1.5 mL centrifuge tube. The 1.5 mL centrifuge tube was placed on a magnetic rack to collect the beads, and the supernatant was discarded.
(4) The 1.5 mL centrifuge tube was removed from the magnetic rack and 200 μL of 1× rinse solution preheated to 47 ℃ was added. The mixture was pipetted up and down 10 times (this must be done quickly so that the reagent and sample do not fall below 47 ℃). After mixing, the sample was placed on a 47 ℃ heating block for 5 minutes. This step was repeated, giving two washes in total with 1× rinse solution at 47 ℃. The 1.5 mL centrifuge tube was then placed on a magnetic rack to collect the beads, and the supernatant was discarded.
(5) 200 μL of room-temperature 1× wash solution I was added to the 1.5 mL centrifuge tube and vortexed for 2 minutes. The tube was placed on a magnetic rack to collect the beads, and the supernatant was discarded. 200 μL of room-temperature 1× wash solution II was added to the tube and vortexed for 1 minute. The tube was placed on a magnetic rack to collect the beads, and the supernatant was discarded. 200 μL of room-temperature 1× wash solution III was added to the tube and vortexed for 30 seconds. The tube was placed on a magnetic rack to collect the beads, and the supernatant was discarded.
(6) The 1.5 mL centrifuge tube was removed from the magnetic rack, and 45 μL of PCR water was added to resuspend the washed magnetic beads carrying the captured sample.
1.10 PCR amplification of captured DNA
(1) The post-capture PCR mix was prepared according to the following table and mixed thoroughly by vortexing. Enrichment primer F and enrichment primer R were both purchased from Yingchi Weiji Co.
Figure BDA0001221588410000141
(2) The PCR amplification program for the DNA adsorbed on the magnetic beads was set as follows:
Figure BDA0001221588410000142
(3) Recovery and purification of the hybridization-capture DNA PCR product: the DNA in the reaction system was recovered and purified using 0.9× nucleic acid purification magnetic beads, and the purified library was dissolved in 30 μL of ddH2O.
1.11 library quantitation
The library was analyzed on a 2100 Bioanalyzer (Agilent) or LabChip GX (Caliper) and by qPCR, and the library concentration was recorded.
1.12 on-machine sequencing of libraries
The constructed library was sequenced with NextSeq 550 AR.
1.13 data processing and analysis
The FFPE sample copy number variation detection device provided by the invention was used to process and analyze the sequencing results obtained in section 1.12.
The FFPE sample copy number variation detection apparatus of example 1 includes the following modules.
A sequencing data acquisition module:
This module acquires the sequencing data obtained by capture sequencing of the breast cancer FFPE sample to be tested, using the breast cancer target region capture chip.
The data quality inspection module:
This module performs quality control on the sequencing data: short sequences (reads) with a low average quality value, reads with a high N content, and reads containing adapter sequence are filtered out, yielding the filtered sequencing data C.
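The three filters described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the Phred+33 encoding, the thresholds `min_mean_q` and `max_n_frac`, and the exact-substring adapter check are all assumptions, since the text does not give cut-off values or an adapter-matching method.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 is assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_qc(seq, qual, adapter, min_mean_q=20.0, max_n_frac=0.1):
    """Return True if a read survives all three filters.

    min_mean_q and max_n_frac are hypothetical thresholds chosen only
    for illustration.
    """
    if not seq or len(seq) != len(qual):
        return False
    if mean_quality(qual) < min_mean_q:                  # low average quality
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:   # high N content
        return False
    if adapter and adapter in seq.upper():               # adapter-related read
        return False
    return True


def filter_reads(reads, adapter, **thresholds):
    """reads: iterable of (sequence, quality) pairs; returns the filtered data C."""
    return [(s, q) for s, q in reads if passes_qc(s, q, adapter, **thresholds)]
```

For example, `filter_reads(reads, "AGATCGGAAGAGC")` keeps only reads that are free of that adapter prefix, have at most 10% N bases, and have a mean quality of at least 20.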
A sequence alignment module:
Using the filtered sequencing data C, the reads are aligned to the human reference genome hg19 to obtain alignment result A. The depth at each site on the genome is then calculated from alignment result A to obtain result D.
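The per-site depth calculation can be illustrated with a small sketch. It assumes, for illustration only, that alignment result A has already been reduced to `(chromosome, start, end)` intervals in 0-based half-open coordinates; the function name and this representation are not taken from the patent.

```python
def per_site_depth(alignments, chrom, region_start, region_end):
    """Depth at every position of [region_start, region_end) on one chromosome.

    alignments: iterable of (chrom, start, end) tuples, 0-based half-open
    (an assumed, simplified form of alignment result A).
    """
    length = region_end - region_start
    diff = [0] * (length + 1)            # difference array: +1 at start, -1 at end
    for c, start, end in alignments:
        if c != chrom:
            continue
        lo = max(start, region_start)    # clip the read to the region
        hi = min(end, region_end)
        if lo < hi:
            diff[lo - region_start] += 1
            diff[hi - region_start] -= 1
    depth, running = [], 0
    for delta in diff[:length]:          # prefix sum gives the depth per site
        running += delta
        depth.append(running)
    return depth
```

The difference-array approach touches each read once and each site once, which keeps the cost linear even for deep coverage.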
The early data processing module:
The cancer target region is divided into overlapping windows of a given length. Within each window, the depth extremes are removed, the median depth is calculated, and the GC content of the reference genome sequence is calculated, yielding result X.
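A minimal sketch of this windowing step is given below. The window size, step, and the trimming fraction used to remove depth extremes are hypothetical parameters; the text only states that the windows have a certain length and overlap and that extremes are removed.

```python
from statistics import median


def sliding_windows(start, end, size, step):
    """Windows [s, s + size) covering [start, end); step < size gives overlap."""
    windows = []
    s = start
    while s < end:
        windows.append((s, min(s + size, end)))
        if s + size >= end:
            break
        s += step
    return windows


def trimmed_median(depths, trim_frac=0.1):
    """Median depth after dropping trim_frac of the values at each extreme."""
    ordered = sorted(depths)
    k = int(len(ordered) * trim_frac)
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return median(kept)


def gc_content(seq):
    """Fraction of G/C among called bases (N is ignored)."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0


def summarize_windows(depth, ref_seq, size, step, trim_frac=0.1):
    """depth and ref_seq cover the same target region; index 0 is the region start."""
    return [
        {"start": s, "end": e,
         "median_depth": trimmed_median(depth[s:e], trim_frac),
         "gc": gc_content(ref_seq[s:e])}
        for s, e in sliding_windows(0, len(depth), size, step)
    ]
```

Each returned dictionary corresponds to one window of result X, holding its coordinates, its trimmed median depth, and its GC fraction.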
A normalization module:
Results X and D are combined, and the Z value in each window of the genomic DNA to be tested is calculated according to the formula Z_i = trimScale(Z_i, Z_i).
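The text does not define `trimScale`, so the following is only one plausible reading, stated as an assumption: each window's depth is converted to a z-score using the mean and standard deviation of the distribution after its extremes have been trimmed. The function name `trim_scale` and the parameter `trim_frac` are hypothetical.

```python
from statistics import mean, stdev


def trim_scale(values, trim_frac=0.05):
    """Z-score each value against the mean/SD of the trimmed distribution.

    ASSUMPTION: this is an illustrative interpretation of 'trimScale';
    the source does not specify how the function is computed.
    """
    if len(values) < 2:
        return [0.0 for _ in values]
    ordered = sorted(values)
    k = int(len(ordered) * trim_frac)
    # Keep the central part; fall back to all values if trimming leaves too few.
    core = ordered[k:len(ordered) - k] if len(ordered) - 2 * k >= 2 else ordered
    mu, sd = mean(core), stdev(core)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]
```

Under this reading, estimating the center and spread from the trimmed values keeps a few strongly amplified windows from inflating the scale and masking themselves.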
Background library screening module:
The value d is defined as follows:
Figure BDA0001221588410000151
where chr denotes a chromosome, S_t denotes the sample to be tested, and S_n denotes a background library sample.
Based on the Z values of the genomic DNA to be tested and of the background library, the background library samples with the smallest d values are selected, giving the screened background library sample set S_1, S_2, S_3, …, S_n.
The Z values of the n samples over the m windows are used to construct the matrix X_{m×n}, which serves as the background library.
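The screening and matrix construction can be sketched as below. The formula for d is available only as an image in this text, so the Euclidean distance between Z profiles used here is a stand-in assumption, not the patent's definition; the function names are likewise hypothetical.

```python
import math


def distance(z_test, z_ref):
    """ASSUMED stand-in for d: Euclidean distance between two Z profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_test, z_ref)))


def select_background(z_test, candidates, n):
    """Pick the n candidates closest to the test sample and build X (m rows, n columns).

    candidates: dict mapping sample id -> list of m Z values.
    Returns (chosen sample ids, matrix as a list of m rows).
    """
    ranked = sorted(candidates, key=lambda sid: distance(z_test, candidates[sid]))
    chosen = ranked[:n]
    m = len(z_test)
    matrix = [[candidates[sid][i] for sid in chosen] for i in range(m)]
    return chosen, matrix
```

Each column of the returned matrix is one selected background sample and each row is one window, matching the m × n layout described above.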
The data fluctuation elimination module:
Singular value decomposition is performed on the background library matrix X_{m×n} to obtain a factor matrix U_{m×n} with m rows and n columns, where n is the number of factors. LOESS regression is then performed on the several factors with the largest contribution rates to obtain the residual Z_p.
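The idea of removing systematic fluctuation with the leading SVD factors can be sketched with NumPy. Note the substitution: the text specifies LOESS regression on the top factors, whereas this sketch uses ordinary least squares on the top k left singular vectors, purely to keep the example short. The function name and the argument `k` are assumptions.

```python
import numpy as np


def remove_fluctuation(z_test, background, k):
    """Regress the test sample's Z values on the top-k SVD factors of the background.

    z_test: shape (m,); background: shape (m, n).
    Returns the residual Z_p with shape (m,).
    NOTE: ordinary least squares is used here in place of the LOESS
    regression named in the text.
    """
    z = np.asarray(z_test, dtype=float)
    X = np.asarray(background, dtype=float)
    # Economy SVD: columns of U are the factors, singular values sorted descending.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    factors = U[:, :k]                    # factors with the largest contribution
    coef, *_ = np.linalg.lstsq(factors, z, rcond=None)
    return z - factors @ coef             # what the shared factors cannot explain
```

Because the factors capture variation shared by the background samples, a window that stands out in the residual is less likely to reflect capture or sequencing noise.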
A GC correction module:
Based on the GC content in the m windows, LOESS-regression-based GC correction is applied to Z_p to obtain the residual Z_pg.
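A LOESS-style GC correction can be sketched in plain Python as a locally weighted linear regression with a tricube kernel, with the fitted GC trend subtracted from Z_p. The smoothing fraction `frac` and the function names are assumptions; the text does not state the LOESS parameters.

```python
def loess_fit(x, y, frac=0.5):
    """Locally weighted linear regression (tricube kernel), evaluated at each x."""
    n = len(x)
    span = max(2, int(round(frac * n)))       # number of neighbors per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[span - 1] or 1e-12          # bandwidth: distance to span-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def gc_correct(z_p, gc, frac=0.5):
    """Return Z_pg: Z_p minus the smooth trend of Z_p against window GC content."""
    trend = loess_fit(gc, z_p, frac)
    return [z - t for z, t in zip(z_p, trend)]
```

A signal that depends only on GC content is removed almost entirely, while a window that deviates from its GC neighbors keeps a large residual.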
An output module:
The output module displays a plot of the CNV detection result.
The detection result is shown in FIG. 2, where each small dot is the Z_pg value of one window. Increased copy numbers were detected for both the PIK3CA and ERBB2 genes.
1.14 validation of results
RNA was extracted from fresh tissue of the same patient's original tumor and reverse transcribed, and qPCR was used to verify whether the expression levels of the PIK3CA and ERBB2 genes were increased. The verification result was consistent with the detection result of section 1.13, showing that the detection device provided by the invention can successfully detect copy number variation in FFPE samples.
Industrial applicability
The FFPE sample CNV detection device and detection method of the invention can significantly improve the sensitivity of CNV detection.

Claims (7)

1. An apparatus for FFPE sample copy number variation detection, comprising:
a sequencing data acquisition module for acquiring capture sequencing data from an FFPE sample to be tested and sequencing data from healthy population samples, wherein the healthy population samples are samples from a plurality of healthy people;
a sequence alignment module, connected to the sequencing data acquisition module, for aligning the sequencing data acquired by the sequencing data acquisition module to a reference genome sequence to obtain an alignment result, and calculating the depth value of each site from the alignment result;
an early data processing module, connected to the sequence alignment module, for dividing the target region into overlapping windows of a given length, removing the depth extremes of the sites within each window, calculating the mean or median depth, and calculating the GC content of the reference genome sequence within each window;
a normalization module, connected to the early data processing module, for normalizing the mean or median depth in each window obtained by the early data processing module, and calculating the Z value of the difference between the FFPE sample to be tested and the healthy population samples in each window;
a background library screening module, connected to the normalization module, for screening out n healthy person samples according to the Z values of the FFPE sample to be tested and the healthy population samples, each background library sample corresponding to one healthy person, to obtain a background library sample set of the n healthy person samples, and then constructing a matrix X_{m×n} with m rows and n columns from the Z values of the n healthy person samples in m windows;
a data fluctuation elimination module, connected to the background library screening module, for eliminating the inherent data fluctuation caused by capture sequencing;
a GC correction module, connected to the data fluctuation elimination module, for performing GC correction according to the GC content in each window; and
an output module, connected to the GC correction module, for outputting a CNV detection result,
wherein the data fluctuation elimination module is configured to perform singular value decomposition on the background library matrix X_{m×n} to obtain a factor matrix U_{m×r} with m rows and r columns, where r is the number of factors, and then to perform LOESS regression on the k factors with the largest contribution rates to obtain the residual Z_p.
2. The apparatus of claim 1, wherein the sequencing data is sequencing data obtained using a capture sequencing method.
3. The apparatus of claim 1, wherein the early data processing module partitions the window using a sliding window method.
4. The apparatus of claim 1, wherein the normalization module calculates the Z value in each window of the biological sample to be tested according to the following formula (1), in which Z_i represents the Z value of the i-th window:
Z_i = trimScale(Z_i, Z_i) (1).
5. The apparatus of claim 1, wherein d is defined by formula (2):
Figure FDA0003430058150000021
where chr denotes a chromosome, S_T denotes the sample to be tested, and S_N denotes a healthy population sample, and
the background library screening module screens out the n healthy person samples with the smallest d values according to the Z value of the difference between the FFPE sample to be tested and the healthy person samples, to obtain the screened background library sample set S_1, S_2, S_3, …, S_n.
6. The apparatus of claim 1, wherein the GC correction module performs LOESS-regression-based GC correction on Z_p according to the GC content within each window to obtain the residual Z_pg.
7. The device of claim 1, further comprising a data quality inspection module connected to the sequencing module and the sequence alignment module for performing quality inspection on the sequencing data obtained by the sequencing module.
CN201710067086.3A 2016-12-29 2017-02-07 A device for FFPE sample copy number variation detects Active CN106845154B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611247393 2016-12-29
CN2016112473931 2016-12-29

Publications (2)

Publication Number Publication Date
CN106845154A CN106845154A (en) 2017-06-13
CN106845154B true CN106845154B (en) 2022-04-08

Family

ID=59121511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710067086.3A Active CN106845154B (en) 2016-12-29 2017-02-07 A device for FFPE sample copy number variation detects

Country Status (1)

Country Link
CN (1) CN106845154B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733979A (en) * 2017-10-30 2018-11-02 成都凡迪医疗器械有限公司 G/C content calibration method, device and the computer readable storage medium of NIPT
CN109979535B (en) * 2017-12-28 2021-03-02 浙江安诺优达生物科技有限公司 Genetics screening device before embryo implantation
CN109979529B (en) * 2017-12-28 2021-01-08 北京安诺优达医学检验实验室有限公司 CNV detection device
CN110797088B (en) * 2019-10-17 2020-09-15 南京医基云医疗数据研究院有限公司 Whole genome resequencing analysis and method for whole genome resequencing analysis
CN111477275B (en) * 2020-04-02 2020-12-25 上海之江生物科技股份有限公司 Method and device for identifying multi-copy area in microorganism target fragment and application

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133914A (en) * 2014-08-12 2014-11-05 厦门万基生物科技有限公司 Method for removing GC deviations introduced by high throughout sequencing and detecting chromosome copy number variation
CN104560697A (en) * 2015-01-26 2015-04-29 上海美吉生物医药科技有限公司 Detection device for instability of genome copy number
CN104662156A (en) * 2012-08-17 2015-05-27 美国陶氏益农公司 Use of a maize untranslated region for transgene expression in plants
CN105483229A (en) * 2015-12-21 2016-04-13 广东腾飞基因科技有限公司 Method and system for detecting fetal chromosome aneuploidy
CN105555968A (en) * 2013-05-24 2016-05-04 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of genetic variations
CN105574361A (en) * 2015-11-05 2016-05-11 上海序康医疗科技有限公司 Method for detecting variation of copy numbers of genomes
CN105722994A (en) * 2013-06-17 2016-06-29 维里纳塔健康公司 Method for determining copy number variations in sex chromosomes
CN105760712A (en) * 2016-03-01 2016-07-13 西安电子科技大学 Copy number variation detection method based on next generation sequencing
CN105814574A (en) * 2013-10-04 2016-07-27 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of genetic variations
CN106156543A (en) * 2016-06-22 2016-11-23 厦门艾德生物医药科技股份有限公司 A kind of tumor ctDNA information statistical method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104662156A (en) * 2012-08-17 2015-05-27 美国陶氏益农公司 Use of a maize untranslated region for transgene expression in plants
CN105555968A (en) * 2013-05-24 2016-05-04 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of genetic variations
CN105722994A (en) * 2013-06-17 2016-06-29 维里纳塔健康公司 Method for determining copy number variations in sex chromosomes
CN105814574A (en) * 2013-10-04 2016-07-27 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of genetic variations
CN104133914A (en) * 2014-08-12 2014-11-05 厦门万基生物科技有限公司 Method for removing GC deviations introduced by high throughout sequencing and detecting chromosome copy number variation
CN104560697A (en) * 2015-01-26 2015-04-29 上海美吉生物医药科技有限公司 Detection device for instability of genome copy number
CN105574361A (en) * 2015-11-05 2016-05-11 上海序康医疗科技有限公司 Method for detecting variation of copy numbers of genomes
CN105483229A (en) * 2015-12-21 2016-04-13 广东腾飞基因科技有限公司 Method and system for detecting fetal chromosome aneuploidy
CN105760712A (en) * 2016-03-01 2016-07-13 西安电子科技大学 Copy number variation detection method based on next generation sequencing
CN106156543A (en) * 2016-06-22 2016-11-23 厦门艾德生物医药科技股份有限公司 A kind of tumor ctDNA information statistical method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"CODEX:a normalization and copy number variation detection method for whole exome sequencing";Yuchao Jiang等;《Nucleic Acids Research》;20150123;第43卷(第6期);第1-12页 *
"新一代测序的拷贝数变异检测算法研究与设计";李燕等;《生物信息学》;20150930;第13卷(第3期);第186-191页 *
"苏尼特羊拷贝数变异的基因组分布特征研究";刘佳森等;《中国畜牧兽医》;20131231;第40卷(第10期);第173-178页 *

Also Published As

Publication number Publication date
CN106845154A (en) 2017-06-13

Similar Documents

Publication Publication Date Title
CN106650312B (en) Device for detecting copy number variation of circulating tumor DNA
CN106845154B (en) A device for FFPE sample copy number variation detects
CN107475375B (en) A kind of DNA probe library, detection method and kit hybridized for microsatellite locus related to microsatellite instability
CN114736968B (en) Application of plasma free DNA methylation marker in lung cancer early screening and lung cancer early screening device
CN106845150B (en) Device for detecting gene fusion of circulating tumor DNA sample
CN108595918B (en) Method and device for processing circulating tumor DNA repetitive sequence
CN114317762B (en) Three-marker composition for detecting early liver cancer and kit thereof
CN114164276B (en) Kit, device and method for lung cancer diagnosis
CN106815491B (en) Device for detecting gene fusion of FFPE sample
CN106282361B (en) Gene capturing kit for capturing genes related to blood diseases
CN109971857A (en) Breast cancer diagnosis and treatment biomarker
CN111484976A (en) Lung cancer circulating tumor cell detection kit and detection system
CN111020710A (en) ctDNA high-throughput detection of hematopoietic and lymphoid tissue tumors
CN115011695A (en) Multiple cancer species identification marker based on free circular DNA gene, kit and application
CN113817822B (en) Tumor diagnosis kit based on methylation detection and application thereof
Batool et al. Extrinsic and intrinsic preanalytical variables affecting liquid biopsy in cancer
CN109811052A (en) A kind of kit and gene panel detecting idiopathic azoospermatism
CN116656830B (en) Methylation markers, devices, apparatuses and storage media for gastric cancer assisted diagnosis
CN110144403B (en) New mutation SNP site of breast cancer treatment gene RBM12B and application thereof
CN116779025A (en) System for cancer screening
CN117070627A (en) Gene composition for detecting lung adenocarcinoma tumor mutation load and application thereof
CN117059163A (en) System and method for screening large fragment methylation markers
CN114807310A (en) Primer pair, kit and method for detecting multi-gene mutation of lung cancer based on multiple PCR targeted high-throughput sequencing
CN112251506A (en) UIMC1 gene mutation site detection kit based on Taqman probe method and application thereof
CN117165679A (en) Liver cancer liver transplantation postoperative recurrence marker and application thereof

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20171215

Address after: 100176 Beijing branch of Beijing economic and Technological Development Zone Street 88 Hospital No. 8 Building 2 unit 701 room

Applicant after: Annoroad Genetic Technology (Beijing) Co., Ltd.

Applicant after: Zhejiang Annuo uni-data Biotechnology Co. Ltd.

Applicant after: Annuo uni-data (Yiwu) Medical Inspection Co. Ltd.

Address before: 100176 Beijing branch of Daxing District economic and Technological Development Zone Street 88 Hospital No. 8 Building 2 unit 701 room

Applicant before: Annoroad Genetic Technology (Beijing) Co., Ltd.

SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 322000 1 building, No. 2 building, No. 10 standard building, Gaoxin Road, Chou Jiang Street, Yiwu, Zhejiang.

Applicant after: ZHEJIANG ANNOROAD BIO-TECHNOLOGY Co.,Ltd.

Applicant after: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Applicant after: ANNOROAD (YIWU) MEDICAL INSPECTION CO.,LTD.

Address before: 100176 room 701, unit 2, building 8, courtyard 88, Kechuang 6th Street, Beijing Economic and Technological Development Zone, Beijing

Applicant before: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Applicant before: ZHEJIANG ANNOROAD BIO-TECHNOLOGY Co.,Ltd.

Applicant before: ANNOROAD (YIWU) MEDICAL INSPECTION CO.,LTD.

GR01 Patent grant