A device for detecting copy number variation in FFPE samples
Technical field
The invention belongs to the field of molecular biological detection, and in particular relates to a device and a method for detecting copy number variation in FFPE samples.
Background art
Tissue specimens prepared by the formalin-fixed and paraffin-embedded (FFPE) method are called formalin-fixed, paraffin-embedded tissue samples, or FFPE samples for short. FFPE samples can be preserved for a long time; in particular, a large number of tumor tissue sections are preserved as FFPE samples. FFPE samples are commonly used in clinical pathological examination, oncogene detection, and medical research, and provide a valuable resource for elucidating disease mechanisms, finding therapeutic targets, and indicating prognosis.
Copy number variation (CNV) of genes is a class of clinically important structural variation that is associated with the prognosis of many tumors and with sensitivity to targeted drugs. Reliable CNV detection results can provide an important basis for clinical medication, disease assessment, and the like. The CNV detection techniques currently used in clinical practice are mostly experimental methods based on PCR or immunohistochemistry (such as FISH and IHC). A single assay of this kind can cover only one gene, and the sensitivity of the results is relatively low.
CNV detection based on a next-generation sequencing (NGS) platform can report CNV results for multiple genes in a single run while maintaining detection performance. Traditional NGS-based CNV detection techniques were mostly developed on whole-genome sequencing platforms. With the continuous progress of NGS technology, high-depth sequencing based on target-region capture has gradually shown advantages in clinical detection scenarios.
However, because whole-genome sequencing data and target-region capture sequencing data differ fundamentally, the traditional CNV detection methods for NGS platforms are not suitable for target-region capture sequencing data: the accuracy of CNV detection is difficult to guarantee, and the detection sensitivity needs improvement. This problem is particularly evident in FFPE samples. DNA in FFPE samples is severely fragmented, which affects processes such as target-gene DNA capture and NGS sequencing and ultimately affects key technical indicators such as the effective depth of the target region. Making use of the low-depth sequencing data produced by low-quality FFPE samples has therefore become a major technical challenge.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the invention is to provide a detection device and a detection method with higher sensitivity for CNV detection in FFPE samples.
The present inventors made intensive studies to solve the above technical problem and found that, in CNV detection methods for FFPE samples, whether the data undergo reasonable noise reduction and whether a suitable background library is used directly affect the detection result, and that this influence is especially pronounced in capture sequencing. More reasonable and comprehensive noise reduction, together with the use of a dynamic background library, can increase the sensitivity of CNV detection in FFPE samples; the present invention was thereby completed.
That is, the present invention includes:
A device for detecting copy number variation in FFPE samples (the copy number variation may occur in a gene region or in a non-gene region), comprising:
a sequencing data acquisition module, for obtaining capture sequencing data from the FFPE sample to be tested and sequencing data from healthy population samples, the healthy population samples being samples from a plurality of healthy individuals (healthy normal persons);
a sequence alignment module, connected to the sequencing data acquisition module, for aligning the sequencing data obtained by the sequencing data acquisition module against a reference genome sequence to obtain an alignment result (including, for example, the chromosome and coordinates to which each short sequence (read) that can be aligned to the reference genome maps, and how well the read matches the reference genome), and calculating the depth value of each site from the alignment result (each site refers to every site on the reference genome; in capture sequencing, some sites may have a depth value of 0);
a preliminary data processing module, connected to the sequence alignment module, for dividing the target region (100 kb to 100 Mb; the whole genome or a region of interest) into windows of a certain length (50 to 1000 bp) that overlap (by 10 to 70%), removing the depth extremes (maximum and minimum) of the sites in each window and calculating the mean or median depth, and calculating the GC content of the reference genome sequence in each window;
a normalization module, connected to the preliminary data processing module, for normalizing the mean or median depth of each window obtained by the preliminary data processing module, to calculate the Z value of each window for the FFPE sample to be tested and for the healthy population samples;
a background library screening module, connected to the normalization module, for screening out n healthy samples (each healthy sample corresponding to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population samples, to obtain a background library sample set of n healthy samples, and then constructing a matrix X_{m×n} of m rows and n columns from the Z values of the n healthy samples in m windows;
a data fluctuation elimination module, connected to the background library screening module, for eliminating the inherent data fluctuation introduced by capture sequencing;
a GC correction module, connected to the data fluctuation elimination module, for performing GC correction according to the GC content of each window; and
an output module, connected to the GC correction module, for outputting the CNV detection result (including, for example, a graph showing the CNV detection result and a negative/positive call for CNV).
The sequencing data acquisition module of the device of the invention obtains sequencing data produced by sequencing the DNA in the FFPE sample to be tested with a second-generation (next-generation) sequencing method. Mainstream next-generation sequencing platforms generally perform nucleic acid sequencing by sequencing by synthesis (SBS). Before sequencing, a sequencing library must be constructed from the nucleic acid (DNA or RNA) sample. The basic procedure is as follows: the ends of the fragmented DNA are first repaired; an "A" base is then added to the 3' end of the repaired fragments; the DNA fragments are then ligated to DNA adapters containing the sequencing-primer binding site; and finally the library is amplified by PCR, completing library construction. The specific next-generation sequencing method is not particularly limited, and any such method known to those skilled in the art may be used.
Preferably, the sequencing data are sequencing data obtained by a capture sequencing method.
The target genes of the capture sequencing may differ according to the target disease. The target disease may be, for example, a solid tumor (such as gastric cancer, breast cancer, colorectal cancer, or lung cancer).
Specifically, for example, when the target disease is breast cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, KIT, PIK3CA and/or PTEN genes; when the target disease is colorectal cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR2, KRAS, MET and/or PTEN genes; when the target disease is gastric cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, FGFR2, KRAS, MET, PIK3CA and/or PTEN genes; and when the target disease is lung cancer, the target genes may be, for example, the ALK, BRAF, EGFR, ERBB2, FGFR1, KRAS, MET, PIK3CA and/or PTEN genes.
Preferably, the preliminary data processing module divides the windows by a sliding-window method.
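As an illustration of sliding-window division, the sketch below generates overlapping windows over a target region. The function name `sliding_windows`, the half-open coordinate convention, and the default values (200 bp windows with 50% overlap, chosen from within the stated 50 to 1000 bp and 10 to 70% ranges) are assumptions made for this example only.

```python
def sliding_windows(start, end, length=200, overlap=0.5):
    """Yield half-open (window_start, window_end) pairs covering [start, end).

    Consecutive windows share `overlap` (a fraction of `length`) of their
    bases; the last window is clipped to the region end.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(length * (1 - overlap))))
    pos = start
    while pos < end:
        yield pos, min(pos + length, end)
        if pos + length >= end:
            break  # the region end has been reached
        pos += step


windows = list(sliding_windows(0, 500, length=200, overlap=0.5))
```

With these settings, a 500 bp region is covered by four windows, each advancing 100 bp past the previous one.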
Preferably, the normalization module calculates the Z value of each window of the sample to be tested according to formula (1) below, where Z_i denotes the Z value of the i-th window:
Z_i = trimScale(Z_i, Z_i) …… (1).
Preferably, formula (2) is defined as follows:
Definition
where chr denotes a chromosome, S_T denotes the FFPE sample to be tested, and S_N denotes the healthy population samples;
the background library screening module screens out, according to the Z values of the FFPE sample to be tested and of the healthy population samples, the n healthy samples that minimize the d value, to obtain the screened background library sample set S_1, S_2, S_3, …, S_n (N and n are natural numbers and n < N).
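The selection of the n healthy samples with the smallest d values, and the assembly of the background matrix, can be sketched as follows. The sketch takes the d value of each healthy sample as a given input and makes no assumption about how d is computed. The names `select_background` and `build_background_matrix` and the sample identifiers are hypothetical.

```python
def select_background(d_values, n):
    """Return the ids of the n healthy samples with the smallest d value.

    d_values maps a healthy-sample id to its precomputed d value
    relative to the sample to be tested.
    """
    if not 0 < n < len(d_values):
        raise ValueError("n must satisfy 0 < n < N")
    ranked = sorted(d_values, key=d_values.get)
    return ranked[:n]


def build_background_matrix(z_by_sample, selected):
    """Build the m-row, n-column matrix X as a list of rows.

    z_by_sample maps a sample id to its list of m per-window Z values;
    column j of X holds the Z values of selected[j].
    """
    m = len(z_by_sample[selected[0]])
    return [[z_by_sample[s][i] for s in selected] for i in range(m)]


d = {"H1": 3.0, "H2": 0.5, "H3": 1.2, "H4": 2.0}
z = {"H1": [0.1, 0.2, 0.3], "H2": [0.0, -0.1, 0.4],
     "H3": [0.2, 0.1, -0.3], "H4": [1.0, 1.1, 0.9]}
chosen = select_background(d, n=2)
X = build_background_matrix(z, chosen)
```

Because the ranking is recomputed for each sample to be tested, each test sample gets its own background set, which is one way to read the "dynamic background library" mentioned in the summary.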
Preferably, the data fluctuation elimination module performs singular value decomposition on the background library matrix X_{m×n} to obtain a factor matrix U_{m×r} of m rows and r columns, where r is the number of factors, then takes the k factors with the largest contribution rates (i.e., the top k factors; k is generally 4 to 10) and performs LOESS regression to obtain the residual Z_p.
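The idea of removing background-derived factors can be sketched with NumPy. This is a simplified illustration, not the procedure of the invention: the LOESS regression named in the text is replaced here by an ordinary least-squares projection onto the top k left singular vectors. The contribution rate is assumed to be the squared singular value divided by the sum of squared singular values, and the function name `remove_background_factors` is hypothetical.

```python
import numpy as np


def remove_background_factors(z_test, X, k=4):
    """Remove the top-k background factors from a test sample's Z values.

    z_test: length-m vector of per-window Z values of the test sample.
    X: m-by-n background matrix (windows by healthy samples).
    Returns (residual, contribution rates of the k factors used).
    """
    z_test = np.asarray(z_test, dtype=float)
    X = np.asarray(X, dtype=float)
    # Thin SVD; singular values come back in descending order.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    total = np.sum(s ** 2)
    contribution = s ** 2 / total if total > 0 else np.zeros_like(s)
    k = min(k, U.shape[1])
    U_k = U[:, :k]
    # Columns of U_k are orthonormal, so this is the least-squares fit.
    fitted = U_k @ (U_k.T @ z_test)
    return z_test - fitted, contribution[:k]
```

A window-wise pattern shared by all background samples is captured by the leading factors and subtracted, while a signal that is absent from the background remains in the residual.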
Preferably, the GC correction module performs GC correction on Z_p by LOESS regression according to the GC content of each window, to obtain the residual Z_pg.
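A LOESS-based GC correction can be sketched as a local linear regression of Z_p on GC content, with the fitted trend subtracted. The sketch below is a bare-bones LOESS (tricube weights, degree 1, no robustness iterations) written with NumPy. The span `frac=0.5` and the function names `loess_fit` and `gc_correct` are assumptions; the text does not specify the LOESS settings.

```python
import numpy as np


def loess_fit(x, y, frac=0.5):
    """Fitted values of a local linear regression with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = max(2, int(np.ceil(frac * n)))  # points in each neighbourhood
    A = np.column_stack([np.ones(n), x])
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[r - 1]  # distance to the r-th nearest point
        if h == 0:
            w = (dist == 0).astype(float)
        else:
            w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3
        sw = np.sqrt(w)
        # Weighted least squares; lstsq copes with rank-deficient cases.
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted


def gc_correct(z_p, gc, frac=0.5):
    """Return Z_pg: Z_p minus its LOESS trend against GC content."""
    return np.asarray(z_p, dtype=float) - loess_fit(gc, z_p, frac)
```

If Z_p depends on GC content through a smooth trend, the residual returned by `gc_correct` no longer carries that trend, so windows with unusual GC content are not mistaken for copy number changes.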
Preferably, the device for detecting copy number variation in FFPE samples further comprises:
a data quality control module, connected to the sequencing data acquisition module and the sequence alignment module, for performing quality control on the sequencing data obtained by the sequencing data acquisition module. Quality control includes, but is not limited to, for example, removing low-quality reads, removing reads with a high N content, removing adapter-related reads, and finally computing the relevant quality-control statistics.
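The three read filters listed above can be sketched as a single predicate. The thresholds (mean quality of at least 20, at most 10% N bases), the exact-substring adapter check, and the function name `passes_qc` are illustrative assumptions; the text gives no specific cut-offs.

```python
def passes_qc(seq, quals, adapter, min_mean_q=20, max_n_frac=0.1):
    """Return True if a read survives the three quality-control filters.

    seq: read bases; quals: per-base quality scores (integers);
    adapter: adapter sequence to screen for (exact substring match).
    """
    if not seq or not quals:
        return False
    if sum(quals) / len(quals) < min_mean_q:
        return False  # low-quality read
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # too many uncalled bases
    if adapter and adapter in seq:
        return False  # adapter-related read
    return True
```

In practice a dedicated trimming tool would usually clip adapter sequence rather than discard the read; the sketch only mirrors the filtering described in the text.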
In addition, the present invention also includes:
A method for detecting copy number variation in FFPE samples (the copy number variation may occur in a gene region or in a non-gene region), comprising:
a sequencing data obtaining step of obtaining capture sequencing data from the FFPE sample to be tested and sequencing data from healthy population samples, the healthy population samples being samples from a plurality of healthy individuals;
a sequence alignment step of aligning the sequencing data obtained in the sequencing data obtaining step against a reference genome sequence to obtain an alignment result (including, for example, the chromosome and coordinates to which each read that can be aligned to the reference genome maps, and how well the read matches the reference genome), and calculating the depth value of each site from the alignment result (each site refers to every site on the reference genome; in capture sequencing, some sites may have a depth value of 0);
a preliminary data processing step of dividing the target region (100 kb to 100 Mb; the whole genome or a region of interest) into windows of a certain length (50 to 1000 bp) that overlap (by 10 to 70%), removing the depth extremes (maximum and minimum) of the sites in each window and calculating the mean or median depth, and calculating the GC content of the reference genome sequence in each window;
a normalization step of normalizing the mean or median depth of each window obtained in the preliminary data processing step, to calculate the Z value of each window for the FFPE sample to be tested and for the healthy population samples;
a background library screening step of screening out n healthy samples (each background library sample corresponding to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population samples, to obtain a background library sample set, and then constructing a matrix X_{m×n} of m rows and n columns from the Z values of the n healthy samples in m windows;
a data fluctuation elimination step of eliminating the inherent data fluctuation introduced by capture sequencing;
a GC correction step of performing GC correction according to the GC content of each window; and
an output step of outputting the CNV detection result (including, for example, a graph showing the CNV detection result and a negative/positive call for CNV).
The sequencing data obtaining step of the method of the invention obtains sequencing data produced by sequencing the DNA in the FFPE sample to be tested with a second-generation (next-generation) sequencing method. Mainstream next-generation sequencing platforms generally perform nucleic acid sequencing by sequencing by synthesis (SBS). Before sequencing, a sequencing library must be constructed from the nucleic acid (DNA or RNA) sample. The basic procedure is as follows: the ends of the fragmented DNA are first repaired; an "A" base is then added to the 3' end of the repaired fragments; the DNA fragments are then ligated to DNA adapters containing the sequencing-primer binding site; and finally the library is amplified by PCR, completing library construction. The specific next-generation sequencing method is not particularly limited, and any such method known to those skilled in the art may be used.
Preferably, the sequencing data are sequencing data obtained by a capture sequencing method.
The target genes of the capture sequencing may differ according to the target disease. The target disease may be, for example, a solid tumor (such as gastric cancer, breast cancer, colorectal cancer, or lung cancer).
Specifically, for example, when the target disease is breast cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, KIT, PIK3CA and/or PTEN genes; when the target disease is colorectal cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR2, KRAS, MET and/or PTEN genes; when the target disease is gastric cancer, the target genes may be, for example, the EGFR, ERBB2, FGFR1, FGFR2, KRAS, MET, PIK3CA and/or PTEN genes; and when the target disease is lung cancer, the target genes may be, for example, the ALK, BRAF, EGFR, ERBB2, FGFR1, KRAS, MET, PIK3CA and/or PTEN genes.
Preferably, the preliminary data processing step divides the windows by a sliding-window method.
Preferably, the normalization step calculates the Z value of each window of the sample to be tested according to formula (1) below, where Z_i denotes the Z value of the i-th window:
Z_i = trimScale(Z_i, Z_i) …… (1).
Preferably, formula (2) is defined as follows:
Definition
where chr denotes a chromosome, S_T denotes the FFPE sample to be tested, and S_N denotes the healthy population samples;
the background library screening step screens out, according to the Z values of the FFPE sample to be tested and of the healthy population samples, the n healthy samples that minimize the d value, to obtain the screened background library sample set S_1, S_2, S_3, …, S_n (N and n are natural numbers and n < N).
Preferably, the data fluctuation elimination step performs singular value decomposition on the background library matrix X_{m×n} to obtain a factor matrix U_{m×r} of m rows and r columns, where r is the number of factors, then takes the k factors with the largest contribution rates (i.e., the top k factors; k is generally 4 to 10) and performs LOESS regression to obtain the residual Z_p.
Preferably, the GC correction step performs GC correction on Z_p by LOESS regression according to the GC content of each window, to obtain the residual Z_pg.
Preferably, the copy number variation detection method further comprises:
a data quality control step of performing quality control on the sequencing data obtained in the sequencing data obtaining step. Quality control includes, but is not limited to, for example, removing low-quality reads, removing reads with a high N content, removing adapter-related reads, and finally computing the relevant quality-control statistics.
The preferred embodiments of the above steps are as described above for the device.
According to the invention, a detection device and a detection method with higher sensitivity for CNV detection in FFPE samples are provided.
Brief description of the drawings
Fig. 1 is a schematic diagram of the device of the invention for detecting copy number variation in FFPE samples.
Fig. 2 is a graph of the CNV detection results for multiple breast cancer genes in Example 1.
Detailed description of the invention
The scientific and technical terms used in this specification have the meanings commonly understood by those skilled in the art; in case of conflict, the definitions in this specification prevail.
Definition
Reference genome: the complete haploid sequence carried by a cell or organism, including the full set of genes and the intergenic sequences.
Alignment: generally refers to sequence alignment, i.e., the process of arranging two or more sequences according to certain rules in order to determine their similarity and hence their homology.
Depth value: for a given site on the genome, the number of short sequences (reads) covering that site according to the alignment result is the depth value of the site.
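The depth definition can be made concrete with a short sketch that counts, for each site of a region, how many aligned reads cover it. The function name `site_depths` and the half-open (start, end) interval representation of an aligned read are assumptions made for this example.

```python
def site_depths(region_start, region_end, reads):
    """Per-site depth over the half-open region [region_start, region_end).

    reads: iterable of half-open (start, end) aligned intervals.
    Uses a difference array, so the cost is linear in reads plus sites.
    """
    size = region_end - region_start
    diff = [0] * (size + 1)
    for start, end in reads:
        start = max(start, region_start)
        end = min(end, region_end)
        if start < end:
            diff[start - region_start] += 1
            diff[end - region_start] -= 1
    depths, running = [], 0
    for delta in diff[:size]:
        running += delta
        depths.append(running)
    return depths


depths = site_depths(0, 6, [(0, 3), (2, 5)])
```

Here site 2 is covered by both reads and has depth 2, while site 5 is covered by neither and has depth 0, as can happen for sites outside the captured regions.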
Window (sliding window): generally refers to a region of fixed length on the genome.
Background library: a sample library composed of samples from multiple (generally ≥20) healthy individuals.
Capture sequencing: the process of capturing DNA fragments from specific regions (regions of interest) of the genome with pre-designed probes and then performing NGS sequencing on the captured DNA fragments.
NGS (high-throughput sequencing): high-throughput sequencing, also known as "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel in a single run, generally with short read lengths.
Normalization (Z value):
trimScale(w, v): w is a value to be normalized and v is a data set.
A. Remove a certain percentage of the data at the upper and lower ends of v to obtain v'.
B. Calculate the mean μ and the standard deviation σ of v'.
C. Calculate (w − μ) / σ as the final result.
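Steps A to C can be written out as a short Python sketch. The trimming fraction (5% at each end by default) and the use of the population standard deviation are assumptions, since the definition only says "a certain percentage" and does not state which standard deviation is meant; the function name `trim_scale` is likewise chosen for this example.

```python
import statistics


def trim_scale(w, v, trim=0.05):
    """Z-score of w against the trimmed data set v.

    A. drop the lowest and highest `trim` fraction of v;
    B. compute the mean and standard deviation of what remains;
    C. return (w - mean) / standard deviation.
    """
    data = sorted(v)
    cut = int(len(data) * trim)
    trimmed = data[cut:len(data) - cut] if cut else data
    mu = statistics.fmean(trimmed)
    sigma = statistics.pstdev(trimmed)  # assumption: population SD
    if sigma == 0:
        raise ValueError("trimmed data have zero spread")
    return (w - mu) / sigma


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
z_mid = trim_scale(5.5, values, trim=0.1)
```

With `trim=0.1` the outlier 100 and the value 1 are discarded, so the mean is taken over 2 to 9 and a window sitting at that mean gets a Z value of zero.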
SVD (singular value decomposition): SVD is an important matrix decomposition in linear algebra and generalizes the unitary diagonalization of normal matrices in matrix analysis. It has important applications in fields such as signal processing and statistics. Its effect is to map a data set into a lower-dimensional space. The eigenvalues of the data set (characterized in SVD by the singular values) are ranked by importance; dimensionality reduction is the process of discarding the unimportant eigenvectors, and the space spanned by the remaining eigenvectors is the reduced space.
Examples
The present invention is described more specifically by the following examples. It should be understood that the examples described here serve to explain the invention and are not intended to limit it.
Example 1
The device of the invention for detecting copy number variation in FFPE samples was used to detect the CNV status of a tissue FFPE sample from a female breast cancer patient.
1.1 Extraction of DNA from the FFPE sample
DNA was extracted with the GeneRead DNA FFPE Kit (QIAGEN) according to the handbook to obtain FFPE sample DNA.
1.2 Sample fragmentation
The DNA was sheared with a Bioruptor sonicator for 30 cycles of 30 s ON/30 s OFF, breaking the FFPE sample DNA into fragments of about 200 bp to obtain the sheared DNA fragments.
1.3 End repair
(1) Take the required reagents in advance from the kit stored at -20 °C; the amounts for a single sample are given in Table 1.
Table 1
(2) End-repair reaction: after the DNA sample is added, incubate the 1.5 mL centrifuge tube in a Thermomixer at 20 °C for 30 minutes. After the reaction, recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 32 μL of EB.
1.4 A-tailing (addition of "A" to the ends)
(1) Take the required reagents in advance from the kit stored at -20 °C; the amounts for a single sample are given in Table 2:
Table 2
(2) A-tailing reaction: after the 32 μL of DNA purified and recovered in the previous step is added, incubate the 1.5 mL centrifuge tube in a Thermomixer at 37 °C for 30 minutes. Recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 18 μL of EB.
1.5 Adapter ligation
(1) Take the required reagents in advance from the kit stored at -20 °C; the amounts for a single sample are given in Table 3:
Table 3
(2) Adapter ligation reaction: after the 18 μL of DNA purified and recovered in the previous step is added, incubate the sample tube in a Thermomixer at 20 °C for 15 minutes. Recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 30 μL of EB.
1.6 PCR reaction
(1) Take the required reagents from the kit stored at -20 °C and prepare the PCR reaction system in a 2 mL PCR tube:
Table 4
(2) Set the PCR program. The program of the PCR reaction is as follows:
After the reaction, promptly remove the samples and store them in a 4 °C refrigerator, then exit the program or shut down the instrument as required.
(3) Recover and purify the DNA in the reaction system with 0.9× nucleic acid purification magnetic beads and dissolve the purified library in 20 μL of ddH2O. Quantify the library by Qubit and submit it for analysis on the Agilent 2100.
1.7 Hybridization of the library with the breast cancer target region capture chip
(1) In this experiment, the buffer providing the ionic environment for the hybrid capture reaction and the wash and rinse solutions for eluting physically adsorbed or non-specifically hybridized material were obtained commercially.
(2) Preparation of the hybridization library: thaw the DNA library to be hybridized on ice and take a total mass of 1 μg (this DNA library is referred to as the sample library in subsequent steps).
(3) Preparation of the Ann primer pool: mix 1000 pmol each of the tag primer In1 (100 μM) corresponding to the index of the sample library and the universal primer (1000 μM) (this mixture is referred to as the Ann primer pool in subsequent steps).
(4) Preparation of the hybridization sample: add 5 μL of COT DNA (Human Cot-1 DNA, Life Technologies, 1 mg/mL), 1 μg of the sample library, and the Ann primer pool to a 1.5 mL EP tube. Seal the EP tube containing the prepared hybridization sample with sealing film, and place the EP tube containing the sample library pool/COT DNA/Ann primer pool in a vacuum device until completely dry.
(5) Dissolution of the hybridization sample: add the following to the dried sample library pool/COT DNA/Ann primer pool:
7.5 μL 2× hybridization buffer
3 μL hybridization component A
(6) Mix the above mixture thoroughly and denature it on a 95 °C heating block prepared in advance for 10 minutes.
(7) Transfer the mixture to a 0.2 mL flat-cap PCR tube containing 4.5 μL of capture chip. Vortex thoroughly for 3 seconds and place the hybridization mixture on a 47 °C heating block for 16 hours. The heated lid of the heating block must be set to 57 °C. After hybridization, the product must be washed and recovered.
(8) Dilute the 10× wash solutions (I, II and III), the 10× rinse solution, and the 2.5× magnetic bead wash solution to 1× working solutions.
Table 5
(9) Preheat the following reagents on a 47 °C heating block:
400 μL 1× rinse solution
100 μL 1× wash solution I
1.8 Preparation of the affinity-adsorption magnetic beads
(1) Equilibrate the streptavidin magnetic beads (Dynabeads M-280 Streptavidin, hereinafter "magnetic beads") at room temperature for 30 minutes, then vortex the beads thoroughly for 15 seconds.
(2) Dispense 100 μL of magnetic beads into a 1.5 mL centrifuge tube and place the tube on a magnetic rack. After about 5 minutes, carefully aspirate and discard the supernatant, add 1× magnetic bead wash solution at twice the initial bead volume, and vortex for 10 seconds. Return the tube to the magnetic rack to collect the beads. Once the solution has cleared, aspirate and discard the supernatant. Repeat this step for a total of two washes.
(3) After washing, aspirate and discard the magnetic bead wash solution, resuspend the beads by vortexing in 1× magnetic bead wash solution at the initial bead volume, and transfer them to a 0.2 mL PCR tube. Place the PCR tube on the magnetic rack; once the beads are collected and the solution has cleared, aspirate and discard the supernatant.
1.9 Binding of DNA to the affinity-adsorption magnetic beads and rinsing
(1) Transfer the hybridized sample library to the 0.2 mL PCR tube containing the affinity-adsorption magnetic beads and vortex to mix.
(2) Place the 0.2 mL PCR tube on a 47 °C heating block for 45 minutes, vortexing once every 15 minutes, so that the DNA binds to the beads.
(3) After the 45-minute incubation, add 100 μL of 1× wash solution I preheated to 47 °C to the 15 μL of captured DNA sample. Vortex for 10 seconds. Transfer the entire contents of the 0.2 mL PCR tube to a 1.5 mL centrifuge tube. Place the 1.5 mL tube on the magnetic rack to collect the beads and discard the supernatant.
(4) Remove the 1.5 mL tube from the magnetic rack and add 200 μL of 1× rinse solution preheated to 47 °C. Pipette up and down 10 times to mix (work quickly so that the reagent and sample do not fall below 47 °C). After mixing, place the sample on the 47 °C heating block for 5 minutes. Repeat this step for a total of two washes with 47 °C 1× rinse solution. Place the 1.5 mL tube on the magnetic rack, collect the beads, and discard the supernatant.
(5) Add 200 μL of room-temperature 1× wash solution I to the 1.5 mL tube and vortex for 2 minutes. Place the tube on the magnetic rack, collect the beads, and discard the supernatant. Add 200 μL of room-temperature 1× wash solution II to the tube and vortex for 1 minute. Place the tube on the magnetic rack, collect the beads, and discard the supernatant. Add 200 μL of room-temperature 1× wash solution III to the tube and vortex for 30 seconds. Place the tube on the magnetic rack, collect the beads, and discard the supernatant.
(6) Remove the 1.5 mL tube from the magnetic rack and add 45 μL of PCR water to dissolve and elute the bead-captured sample.
1.10 PCR amplification of the captured DNA
(1) Prepare the post-capture PCR mix according to the table below and vortex to mix after preparation. Enrichment primer F and enrichment primer R were purchased from Invitrogen Corp.
(2) The PCR amplification program for the bead-bound DNA is as follows:
(3) Recovery and purification of the PCR product of the hybrid-captured DNA: recover and purify the DNA in the reaction system with nucleic acid purification magnetic beads, using 0.9× beads, and dissolve the purified library in 30 μL of ddH2O.
1.11 Library quantification
The library was analyzed on a 2100 Bioanalyzer (Agilent)/LabChip GX (Caliper) and by qPCR, and the library concentration was recorded.
1.12 Sequencing of the library
The constructed library was sequenced on a NextSeq 550AR.
1.13 Data processing and analysis
The sequencing results from 1.12 were processed and analyzed with the FFPE sample copy number variation detection device of the invention.
The FFPE sample copy number variation detection device of Example 1 includes the following modules.
Sequencing data acquisition module:
Obtains the sequencing data produced by capture sequencing of the breast cancer FFPE sample to be tested using the breast cancer target region capture chip.
Data quality control module:
Performs quality control on the sequencing data, filtering out reads with a low mean quality value, reads with a high N content, and adapter-related reads, to obtain the filtered sequencing data C.
Sequence alignment module:
Aligns the reads of the filtered sequencing data C against the human reference genome hg19 to obtain alignment result A. The depth value of each site on the genome is calculated from alignment result A to obtain result D.
Preliminary data processing module:
Divides the cancer target region into overlapping windows of a certain length, removes the depth extremes in each window and calculates the median depth, and calculates the GC content of the reference genome sequence in each window, to obtain result X.
Normalization module:
Combining results X and D, calculates the Z value of each window of the genomic DNA to be tested according to the formula Z_i = trimScale(Z_i, Z_i).
Background library screening module:
Definition
chr denotes a chromosome, S_t denotes the sample to be tested, and S_n denotes a background library sample.
According to the Z values of the genomic DNA to be tested and of the background library, the background library samples that minimize the d value are screened out, giving the screened background library sample set S_1, S_2, S_3, …, S_n.
The Z values of these n samples in m windows are used to construct the matrix X_{m×n}, which serves as the background library.
Data fluctuation elimination module:
Performs singular value decomposition on the background library matrix X_{m×n} to obtain a factor matrix U_{m×n} of m rows and n columns, where n is the number of factors. The several factors with the largest contribution rates are taken and LOESS regression is performed to obtain the residual Z_p.
GC correction module:
Performs GC correction on Z_p by LOESS regression according to the GC content of the m windows, to obtain the residual Z_pg.
Output module:
The output module displays a graph of the CNV detection results.
The detection results are shown in Fig. 2, where each dot is the Z_pg value of one window. A copy number gain was detected for two genes, PIK3CA and ERBB2.
1.14 Verification of results
RNA was extracted from the same patient's original fresh tumor tissue and reverse-transcribed, and qPCR was used to check whether the expression of the PIK3CA and ERBB2 genes was elevated. The verification results were consistent with the detection results of 1.13. The detection device of the invention can thus successfully detect copy number variation in FFPE samples.
Industrial applicability
The FFPE sample CNV detection device and detection method of the invention can significantly increase the sensitivity of CNV detection.