CN106845154A

CN106845154A - A device for copy number variation detection in FFPE samples

Info

Publication number: CN106845154A
Application number: CN201710067086.3A
Authority: CN
Inventors: 荆瑞琳; 张萌萌; 董永芳; 王旺; 李雪峰; 玄兆伶; 李大为; 梁峻彬; 陈重建
Original assignee: ANNOROAD GENETIC TECHNOLOGY (BEIJING) Co Ltd
Current assignee: Annoroad Genetic Technology (Beijing) Co., Ltd.; Annuo uni-data (Yiwu) Medical Inspection Co. Ltd.; Zhejiang Annuo uni-data Biotechnology Co. Ltd.
Priority date: 2016-12-29
Filing date: 2017-02-07
Publication date: 2017-06-13
Anticipated expiration: 2037-02-07
Also published as: CN106845154B

Abstract

The present invention relates to a device for detecting copy number variation in FFPE samples that has high detection sensitivity. The device comprises a sequencing data acquisition module, a sequence alignment module, a preliminary data processing module, a normalization module, a background library screening module, a data fluctuation elimination module, a GC correction module, and an output module.

Description

A device for copy number variation detection in FFPE samples

Technical field

The invention belongs to the field of molecular biological detection, and in particular relates to a device and a method for detecting copy number variation in FFPE samples.

Background art

Tissue specimens prepared by the formalin-fixed, paraffin-embedded (FFPE) method are called formalin-fixed paraffin-embedded tissue samples, or FFPE samples for short. FFPE samples can be stored for long periods; in particular, a large number of tumor tissue sections are preserved as FFPE samples. FFPE samples are commonly used in clinical pathology, tumor gene testing, and medical research, and they are a valuable resource for elucidating disease mechanisms, finding therapeutic targets, and indicating prognosis.

Copy number variation (CNV) of genes is a clinically important class of structural variation that is associated with the prognosis of many tumors and with sensitivity to targeted drugs. Reliable CNV detection results can provide an important basis for clinical medication decisions, disease assessment, and the like. At present, most CNV detection techniques in clinical use are laboratory methods based on PCR or immunohistochemistry (for example FISH and IHC). A single assay of this kind covers only one gene, and the sensitivity of the results is relatively low.

CNV detection on a next-generation sequencing (NGS) platform can report CNV results for multiple genes in a single run while maintaining detection performance. Traditional NGS-based CNV detection techniques were mostly developed on whole-genome sequencing platforms. As NGS technology has advanced, high-depth sequencing based on target region capture has gradually shown advantages in clinical testing.

However, because whole-genome sequencing data and target region capture sequencing data differ fundamentally, traditional NGS-based CNV detection methods are not suited to capture sequencing data: their accuracy in detecting CNV is hard to guarantee, and their sensitivity needs improvement. The problem is especially pronounced in FFPE samples. DNA in FFPE samples is severely fragmented, which affects target gene DNA capture and NGS sequencing and ultimately degrades key technical indicators such as the effective depth of the target region. Making use of the low-depth sequencing data produced from low-quality FFPE samples has therefore become a significant technical challenge.

Summary of the invention

In view of the above deficiencies of the prior art, the object of the invention is to provide a detection device and a detection method with higher sensitivity for CNV detection in FFPE samples.

The inventors studied this technical problem intensively and found that, in CNV detection for FFPE samples, whether the data undergo reasonable noise reduction and whether a suitable background library is used directly affect the detection results, and that this effect is especially pronounced in capture sequencing. More reasonable and comprehensive noise reduction, combined with a dynamic background library, can improve the sensitivity of CNV detection in FFPE samples. The invention was completed on this basis.

That is, the invention includes:

A device for detecting copy number variation in FFPE samples (the copy number variation may occur in a gene region or in a non-gene region), comprising:

A sequencing data acquisition module, for acquiring capture sequencing data from an FFPE sample to be tested and sequencing data from healthy population samples, the healthy population samples being samples from multiple healthy (normal) individuals;

A sequence alignment module, connected to the sequencing data acquisition module, for aligning the sequencing data acquired by the sequencing data acquisition module to a reference genome sequence to obtain an alignment result (including, for example, the chromosome and coordinates of each short sequence that can be aligned to the reference genome, and how that short sequence matches the reference genome), and for calculating from the alignment result the depth value of each site (each site on the reference genome; in capture sequencing some sites may have a depth value of 0);

A preliminary data processing module, connected to the sequence alignment module, for dividing the target region (100k~100M, the whole genome, or a region of interest) into overlapping (10~70%) windows of a given length (50~1000 bp), removing the extreme depth values (maximum and minimum) of the sites in each window, calculating the mean or median depth, and calculating the GC content of the reference genome sequence in each window;

A normalization module, connected to the preliminary data processing module, for normalizing the mean or median depth in each window obtained by the preliminary data processing module and calculating the Z value in each window for the FFPE sample to be tested and for the healthy population samples;

A background library screening module, connected to the normalization module, for selecting n healthy individual samples (each healthy individual sample corresponds to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population samples, to obtain a background library sample set of n healthy individual samples, and then building a matrix X_m×n of m rows and n columns from the Z values of the n healthy individual samples in m windows;

A data fluctuation elimination module, connected to the background library screening module, for eliminating the inherent data fluctuation introduced by capture sequencing;

A GC correction module, connected to the data fluctuation elimination module, for performing GC correction according to the GC content in each window;

An output module, connected to the GC correction module, for outputting CNV detection results (including, for example, a figure showing the CNV detection results and a negative/positive call for CNV).

The sequencing data acquisition module of the device acquires sequencing data obtained by sequencing the DNA in the FFPE sample to be tested with a second-generation sequencing method. Mainstream second-generation platforms generally sequence nucleic acids by sequencing by synthesis (SBS). Before sequencing, a sequencing library must be built from the nucleic acid (DNA or RNA) sample. The basic workflow is as follows: the fragmented DNA is first end-repaired, an "A" base is then added to the 3' end of the repaired fragments, the fragments are ligated to DNA adapters containing sequencing primer binding sites, and the library is finally amplified by PCR, completing library construction. The specific second-generation sequencing method is not particularly limited; any second-generation sequencing method known to those skilled in the art may be used.

Preferably, the sequencing data are obtained by a capture sequencing method;

The target genes of the capture sequencing may differ with the target disease. The target disease may be, for example, a solid cancer (such as gastric cancer, breast cancer, colorectal cancer, or lung cancer).

Specifically, for example, when the target disease is breast cancer, the target genes may be the EGFR, ERBB2, FGFR1, KIT, PIK3CA, and/or PTEN genes; when the target disease is colorectal cancer, the target genes may be the EGFR, ERBB2, FGFR2, KRAS, MET, and/or PTEN genes; when the target disease is gastric cancer, the target genes may be the EGFR, ERBB2, FGFR1, FGFR2, KRAS, MET, PIK3CA, and/or PTEN genes; when the target disease is lung cancer, the target genes may be the ALK, BRAF, EGFR, ERBB2, FGFR1, KRAS, MET, PIK3CA, and/or PTEN genes.

Preferably, the preliminary data processing module divides the windows using a sliding window method.
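The window division and per-window summaries described for the preliminary data processing module can be sketched as follows. This is only an illustration under stated assumptions: the function names, the default window length and overlap, and the choice to drop exactly one maximum and one minimum depth per window are hypothetical and are not specified by the text.

```python
from statistics import median


def sliding_windows(start, end, length=100, overlap=0.5):
    """Yield half-open (win_start, win_end) windows covering [start, end).

    `length` and `overlap` are illustrative defaults chosen from within the
    stated ranges (50~1000 bp, 10~70% overlap).
    """
    step = max(1, int(round(length * (1 - overlap))))
    pos = start
    while pos < end:
        yield pos, min(pos + length, end)
        if pos + length >= end:
            break
        pos += step


def trimmed_median_depth(depths):
    """Drop one maximum and one minimum depth (assumed reading of
    'removing the extreme values'), then take the median."""
    if len(depths) <= 2:
        return float(median(depths))
    return float(median(sorted(depths)[1:-1]))


def gc_content(seq):
    """Fraction of G and C among called bases (N is ignored)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0
```

With a 100 bp window and 50% overlap, consecutive windows start 50 bp apart, so each site away from the region edges falls in two windows.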

Preferably, the normalization module calculates the Z value of the sample to be tested in each window according to the following formula (1), in which Z_i denotes the Z value of the i-th window:

Z_i=trimScale (Z_i,Z_i)……(1)。

Preferably, formula (2) is defined:

Definition

Wherein, chr represents chromosome, and St represents biological specimen to be checked, S_NRepresent healthy population sample；

The context vault screening module is filtered out so that the d according to FFPE samples to be checked and the Z values of healthy population sample It is worth n minimum Healthy People sample, the context vault sample set S after being screened₁,S₂,S₃,…,S_n(N and n are natural number and n ＜ N).
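The screening idea (rank the healthy samples by a distance d to the test sample and keep the n closest) can be sketched as below. The expression for d in formula (2) is not reproduced in this text, so the sum of squared Z-value differences used here is a stand-in assumption and not the patent's d; the function name and data layout are likewise hypothetical.

```python
def select_background(test_z, healthy_z, n):
    """Pick the n healthy samples closest to the test sample.

    test_z    : list of Z values, one per window, for the test sample
    healthy_z : dict mapping sample name -> list of Z values (same windows)
    Returns (chosen_names, matrix) where matrix has m rows (windows)
    and n columns (chosen samples).
    """
    def distance(z):
        # ASSUMPTION: squared-difference distance stands in for formula (2).
        return sum((a - b) ** 2 for a, b in zip(test_z, z))

    ranked = sorted(healthy_z, key=lambda name: distance(healthy_z[name]))
    chosen = ranked[:n]
    matrix = [[healthy_z[name][i] for name in chosen]
              for i in range(len(test_z))]
    return chosen, matrix
```

Because the selection depends on the test sample, a different background library is assembled for each sample, which is one way to read the "dynamic background library" mentioned in the summary.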

Preferably, the data fluctuation elimination module performs singular value decomposition on the background library matrix X_m×n to obtain a factor matrix U_m×r of m rows and r columns, where r is the number of factors, then takes the k factors with the largest contribution rates (that is, the top k factors; k is generally 4 to 10) and performs LOESS regression to obtain the residual Z_p.

Preferably, the GC correction module performs GC correction on Z_p by LOESS regression according to the GC content in each window, to obtain the residual Z_pg.
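GC correction by local regression can be sketched as fitting the trend of Z against window GC content and keeping the residual. The version below is a minimal, non-robust LOESS (local linear fit with tricube weights); the smoothing fraction and the function names are assumptions, and a production pipeline would more likely call an established LOESS implementation.

```python
def loess_fit(x, y, frac=0.75):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1] or 1e-12      # local bandwidth
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)           # fall back to weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def gc_correct(z, gc, frac=0.75):
    """Return the residual of z after removing its local trend in GC."""
    trend = loess_fit(gc, z, frac)
    return [zi - ti for zi, ti in zip(z, trend)]
```

If Z depends on GC content exactly linearly, the local linear fit reproduces it and the residual is zero in every window; real data leave a residual that carries the copy number signal.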

Preferably, the device for detecting copy number variation in FFPE samples further comprises:

A data quality control module, connected to the sequencing module and the sequence alignment module, for quality-checking the sequencing data obtained by the sequencing module. Quality control includes, but is not limited to, removing low-quality short sequences, removing short sequences with a high N content, removing adapter-related short sequences, and finally compiling the relevant quality control metrics.
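The three read filters listed for quality control can be sketched as a single predicate. The thresholds (mean quality 20, at most 10% N) and the adapter string are illustrative assumptions; the text does not give numeric cut-offs or name an adapter sequence.

```python
def passes_qc(seq, quals, adapter, min_mean_q=20, max_n_frac=0.1):
    """Return True if a read survives all three filters.

    seq     : read bases as a string
    quals   : per-base quality scores as integers (same length as seq)
    adapter : adapter sequence to screen for (placeholder value in usage)
    """
    if not seq or len(seq) != len(quals):
        return False
    if sum(quals) / len(quals) < min_mean_q:        # low mean quality
        return False
    bases = seq.upper()
    if bases.count("N") / len(bases) > max_n_frac:  # too many N calls
        return False
    if adapter and adapter.upper() in bases:        # adapter contamination
        return False
    return True


# Hypothetical usage: keep only reads that pass every filter.
ADAPTER = "AGATCGGAAGAGC"  # placeholder adapter prefix, an assumption
reads = [("ACGTACGTAC", [30] * 10), ("ACGTNNNNAC", [30] * 10)]
kept = [r for r in reads if passes_qc(r[0], r[1], ADAPTER)]
```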

In addition, the invention also includes:

A method for detecting copy number variation in FFPE samples (the copy number variation may occur in a gene region or in a non-gene region), comprising:

A sequencing data acquisition step of acquiring capture sequencing data from an FFPE sample to be tested and sequencing data from healthy population samples, the healthy population samples being samples from multiple healthy individuals;

A sequence alignment step of aligning the sequencing data acquired in the sequencing data acquisition step to a reference genome sequence to obtain an alignment result (including, for example, the chromosome and coordinates of each short sequence that can be aligned to the reference genome, and how that short sequence matches the reference genome), and calculating from the alignment result the depth value of each site (each site on the reference genome; in capture sequencing some sites may have a depth value of 0);

A preliminary data processing step of dividing the target region (100k~100M, the whole genome, or a region of interest) into overlapping (10~70%) windows of a given length (50~1000 bp), removing the extreme depth values (maximum and minimum) of the sites in each window, calculating the mean or median depth, and calculating the GC content of the reference genome sequence in each window;

A normalization step of normalizing the mean or median depth in each window obtained in the preliminary data processing step and calculating the Z value in each window for the FFPE sample to be tested and for the healthy population samples;

A background library screening step of selecting n healthy individual samples (each background library sample corresponds to one healthy individual) according to the Z values of the FFPE sample to be tested and of the healthy population samples, to obtain a background library sample set, and then building a matrix X_m×n of m rows and n columns from the Z values of the n healthy individual samples in m windows;

A data fluctuation elimination step of eliminating the inherent data fluctuation introduced by capture sequencing;

A GC correction step of performing GC correction according to the GC content in each window; and

An output step of outputting CNV detection results (including, for example, a figure showing the CNV detection results and a negative/positive call for CNV).

The sequencing data acquisition step of the method acquires sequencing data obtained by sequencing the DNA in the FFPE sample to be tested with a second-generation sequencing method. Mainstream second-generation platforms generally sequence nucleic acids by sequencing by synthesis (SBS). Before sequencing, a sequencing library must be built from the nucleic acid (DNA or RNA) sample. The basic workflow is as follows: the fragmented DNA is first end-repaired, an "A" base is then added to the 3' end of the repaired fragments, the fragments are ligated to DNA adapters containing sequencing primer binding sites, and the library is finally amplified by PCR, completing library construction. The specific second-generation sequencing method is not particularly limited; any second-generation sequencing method known to those skilled in the art may be used.

Preferably, the preliminary data processing step divides the windows using a sliding window method.

Preferably, the normalization step calculates the Z value of the sample to be tested in each window according to the following formula (1), in which Z_i denotes the Z value of the i-th window:

Z_i=trimScale (Z_i,Z_i)……(1)。

Preferably, formula (2) is defined:

Definition

where chr denotes the chromosome, S_T denotes the FFPE sample to be tested, and S_N denotes the healthy population samples;

The background library screening step selects, according to the Z values of the FFPE sample to be tested and of the healthy population samples, the n healthy individual samples that minimize the d value, giving the screened background library sample set S₁, S₂, S₃, …, S_n (N and n are natural numbers and n < N).

Preferably, the data fluctuation elimination step performs singular value decomposition on the background library matrix X_m×n to obtain a factor matrix U_m×r of m rows and r columns, where r is the number of factors, then takes the k factors with the largest contribution rates (that is, the top k factors; k is generally 4 to 10) and performs LOESS regression to obtain the residual Z_p.

Preferably, the GC correction step performs GC correction on Z_p by LOESS regression according to the GC content in each window, to obtain the residual Z_pg.

Preferably, the copy number variation detection method further comprises:

A data quality control step of quality-checking the sequencing data obtained in the sequencing step. Quality control includes, but is not limited to, removing low-quality short sequences, removing short sequences with a high N content, removing adapter-related short sequences, and finally compiling the relevant quality control metrics.

The preferred embodiments of each of the above steps may be as described above.

According to the invention, a detection device and a detection method with higher sensitivity for CNV detection in FFPE samples are provided.

Brief description of the drawings

Fig. 1 is a schematic diagram of the device of the invention for copy number variation detection in FFPE samples.

Fig. 2 shows the CNV detection results for multiple breast cancer genes in Embodiment 1.

Detailed description of the invention

Scientific and technical terms used in this specification have the meanings commonly understood by those skilled in the art; in case of conflict, the definitions in this specification prevail.

Definitions

Reference genome: the complete haploid sequence carried by a cell or organism, including the full set of genes and the intergenic sequences.

Alignment: generally refers to sequence alignment, the process of arranging two or more sequences according to certain rules in order to determine their similarity and hence their homology.

Depth value: for a given site on the genome, the number of short sequences covering that site according to the alignment result is the depth value of that site.
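The depth value defined above amounts to counting, for each site, how many aligned short sequences cover it. A minimal sketch follows, assuming alignments on a single chromosome are already available as half-open (start, end) intervals; the function name and input layout are hypothetical, and real alignment output would normally be read with a dedicated parser.

```python
from collections import Counter


def site_depths(alignments, region_start, region_end):
    """Per-site depth over [region_start, region_end).

    alignments : iterable of (start, end) half-open intervals on one
                 chromosome (assumed input format)
    """
    delta = Counter()
    for start, end in alignments:
        s, e = max(start, region_start), min(end, region_end)
        if s < e:
            delta[s] += 1   # coverage begins at s
            delta[e] -= 1   # coverage ends just before e
    depths, running = [], 0
    for pos in range(region_start, region_end):
        running += delta.get(pos, 0)
        depths.append(running)
    return depths
```

Sites that no short sequence covers, which is common outside the capture targets, simply get a depth of 0.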

Window (sliding window): generally a region of fixed length on the genome.

Background library: a sample library made up of samples from many (generally ≥20) healthy individuals.

Capture sequencing: the process of capturing DNA fragments from specific regions of the genome (regions of interest) with pre-designed probes and then performing NGS sequencing on the captured DNA fragments.

NGS (high-throughput sequencing): high-throughput sequencing technology, also called "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel in a single run and by generally short read lengths.

Normalization (Z values)：

trimScale(w, v): w is a value to be normalized and v is a data set.

a. Remove a given percentage of the data at the upper and lower ends of v to obtain v'.

b. Calculate the mean μ and standard deviation σ of v'.

c. Calculate (w - μ)/σ as the final result.
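The three trimScale steps can be sketched as below, assuming the final step is the usual standard score (w - μ)/σ. The trim fraction and the use of the population standard deviation are assumptions; the text only says "a given percentage" and does not say which standard deviation is meant.

```python
from statistics import mean, pstdev


def trim_scale(w, v, trim_fraction=0.05):
    """Standard score of w against the trimmed data set v.

    trim_fraction is the share removed from EACH end of the sorted data
    (illustrative default, not taken from the text).
    """
    data = sorted(v)
    k = int(len(data) * trim_fraction)
    trimmed = data[k:len(data) - k] if k > 0 else data
    mu = mean(trimmed)
    sigma = pstdev(trimmed)  # ASSUMPTION: population standard deviation
    if sigma == 0:
        raise ValueError("trimmed data has zero spread")
    return (w - mu) / sigma
```

Trimming before computing μ and σ keeps a few extreme windows, for example ones with capture artefacts, from inflating the spread and masking genuine deviations.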

SVD (singular value decomposition): SVD is an important matrix factorization in linear algebra and generalizes the unitary diagonalization of normal matrices in matrix analysis. It has important applications in signal processing, statistics, and other fields. Its effect is to map a data set into a lower-dimensional space. The features of the data set (characterized by singular values in SVD) are ranked by importance; dimensionality reduction consists of discarding the unimportant feature vectors, and the space spanned by the remaining feature vectors is the reduced space.

Embodiments

The invention is described more specifically by the following embodiments. It should be understood that the embodiments described here are intended to explain the invention, not to limit it.

Embodiment 1

The device of the invention for copy number variation detection in FFPE samples was used to detect CNV in a tissue FFPE sample from a female breast cancer patient.

1.1 DNA extraction from the FFPE sample

DNA was extracted with the GeneRead DNA FFPE Kit (QIAGEN) according to the manual, yielding FFPE sample DNA.

1.2 Sample fragmentation

The DNA was fragmented on a Bioruptor instrument with the conditions set to 30 cycles of 30 s ON/30 s OFF, breaking the FFPE sample DNA into fragments of about 200 bp.

1.3 End repair

(1) Take the required reagents out of the kit stored at -20 °C in advance; the amounts for a single sample are given in Table 1.

Table 1

(2) End-repair reaction: after adding the DNA sample, place the 1.5 mL centrifuge tube in a Thermomixer and incubate at 20 °C for 30 minutes. After the reaction, recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 32 μL EB.

1.4 A-tailing (adding "A" to the fragment ends)

(1) Take the required reagents out of the kit stored at -20 °C in advance; the amounts for a single sample are given in Table 2:

Table 2

(2) A-tailing reaction: add the 32 μL of DNA recovered after purification in the previous step to a 1.5 mL centrifuge tube and incubate in a Thermomixer at 37 °C for 30 minutes. Recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 18 μL EB.

1.5 Adapter ligation

(1) Take the required reagents out of the kit stored at -20 °C in advance; the amounts for a single sample are given in Table 3:

Table 3

(2) Adapter ligation reaction: add the 18 μL of DNA recovered after purification in the previous step to the sample tube and incubate in a Thermomixer at 20 °C for 15 minutes. Recover and purify the DNA in the reaction system with 1.8× nucleic acid purification magnetic beads and dissolve it in 30 μL EB.

1.6 PCR reaction

(1) Take the required reagents out of the kit stored at -20 °C and prepare the PCR reaction system in a 2 mL PCR tube:

Table 4

(2) Set the PCR program as follows:

When the reaction ends, take out the samples promptly and store them in a 4 °C refrigerator, then exit the program or switch off the instrument as required.

(3) Recover and purify the DNA in the reaction system with 0.9× nucleic acid purification magnetic beads, and dissolve the purified library in 20 μL ddH2O. Quantify the library by Qubit and submit it for Agilent 2100 analysis.

1.7 Hybridization of the library to the breast cancer target region capture chip

(1) In this experiment, the buffer providing the ionic environment for the hybrid capture reaction, and the wash and rinse solutions for eluting physically adsorbed or non-specifically hybridized material, were obtained commercially.

(2) Prepare the hybridization library: thaw the DNA library to be hybridized on ice and take a total mass of 1 μg (this DNA library is referred to as the sample library in the subsequent steps).

(3) Prepare the Ann primer pool: mix 1000 pmol each of the tag primer In1 corresponding to the sample library index (100 μM) and the universal primer (1000 μM) (this mixture is referred to as the Ann primer pool in the subsequent steps).
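The pipetting volumes implied by step (3) follow from the identity 1 μM = 1 pmol/μL, so volume (μL) = amount (pmol) / concentration (μM). The short calculation below is only a worked example of that arithmetic; the function and variable names are hypothetical.

```python
def volume_ul(amount_pmol, stock_um):
    """Volume in μL that contains amount_pmol at a stock of stock_um μM.

    1 μM = 1 pmol/μL, so the volume is simply amount / concentration.
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_pmol / stock_um


# 1000 pmol of each primer, at the stock concentrations given in step (3).
tag_primer_ul = volume_ul(1000, 100)         # 100 μM stock
universal_primer_ul = volume_ul(1000, 1000)  # 1000 μM stock
```

Under these stock concentrations, 1000 pmol corresponds to 10 μL of the tag primer and 1 μL of the universal primer.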

(4) Prepare the hybridization sample: add 5 μL COT DNA (Human Cot-1 DNA, Life Technologies, 1 mg/mL), 1 μg of sample library, and the Ann primer pool to a 1.5 mL EP tube. Seal the EP tube with sealing film and place the tube containing the sample library pool/COT DNA/Ann primer pool in a vacuum device until completely dry.

(5) Dissolve the hybridization sample: add the following to the dry powder of sample library pool/COT DNA/Ann primer pool:

7.5 μL 2× hybridization buffer

3 μL hybridization component A

(6) Mix the above mixture thoroughly and denature it for 10 minutes on a heating block pre-set to 95 °C.

(7) Transfer the mixture to a 0.2 mL flat-cap PCR tube containing 4.5 μL of capture chip. Vortex thoroughly for 3 seconds and place the hybridization mixture on a 47 °C heating block for 16 hours. The heated lid of the heating block must be set to 57 °C. After hybridization, the product is washed and recovered.

(8) Dilute the 10× wash solutions (I, II, and III), the 10× rinse solution, and the 2.5× bead wash solution to 1× working solutions.

Table 5

(9) Preheat the following reagents on the 47 °C heating block:

400 μL 1× rinse solution

100 μL 1× wash solution I

1.8 Preparation of affinity adsorption magnetic beads

(1) Equilibrate the streptavidin magnetic beads (Dynabeads M-280 Streptavidin, hereinafter "magnetic beads") at room temperature for 30 minutes, then vortex thoroughly for 15 seconds to mix.

(2) Dispense 100 μL of magnetic beads into a 1.5 mL centrifuge tube and place the tube on a magnetic rack. After about 5 minutes, carefully aspirate and discard the supernatant, add 1× bead wash solution at twice the initial bead volume, and vortex for 10 seconds. Return the tube to the magnetic rack to collect the beads. Once the solution is clear, aspirate and discard the supernatant. Repeat this step for a total of two washes.

(3) After washing, aspirate and discard the bead wash solution, resuspend the beads by vortexing in 1× bead wash solution equal to the initial bead volume, and transfer them to a 0.2 mL PCR tube. Place the PCR tube on the magnetic rack and, once the beads are collected and the solution is clear, aspirate and discard the supernatant.

1.9 Binding of DNA to the affinity adsorption magnetic beads and rinsing

(1) Transfer the hybridized sample library to the 0.2 mL PCR tube containing the affinity adsorption magnetic beads and vortex to mix.

(2) Place the 0.2 mL PCR tube on the 47 °C heating block for 45 minutes, vortexing once every 15 minutes, so that the DNA binds to the beads.

(3) After the 45-minute incubation, add 100 μL of 1× wash solution I preheated to 47 °C to the 15 μL of captured DNA sample. Vortex for 10 seconds. Transfer the entire contents of the 0.2 mL PCR tube to a 1.5 mL centrifuge tube. Place the 1.5 mL tube on the magnetic rack to collect the beads and discard the supernatant.

(4) Remove the 1.5 mL tube from the magnetic rack and add 200 μL of 1× rinse solution preheated to 47 °C. Pipette up and down 10 times to mix (work quickly so that the reagent and sample temperature does not fall below 47 °C). After mixing, place the sample on the 47 °C heating block for 5 minutes. Repeat this step for a total of two washes with 47 °C 1× rinse solution. Place the 1.5 mL tube on the magnetic rack to collect the beads and discard the supernatant.

(5) Add 200 μL of room-temperature 1× wash solution I to the 1.5 mL tube and vortex for 2 minutes. Place the tube on the magnetic rack to collect the beads and discard the supernatant. Add 200 μL of room-temperature 1× wash solution II to the tube and vortex for 1 minute. Place the tube on the magnetic rack to collect the beads and discard the supernatant. Add 200 μL of room-temperature 1× wash solution III to the tube and vortex for 30 seconds. Place the tube on the magnetic rack to collect the beads and discard the supernatant.

(6) Remove the 1.5 mL tube from the magnetic rack and add 45 μL of PCR water to resuspend and elute the bead-captured sample.

1.10 PCR amplification of the captured DNA

(1) Prepare the post-capture PCR mix according to the table below and vortex to mix. Enrichment primer F and enrichment primer R were purchased from Invitrogen.

(2) The amplification program for the bead-bound DNA PCR is set as follows:

(3) Recovery and purification of the hybrid capture DNA PCR product: recover and purify the DNA in the reaction system with nucleic acid purification magnetic beads at a bead ratio of 0.9×, and dissolve the purified library in 30 μL ddH₂O.

1.11 Library quantification

The library was analyzed on a 2100 Bioanalyzer (Agilent)/LabChip GX (Caliper) and by qPCR, and the library concentration was recorded.

1.12 Library sequencing

The constructed library was sequenced on a NextSeq 550AR.

1.13 Data processing and analysis

The sequencing results from 1.12 were processed and analyzed with the FFPE sample copy number variation detection device of the invention.

The FFPE sample copy number variation detection device of Embodiment 1 comprises the following modules.

Sequencing data acquisition module：

Acquires the sequencing data obtained by capture sequencing of the breast cancer FFPE sample to be tested with the breast cancer target region capture chip.

Data quality control module:

Performs quality control on the sequencing data, filtering out short sequences with a low mean quality value, short sequences with a high N content, and adapter-related short sequences, to obtain the filtered sequencing data C.

Sequence alignment module:

Aligns the short sequences of the filtered sequencing data C to the human reference genome HG19 to obtain alignment result A. Calculates the depth value of each site on the genome from alignment result A to obtain result D.

Preliminary data processing module:

Divides the cancer target region into overlapping windows of a given length, removes the extreme depth values in each window, calculates the median depth, and calculates the GC content of the reference genome sequence in each window, to obtain result X.

Normalization module:

Using results X and D, calculates the Z value of the genomic DNA to be tested in each window according to the formula Z_i=trimScale (Z_i,Z_i).

Background library screening module:

Definition

Chr is the meaning of chromosome, and St represents sample to be detected, and Sn represents context vault sample.

According to genomic DNA to be checked and the Z values of context vault, the context vault sample for causing that d values are minimum is filtered out, screened Context vault sample set S afterwards₁,S₂,S₃,…,S_n。

Z values using this n sample in m window build matrix X_m×nIt is stand-by as context vault.

Data fluctuations cancellation module：

To context vault matrix X_m×nSingular value decomposition is done, m row n row factor matrixs U is obtained_m×n, n is factor number.Take contribution The maximum several factors of rate carry out LOESS recurrence, obtain Residual Z_p。

GC correction modules：

According to the G/C content in m window, to Z_pReturned based on LOESS and do GC corrections, obtain Residual Z_pg。

Output module：

The output module displays a plot of the CNV detection results.

The detection result is shown in Fig. 2, where each dot is the Z_{pg} value of one window. Copy number gains were detected in two genes, PIK3CA and ERBB2.

1.14 Result verification

RNA was extracted from fresh primary tumor tissue of the same patient and reverse transcribed, and qPCR was used to verify whether the expression of the PIK3CA and ERBB2 genes was elevated. The verification result was consistent with the detection result of 1.13. The detection device of the invention can therefore successfully detect copy number variation in FFPE samples.

Industrial applicability

The FFPE sample CNV detection device and detection method of the invention can significantly increase the sensitivity of CNV detection.

Claims

1. A device for detecting copy number variation in FFPE samples, comprising：

a sequencing data acquisition module for obtaining capture sequencing data from an FFPE sample to be tested and sequencing data from healthy population samples, the healthy population samples being samples from a plurality of healthy individuals；

a sequence alignment module, connected to the sequencing data acquisition module, for aligning the sequencing data obtained by the sequencing data acquisition module against a reference genome sequence to obtain an alignment result, and for calculating the depth value of each site from the alignment result；

a primary stage data processing module, connected to the sequence alignment module, for dividing the target region into overlapping windows of a fixed length, removing the extreme depth values of the sites within each window and calculating the mean or median depth, and calculating the GC content of the reference genome sequence within each window；

a normalization module, connected to the primary stage data processing module, for normalizing the mean or median depth in each window obtained by the primary stage data processing module, thereby calculating the Z value in each window for the FFPE sample to be tested and for the healthy population samples；

a context vault screening module, connected to the normalization module, for selecting n healthy individual samples according to the Z values of the FFPE sample to be tested and of the healthy population samples, obtaining a context vault sample set of the n healthy individual samples, and then building a matrix X_{m×n} of m rows and n columns from the Z values of the n healthy individual samples in m windows；

a data fluctuations cancellation module, connected to the context vault screening module, for eliminating the inherent data fluctuation introduced by capture sequencing；

a GC correction module, connected to the data fluctuations cancellation module, for performing GC correction according to the GC content in each window；

an output module, connected to the GC correction module, for outputting the CNV detection result.

2. The device according to claim 1, wherein the sequencing data are obtained using a capture sequencing method.

3. The device according to claim 1, wherein the primary stage data processing module divides the windows using a sliding window method.

4. The device according to claim 1, wherein the normalization module calculates the Z value in each window of the biological sample to be tested according to the following formula (1), in which Z_i denotes the Z value of the i-th window,

Z_i=trimScale (Z_i,Z_i)……(1)。

5. The device according to claim 1, wherein formula (2) is defined as：

Definition

where chr denotes the chromosome, S_T denotes the sample to be tested, and S_N denotes a healthy population sample,

and the context vault screening module, according to the Z values of the FFPE sample to be tested and of the healthy population samples, selects the n healthy individual samples that minimize the d value, giving the screened context vault sample set S₁, S₂, S₃, …, S_n.

6. The device according to claim 1, wherein the data fluctuations cancellation module performs singular value decomposition on the context vault matrix X_{m×n} to obtain a factor matrix U_{m×r} of m rows and r columns, r being the number of factors, and then uses the k factors with the largest contribution rates for LOESS regression, yielding the residual Z_p.

7. The device according to claim 6, wherein the GC correction module applies GC correction to Z_p by LOESS regression against the GC content in each window, yielding the residual Z_{pg}.

8. The device according to claim 1, further comprising a data quality checking module, connected to the sequencing module and the sequence alignment module, for performing quality checking on the sequencing data obtained by the sequencing module.