CN106367512A

CN106367512A - Method and system for identifying tumor loads in samples

Info

Publication number: CN106367512A
Application number: CN201610842333.8A
Authority: CN
Inventors: 薄世平; 梁覃斯; 任军; 陆思嘉
Original assignee: Shanghai Xukang Medical Technology Co Ltd
Current assignee: Shanghai Yikang Medical Laboratory Co Ltd
Priority date: 2016-09-22
Filing date: 2016-09-22
Publication date: 2017-02-01
Also published as: WO2018054254A1; TW201814290A; TWI670495B

Abstract

The invention provides a method and system for identifying tumor load in a sample, and in particular a method for identifying tumor load in a sample in a non-diagnostic manner. The method comprises the steps of: (1) providing a sample to be tested; (2) sequencing the sample to obtain its genome sequence; (3) aligning the genome sequence obtained in step (2) to a reference genome to obtain the position information of the genome sequence on the reference genome; (4) dividing the reference genome into M region fragments, each region fragment being a window b, and calculating the copy number of each window b; (5) performing a Z-test on each window b of step (4) to calculate the Z value of each window b; and (6) calculating GAS from the Z values obtained in step (5) and identifying the tumor load in the sample on the basis of the GAS value. The method and system can improve the sensitivity and universality of tumor detection.

Description

Method and system for identifying tumor load in a sample

Technical field

The present invention relates to the field of biotechnology, and in particular to a method and system for identifying tumor load in a sample.

Background art

In biomedical research and clinical practice, the tumor cells of cancer patients often carry a large number of genomic copy number variations. Copy number variations may be present in tumor tissue and in body fluids (such as blood, interstitial fluid, lymph, cerebrospinal fluid, urine, and saliva), specifically in the circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes, and the like found in those fluids. The state of genomic copy number variation in body fluids is an important indicator of tumor load, and identifying tumor load can be applied to early tumor screening, diagnosis, disease monitoring, prognosis, and treatment.

The main methods currently used to detect tumor genome copy number variation are: comparative genomic hybridization (CGH), real-time fluorescence quantitative PCR (RTFQ PCR), fluorescence in situ hybridization (FISH), and multiplex ligation-dependent probe amplification (MLPA).

However, comparative genomic hybridization has low resolution (Mb level), low throughput, and high cost. Fluorescence quantitative PCR likewise has low throughput and high cost, and can measure only one copy number variation at a time. Fluorescence in situ hybridization targets only specific positions, has low resolution, and suffers from unstable probe hybridization efficiency. Multiplex ligation-dependent probe amplification is complicated to perform, has low throughput, high cost, and narrow coverage, and is prone to PCR contamination. Beyond these technical shortcomings, most of the above techniques examine only specific regions of the genome, whereas tumors are highly heterogeneous, so one or a few specific sites cannot give an effective overall evaluation of the tumor load in body fluids.

Therefore, there is an urgent need in the art for a method and apparatus that can give a more effective overall evaluation of the tumor load in body fluids and improve the sensitivity and universality of tumor detection.

Summary of the invention

The present invention provides a method and apparatus that can give a more effective overall evaluation of the tumor load in body fluids and improve the sensitivity and universality of tumor detection.

A first aspect of the present invention provides a method for non-diagnostically identifying tumor load in a sample, comprising the steps of:

(i) providing a sample to be tested;

(ii) sequencing the sample to be tested to obtain the genome sequence of the sample;

(iii) aligning the genome sequence obtained in step (ii) to a reference genome to obtain the position information of the genome sequence on the reference genome;

(iv) dividing the reference genome into M region fragments, each region fragment being a window b, and calculating the copy number of each window b;

(v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and

(vi) calculating the genome randomness (GAS) from the Z values obtained in step (v), and identifying the tumor load in the sample to be tested on the basis of the genome randomness value.

In another preferred embodiment, the reference genome may be continuous or discontinuous.

In another preferred embodiment, the reference genome comprises the whole genome.

In another preferred embodiment, the reference genome refers to the full length of all chromosomes of the species (such as human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.

In another preferred embodiment, the reference genome covers more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and most preferably more than 95%.

In another preferred embodiment, the sample is derived from an individual to be tested.

In another preferred embodiment, the individual to be tested is a human or a non-human mammal.

In another preferred embodiment, the sample is a solid sample or a liquid sample.

In another preferred embodiment, the sample comprises a body fluid sample.

In another preferred embodiment, the sample is selected from the group consisting of: blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, or a combination thereof.

In another preferred embodiment, the sample is selected from the group consisting of: circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes, or a combination thereof.

In another preferred embodiment, the sequencing is selected from the group consisting of: single-end sequencing, paired-end sequencing, or a combination thereof.

In another preferred embodiment, step (iv) further comprises correcting the copy number of each window b and calculating the corrected copy number of each window b.

In another preferred embodiment, the correction method is selected from the group consisting of: LOESS correction, a weighting method, a residual method, or a combination thereof.

In another preferred embodiment, according to the position information of the genome sequence on the reference genome, the number of sequences falling in each window b, their base distribution, and the base distribution of the reference genome are counted.

In another preferred embodiment, the copy number of each window b is corrected according to the sequences and base content of each window b.

In another preferred embodiment, the Z value of each window b is calculated with the following formula:

z_{i} = \frac{x_{i} - μ_{i}}{σ_{i}};

where i is any positive integer from 1 to M; M is the total number of windows into which the reference genome is divided, M being a positive integer ≥ 50, preferably 50 ≤ M ≤ 10⁵, more preferably 100 ≤ M ≤ 10⁵, most preferably 200 ≤ M ≤ 10⁵; x_i is the copy number value detected for the sample to be tested in the i-th window b_i; b_i is the i-th window; and μ_i is the arithmetic mean of the copy number of the normal control samples in window b_i, calculated with the following formula:

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

where j is any positive integer from 1 to n; n is the total number of normal control samples, n being a positive integer ≥ 30, preferably 30 ≤ n ≤ 10⁸, more preferably 50 ≤ n ≤ 10⁷, most preferably 100 ≤ n ≤ 10⁴; x_j is the copy number value detected for the j-th normal control sample in window b_i; and σ_i is the standard deviation of the copy number of the normal control samples in window b_i, calculated with the following formula:

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

where n, j, x_j and μ_i are as defined above.

In another preferred embodiment, the normal control samples are samples of the same type taken from normal individuals of the same species.

In another preferred embodiment, the genome randomness is calculated with the following formula:

GAS = \sum_{i = m_{b}}^{p_{b}} | z_{i} |;

where m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile; m is 30-98, preferably 40-97, more preferably 60-96, still more preferably 80-95, most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20).

In another preferred embodiment, the following step is performed before the genome randomness is calculated:

(a) according to the sequence features of the reference genome, removing regions of the genome that high-throughput sequencing cannot detect, such as centromeres, telomeres, satellites, and heterochromatin, and also removing regions of length L adjacent to the centromeres, telomeres, satellites, and heterochromatin, where L is any length less than 3 Mb; or

(b) according to the copy number features of the samples, removing regions of the genome that high-throughput sequencing cannot detect, such as centromeres, telomeres, satellites, and heterochromatin.

In another preferred embodiment, the following steps are further performed before step (v):

(iv1) according to the copy number of each window b from step (iv), calculating the coefficient of variation cv_i of each window b in the normal control samples; and

(iv2) sorting the cv_i values in ascending order and removing the windows with the largest n%, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.

In another preferred embodiment, the coefficient of variation cv_i is calculated with the following formula:

{cv}_{i} = \frac{σ_{i}}{μ_{i}};

where μ_i is the arithmetic mean of the copy number of the normal control samples, calculated with the following formula:

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

and σ_i is the standard deviation of the copy number of the normal control samples, calculated with the following formula:

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

where n, j, x_j, μ_i and σ_i are as defined above.
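The coefficient-of-variation filter of steps (iv1)-(iv2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the inventors: the function name `windows_kept_by_cv`, the rounding-down of the number of removed windows, and the treatment of a zero mean as an infinite CV are all choices made here for the example.

```python
import math

def windows_kept_by_cv(mu, sigma, drop_percent=5.0):
    """Return indices of windows kept after dropping the highest-CV windows.

    mu[i], sigma[i]: mean and standard deviation of the normal-control
    copy number in window i. drop_percent is the n% of the text (0 < n <= 5).
    """
    if not 0 < drop_percent <= 5:
        raise ValueError("drop_percent must be in (0, 5]")
    # cv_i = sigma_i / mu_i; a zero mean is treated as infinite CV (assumption).
    cv = [s / m if m > 0 else math.inf for m, s in zip(mu, sigma)]
    # Assumption: the number of removed windows is rounded down.
    n_drop = math.floor(len(cv) * drop_percent / 100.0)
    order = sorted(range(len(cv)), key=lambda i: cv[i])  # ascending CV
    return sorted(order[: len(cv) - n_drop])

# Toy data: 20 windows with equal means and increasing spread.
mu = [10.0] * 20
sigma = [0.1 * (i + 1) for i in range(20)]
kept = windows_kept_by_cv(mu, sigma, drop_percent=5.0)
print(len(kept), 19 in kept)
```

With 20 windows and n = 5, one window (the one with the largest CV) is removed, so the remaining windows are the ones whose normal-control copy number is most stable.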

A second aspect of the present invention provides a system (apparatus) for identifying tumor load in a sample, comprising:

a sequencing unit, configured to perform nucleic acid sequencing on a sample to be tested to obtain the genome sequence of the sample;

a comparison unit, connected to the sequencing unit and configured to align the obtained genome sequence of the sample to a reference genome to obtain the position information of the genome sequence on the reference genome;

a calculation and verification unit, connected to the comparison unit and configured to calculate the copy number of each window b of the reference genome and to perform a Z-test on each window to calculate the Z value of each window b; and

an identification unit, connected to the calculation and verification unit and configured to calculate the genome randomness (GAS) from the Z values obtained and to identify the tumor load in the sample on the basis of the genome randomness value.

In another preferred embodiment, the system further comprises a correction unit, connected to the calculation and verification unit and configured to correct the copy number of each window b of the reference genome and thereby calculate the corrected copy number of each window b.

In another preferred embodiment, before the Z-test is performed on each window b, the calculation and verification unit may calculate the coefficient of variation cv_i of each window b from the copy number of each window b, sort the cv_i values in ascending order, and remove the windows with the largest n%, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.

It should be understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (for example, in the embodiments) may be combined with one another to form new or preferred technical solutions. For reasons of space, they are not enumerated one by one here.

Brief description of the drawings

Fig. 1 shows a flow chart of the analysis method for identifying tumor load in body fluids.

Fig. 2 shows the tumor load detection results across the patient's clinical medication cycles.

Fig. 3 shows the whole-genome copy number variation of S1-7 and the corresponding GAS.

Detailed description

Through extensive and in-depth research, the present inventors have for the first time established a method for identifying tumor load in a sample that is effective and improves the sensitivity and universality of tumor detection. Specifically, the genome randomness (GAS) is calculated, and the tumor load in the sample is identified on the basis of the genome randomness value.

In addition, the present invention provides a system (apparatus) for identifying tumor load in a sample, the system (apparatus) comprising: a sequencing unit; a comparison unit; a calculation and verification unit; and an identification unit. In a preferred embodiment of the present invention, it further comprises a correction unit. The present inventors completed the present invention on this basis.

Terms

As used herein, the term "copy number variation (CNV)" refers to an abnormal copy number of a chromosome or chromosome segment in the sample genome, including but not limited to chromosomal aneuploidy, deletions, duplications, and microdeletions and microduplications of more than 1000 bp.

As used herein, the term "genome randomness value (genomic abnormality score, GAS)" refers to a score calculated from copy number abnormalities of chromosomes or chromosome segments in the sample genome; the detection range of the score includes but is not limited to the whole genome, a specific chromosome, a chromosome segment, or a specific gene.

As used herein, the term "Z value (z-score)", also called the standard score, is obtained by taking the difference between a value and the mean and dividing it by the standard deviation. It is expressed as:

Z score = (x - μ)/σ

where x is a specific value, μ is the arithmetic mean, and σ is the standard deviation. The Z value represents the distance between the raw value and the reference mean, measured in units of standard deviation.

As used herein, the term "partial response (PR)" refers to a decrease of ≥ 30% in the sum of the longest diameters of the target lesions, maintained for at least 4 weeks.

As used herein, the term "progressive disease (PD)" refers to an increase of at least 20% in the sum of the longest diameters of the target lesions, or the appearance of new lesions.

As used herein, the terms "system" and "apparatus" have the same meaning.

Reference gene group

In the present invention, taking human as an example, the reference genome may be the whole genome or a partial genome. Moreover, the reference genome may be continuous or discontinuous. When the reference genome is a partial genome, the total coverage (F) of the reference genome is more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and most preferably more than 95%, where the total coverage (F) is the percentage of the whole genome accounted for by the reference genome.

In a preferred embodiment, the reference genome is the whole genome.

In a preferred embodiment, the reference genome is the full length of all chromosomes of the species (such as human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.

Tumor load

In the present invention, "tumor load" refers to the degree of harm a tumor causes to the body, for example the size of the tumor, how active it is, its metastatic status, and the degree of danger that tumors at different sites pose to the body. Indicators for evaluating tumor load include but are not limited to: tumor size, tumor marker levels, clinical symptoms (shortness of breath, pain, etc.), related complications (superior vena cava syndrome, etc.), and signs of wasting (anemia, hypoproteinemia, etc.).

Sequencing

In the present invention, sequencing can be performed with conventional sequencing technologies and platforms. The sequencing platform is not particularly limited. Second-generation sequencing platforms include but are not limited to: GA, GAII, GAIIx, HiSeq 1000/2000/2500/3000/4000, X Ten, X Five, NextSeq 500/550, MiSeq, MiSeqDx, MiSeq FGx, and MiniSeq from Illumina; SOLiD from Applied Biosystems; 454 FLX from Roche; Ion Torrent, Ion PGM, and Ion Proton I/II from Thermo Fisher Scientific (Life Technologies); BGISEQ-1000, BGISEQ-500, and BGISEQ-100 from BGI; BioelectronSeq 4000 from CapitalBio; DA8600 from Da An Gene Co., Ltd. of Sun Yat-sen University; NextSeq CN500 from Berry Genomics; BIGIS from Zhongke Zixin, a subsidiary of Zixin Pharmaceutical; and HYK-PSTAR-IIA from Huayinkang Gene.

Third-generation single-molecule sequencing platforms include but are not limited to: the HeliScope system from Helicos BioSciences, the SMRT system from Pacific Biosciences, and GridION and MinION from Oxford Nanopore Technologies. The sequencing type may be single-end or paired-end sequencing; the read length may be any length of at least 30 bp, such as 30 bp, 40 bp, 50 bp, 100 bp, or 300 bp; and the sequencing depth may be any multiple of the genome of at least 0.01×, such as 0.01×, 0.02×, 0.1×, 1×, 5×, 10×, or 30×.

In the present invention, the HiSeq 2500 high-throughput sequencing platform from Illumina is preferred; the sequencing type is single-end sequencing, the read length is 41 bp, and the sequencing data volume is 5 M.
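As a rough check that the preferred settings fall inside the stated depth range, the mean sequencing depth can be estimated as reads × read length / genome size. The sketch below assumes that the stated data volume means 5 million reads and that the human genome is about 3 Gb, the approximate figure this document uses; both are assumptions for illustration, and `mean_depth` is a hypothetical helper name.

```python
def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is covered, ignoring unmapped reads."""
    return n_reads * read_length_bp / genome_size_bp

# Assumptions: 5 million reads of 41 bp, genome of about 3 Gb.
depth = mean_depth(5_000_000, 41, 3_000_000_000)
print(f"{depth:.3f}x")  # prints 0.068x
```

Under these assumptions the depth is roughly 0.07×, which is above the 0.01× lower bound given for the sequencing depth.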

Data processing

In the present invention, data processing generally comprises the following steps:

(a) performing nucleic acid extraction and sequencing on the genome of the sample to be tested to obtain the genome sequence;

(b) aligning the genome sequence of the sample to a reference genome to obtain the position of each sequence on the reference genome;

(c) dividing the reference genome into windows of a certain length and calculating the copy number of each window b;

(d) performing a Z-test on each window b and calculating the Z value of each window; and

(e) calculating the genome randomness (GAS).

Step (a) more specifically includes the following. The sample to be tested is a body fluid, which may be blood, interstitial fluid (also called tissue fluid or intercellular fluid), lymph, cerebrospinal fluid, urine, or saliva. The detection target is the DNA contained in the body fluid, specifically DNA present in circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes, and the like. Methods for extracting DNA from the sample to be tested include but are not limited to column extraction and magnetic bead extraction. A library is constructed from the sample, and the sample is sequenced on a high-throughput sequencing platform.

Step (b) more specifically includes the following. Adapters and low-quality data are removed from the sequencing results, which are then aligned to the reference genome. The reference genome may be the whole genome, any chromosome, or part of a chromosome. A generally accepted, established sequence is normally chosen as the reference genome; for human, for example, it may be hg18 (NCBI36), hg19 (GRCh37), or hg38 (GRCh38) from NCBI or UCSC, or any one chromosome or part of a chromosome. Any free or commercial alignment software may be used, such as BWA (Burrows-Wheeler Alignment tool), SOAPaligner/SOAP2 (Short Oligonucleotide Analysis Package), or Bowtie/Bowtie2. Aligning the sequences to the reference genome gives the position of each sequence on the genome. Sequences that align uniquely to the genome may be selected and sequences that align to multiple places on the genome removed, which eliminates the error that repetitive sequences introduce into the copy number calculation.
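The text does not say how uniquely aligned sequences are selected. One common proxy, used here purely as an assumption, is to keep primary alignments whose mapping quality (MAPQ) is above a threshold, since aligners such as BWA report a MAPQ of 0 for reads that fit several locations equally well. The sketch below reads plain SAM text using only the mandatory columns of the SAM format; the function name `unique_alignment_positions` and the default threshold are hypothetical.

```python
def unique_alignment_positions(sam_lines, min_mapq=1):
    """Yield (chromosome, 0-based start) for alignments kept as 'unique'."""
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])          # SAM positions are 1-based
        mapq = int(fields[4])
        if flag & 0x4:                # read is unmapped
            continue
        if flag & 0x900:              # secondary (0x100) or supplementary (0x800)
            continue
        if mapq < min_mapq:           # assumption: low MAPQ marks ambiguous placement
            continue
        yield chrom, pos - 1

sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t60\t41M\t*\t0\t0\t*\t*",    # unique, kept
    "r2\t0\tchr1\t501\t0\t41M\t*\t0\t0\t*\t*",     # MAPQ 0, dropped
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped, dropped
    "r4\t256\tchr1\t301\t60\t41M\t*\t0\t0\t*\t*",  # secondary, dropped
]
print(list(unique_alignment_positions(sam)))
```

In practice a BAM/SAM library would usually replace the manual parsing; the filter logic stays the same.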

Step (c) more specifically includes the following. The genome is divided into windows of a certain length; depending on the amount of sequencing data, the window lengths may be identical or different integers in the range of 100 bp to 3,000,000 bp (3 Mb). The number of windows may be any integer in the range of 1,000 to 30,000,000. According to the positions of the sequenced reads on the genome, the number of sequences falling in each window, their base distribution, and the base distribution of the reference genome are counted. The copy number of each window is corrected according to the sequences and GC content of each window; correction methods include but are not limited to LOESS correction. The corrected copy number of each window is then calculated.
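The windowing and GC correction described in step (c) can be illustrated with a small sketch. Note the substitution: instead of the LOESS fit named in the text, the example scales each window by the median count of windows with similar GC content, a simpler stand-in that shows the idea of removing GC-dependent bias. The function names, the fixed window size, and the GC bin width are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import median

def count_reads_per_window(positions, chrom_length, window_size):
    """Count read start positions (0-based) in consecutive fixed-size windows."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts

def gc_bin_correct(counts, gc_fractions, bin_width=0.05):
    """Scale each window by (global median / median of its GC bin).

    A simplified stand-in for LOESS: windows are grouped into GC bins and
    each bin is rescaled so that its median matches the global median.
    """
    global_median = median(counts)
    by_bin = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        by_bin[int(gc / bin_width)].append(count)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = bin_median[int(gc / bin_width)]
        corrected.append(count * global_median / m if m > 0 else 0.0)
    return corrected

counts = count_reads_per_window([0, 5, 10, 19, 20, 25], chrom_length=25, window_size=10)
print(counts)  # prints [2, 2, 1]; position 25 lies outside the chromosome

# Two low-GC and two high-GC windows; high-GC windows are over-represented.
corrected = gc_bin_correct([100, 120, 200, 240], [0.42, 0.43, 0.61, 0.62])
print([round(c, 1) for c in corrected])
```

After correction the low-GC and high-GC windows sit on the same scale, so remaining differences between windows reflect copy number rather than base composition.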

Step (d) more specifically includes the following. Samples from n normal individuals are taken (n being a natural number not less than 30) and, under the same extraction, library construction, and sequencing conditions, steps (a)-(c) above are repeated to form a reference data set. Each window b_i thus has n corresponding normal copy number values.

The arithmetic mean μ_i of the copy number of the normal control samples is calculated as:

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

The standard deviation σ_i of the copy number of the normal control samples is calculated as:

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

where x₁, x₂, x₃, ..., x_n are the copy number values of the normal samples.

The Z value of each window b_i of the sample to be tested is calculated as:

z_{i} = \frac{x_{i} - μ_{i}}{σ_{i}};

where x_i is the copy number value detected in window b_i.
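The three formulas above translate directly into code. The sketch below is an illustration only: the data layout (`normal_copy_numbers[j][i]` for control j and window i), the function names, and the choice to return NaN for a window whose controls show no variation are assumptions. `statistics.pstdev` is used because it divides by n, matching the standard deviation formula given here.

```python
from statistics import fmean, pstdev

def window_reference_stats(normal_copy_numbers):
    """Per-window mean and standard deviation over the normal controls.

    normal_copy_numbers[j][i] is the copy number of control j in window i.
    """
    n_windows = len(normal_copy_numbers[0])
    mu, sigma = [], []
    for i in range(n_windows):
        column = [sample[i] for sample in normal_copy_numbers]
        mu.append(fmean(column))
        sigma.append(pstdev(column))  # population form: divides by n
    return mu, sigma

def z_scores(test_copy_numbers, mu, sigma):
    """z_i = (x_i - mu_i) / sigma_i; NaN where sigma_i is zero (assumption)."""
    return [
        (x - m) / s if s > 0 else float("nan")
        for x, m, s in zip(test_copy_numbers, mu, sigma)
    ]

# Three toy controls and two windows (the method itself calls for n >= 30).
normals = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mu, sigma = window_reference_stats(normals)
z = z_scores([2.0, 8.0], mu, sigma)
print(mu, [round(v, 3) for v in z])
```

The first window of the test sample equals the control mean and gets a Z value of 0; the second lies well above the controls and gets a large positive Z value.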

Step (e) more specifically includes the following. Across the whole genome, highly repetitive regions exist on or around certain chromosomes, chromosome segments, or genes, such as regions near centromeres, telomeres, satellites, and heterochromatin. The highly repetitive regions are removed first to eliminate their influence on the randomness calculation.

In a preferred embodiment, removal methods include but are not limited to:

A. Removal according to the sequence features of the reference genome

Regions of the genome that high-throughput sequencing cannot detect, such as centromeres, telomeres, satellites, and heterochromatin, are removed, together with regions of length L adjacent to the centromeres, telomeres, satellites, and heterochromatin; L may be any length less than 3 Mb; or

B. Removal according to the copy number features of the normal samples

For each window b_i, the coefficient of variation cv_i of the normal control samples in that window is calculated as:

{cv}_{i} = \frac{σ_{i}}{μ_{i}};

where μ_i is the arithmetic mean of the copy number of the normal control samples and σ_i is the standard deviation of the copy number of the normal control samples.

The cv values are sorted in ascending order and the windows with the largest n% are removed; n may be any value greater than 0 and less than or equal to 5.
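Removal method A above amounts to dropping every window that overlaps an annotated region or its flank of length L. A minimal sketch for one chromosome with fixed-size windows follows; the function name, the half-open interval convention, and the toy coordinates are assumptions, and real centromere or telomere coordinates would come from a reference genome annotation.

```python
def windows_outside_masked_regions(n_windows, window_size, masked, flank):
    """Return indices of windows that do not touch any masked region.

    masked: list of (start, end) half-open intervals on one chromosome.
    flank:  length L removed on both sides of every masked region.
    """
    kept = []
    for i in range(n_windows):
        w_start, w_end = i * window_size, (i + 1) * window_size
        overlaps = any(
            w_start < end + flank and w_end > start - flank
            for start, end in masked
        )
        if not overlaps:
            kept.append(i)
    return kept

# Toy chromosome: ten 100 bp windows, one masked region at [400, 500), L = 100 bp.
print(windows_outside_masked_regions(10, 100, [(400, 500)], flank=100))
# prints [0, 1, 2, 6, 7, 8, 9]
```

With the flank set to 0 only the window covering the masked region itself is removed.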

Step (e) also specifically includes the calculation of the genome randomness (GAS):

First, the detection range of the randomness is determined. The detection range includes but is not limited to the whole genome, a specific chromosome, a specific chromosome segment, or a specific gene, and may be any length from 1 Mb up to the genome length (about 3 Gb for human). Within the detection range, the absolute values of the Z values of the windows that remain after removing those affected by repetitive sequences are taken and sorted in ascending order, and the sorted absolute Z values are distributed evenly over the range 0%-100%, with the smallest absolute Z value assigned to 0% and the largest assigned to 100%. The cumulative sum of the absolute Z values of the windows lying between the m-th and p-th percentiles is then calculated, where m is 30-98, preferably 40-97, more preferably 60-96, still more preferably 80-95, most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20). This cumulative sum is the genome randomness (GAS), calculated as:

GAS = \sum_{i = m_{b}}^{p_{b}} | z_{i} |;

where m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile. The tumor load in the body fluid is identified from the GAS value.
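The percentile-bounded sum can be sketched as below. The mapping from rank to percentile, 100 × rank / (N - 1), is one reading of "distributed evenly over 0%-100% with the minimum at 0% and the maximum at 100%"; it is an assumption, as are the function name `gas` and the inclusive treatment of both bounds.

```python
def gas(z_values, m=95.0, p=100.0):
    """Sum of |z| over windows ranked between the m-th and p-th percentiles."""
    if p - m < 2:
        raise ValueError("the text requires p - m >= 2")
    abs_z = sorted(abs(z) for z in z_values)  # ascending |z|
    n = len(abs_z)
    if n < 2:
        raise ValueError("at least two windows are needed")
    total = 0.0
    for rank, value in enumerate(abs_z):
        percentile = 100.0 * rank / (n - 1)   # min -> 0%, max -> 100%
        if m <= percentile <= p:
            total += value
    return total

# 21 toy windows with Z values -10..10: sorted |z| is 0, 1, 1, ..., 10, 10.
z = list(range(-10, 11))
print(gas(z, m=90, p=100))  # prints 29.0 (9 + 10 + 10)
print(gas(z))               # prints 20.0 (10 + 10)
```

Because only the upper tail of |z| enters the sum, a sample with a few strongly aberrant windows scores higher than one whose windows all sit near the control mean.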

Method for identifying tumor load in a sample

The present invention provides a method for identifying tumor load in a sample that is effective and improves the sensitivity and universality of tumor detection, comprising the steps of:

(i) providing a sample to be tested;

In a preferred embodiment of the present invention, the method comprises the steps of:

(a) performing nucleic acid extraction and sequencing on the sample genome to obtain the genome sequence;

(b) aligning the sequences to a reference genome to obtain the position of each sequence on the genome;

(c) dividing the reference genome into windows b of a certain length and calculating the copy number of each window b; and

(d) performing a Z-test on each window b and calculating the Z value of each window b; then calculating the genome randomness (GAS) and identifying the tumor load in the sample on the basis of the genome randomness value.

System (apparatus) for identifying tumor load in a sample

The present invention also provides a system (apparatus) for identifying tumor load in a sample, comprising:

In a preferred embodiment, the system further comprises a correction unit, connected to the calculation and verification unit and configured to correct the copy number of each window b of the reference genome and thereby calculate the corrected copy number of each window b.

The main advantages of the present invention include:

(1) The present invention is the first to establish a method and system for identifying tumor load in a sample; the method and system can identify the tumor load in a sample accurately and effectively.

(2) The method and system of the present invention can improve the sensitivity and universality of tumor detection.

(3) The method and system of the present invention can reduce the pain that sampling causes tumor patients during testing, achieving non-invasive detection.

(4) The method and system of the present invention can effectively test patients from whom samples cannot be taken by some conventional tests.

(5) The method and system of the present invention can test tumor patients in real time and monitor drug efficacy, giving physicians some guidance on medication and treatment.

The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope. Experimental methods for which detailed conditions are not given in the following embodiments generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise indicated, percentages and parts are by weight.

Unless otherwise specified, the materials used in the embodiments are commercially available products.

Embodiment 1

The present invention has been applied to 15 cases with good results. To make the use and effect of the invention easier to understand, a practical example is described in further detail below. An outline flow chart of the implementation is shown in Fig. 1, and the detailed implementation process is as follows:

1. Nucleic acid extraction and sequencing of the sample genome

In this embodiment, the test samples came from a gastric cancer patient; cell-free DNA (cfDNA) and leukocytes were extracted from blood. Nucleic acids were extracted with the CW2603 nucleic acid extraction kit from Kangwei Century Biotechnology Co., Ltd., following the product instructions provided by the manufacturer.

Libraries were constructed with the CW2185 library construction kit from Kangwei Century Biotechnology Co., Ltd. and then sequenced. Sequencing was performed on the Illumina HiSeq 2500 high-throughput sequencing platform according to the instructions provided by Illumina. The sequencing type was single-end sequencing, the read length was 41 bp, and the sequencing data volume was 5 M.

2. Aligning the sequences to the reference genome to obtain their positions on the genome

Adapters and low-quality data were removed from the sequencing results, which were then aligned to the reference genome. The reference genome was the UCSC human genome hg19 (GRCh37), and the alignment software was BWA (Burrows-Wheeler Alignment tool) with default parameters. Aligning the sequences to the reference genome gave the position of each sequence on the genome, and sequences aligning uniquely to the genome were selected.

3. Dividing the reference genome into windows of a certain length and calculating the copy number of each window

The genome was divided into 15489 windows b (regions), each 200 kb long. According to the positions of the sequences on the genome, the number of sequences falling in each window b, their base distribution, and the base distribution of the reference genome were counted. The copy number of each window b was corrected by LOESS according to the sequences and GC content of each window b, and the corrected copy number of each window b was calculated.

4. Calculating the CV value of each window

Samples from 100 normal individuals were taken and, under the same extraction, library construction, and sequencing conditions, steps 1, 2, and 3 above were repeated to obtain normal control sample data as a reference data set, from which the CV value of each window b_i was calculated.

Each window b_i has n corresponding normal copy number values (n = 100 in this embodiment).

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

where x₁, x₂, x₃, ..., x_n are the copy number values of the normal samples.

The CV value of each window b_i was calculated as:

{cv}_{i} = \frac{σ_{i}}{μ_{i}}.

5. Performing a Z-test on each window and calculating the Z value of each window

z_{i} = \frac{x_{i} - μ_{i}}{σ_{i}};

where x_i is the copy number value detected in window b_i, μ_i is the arithmetic mean of the copy number of the normal control samples, and σ_i is the standard deviation of the copy number of the normal control samples, both calculated as in step 4.

6. calculate genome randomness (gas)

In the present embodiment, each window cv sorts from small to large, removes the window of maximum front 5%, is not involved in following Randomness calculates.The detection range of randomness is whole gene group；Z value takes absolute value, and sorts from small to large, calculates m% To the aggregate-value of pth % window z value absolute value, its aggregate-value as genome randomness (gas).Computing formula is:

GAS = \sum_{i = m_{b}}^{p_{b}} | z_{i} |;

m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile, where m is 95 and p is 100. The tumor load in the body fluid is identified from the GAS value.
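The GAS calculation of this step can be sketched as follows. The function name and the conversion of percentile ranks to list indices (truncation with `int`) are assumptions, since the source does not state how the window boundaries at m% and p% are rounded.

```python
def genome_randomness(z, cv, cv_drop_pct=5.0, m=95.0, p=100.0):
    """Sum |z| over the windows ranked between the m-th and p-th percentile.

    z, cv: per-window Z values and CV values (lists of equal length).
    """
    if len(z) != len(cv):
        raise ValueError("z and cv must cover the same windows")
    # Sort window indices by CV and drop the windows with the largest CV.
    by_cv = sorted(range(len(cv)), key=lambda i: cv[i])
    n_drop = int(len(by_cv) * cv_drop_pct / 100)
    kept = by_cv[: len(by_cv) - n_drop]
    # Sort the absolute Z values of the remaining windows.
    abs_z = sorted(abs(z[i]) for i in kept)
    # Assumed rounding: truncate the percentile positions to indices.
    lo = int(len(abs_z) * m / 100)
    hi = int(len(abs_z) * p / 100)
    return sum(abs_z[lo:hi])


# Toy data: 20 windows, the last one has the largest CV and is dropped.
cv = [0.01 * k for k in range(1, 21)]
z = [float(k) * (-1) ** k for k in range(1, 21)]
gas = genome_randomness(z, cv)
```

With m = 95 and p = 100, only the largest 5% of absolute Z values contribute, so the score is driven by the most deviant windows rather than by the bulk of the genome.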

7. Test results

More than ten samples were tested. A typical case is described below.

The test results are shown in Table 1, Fig. 2, and Fig. 3.

Table 1. Tumor load test results of Embodiment 1 and clinical application effect for a gastric cancer patient

The results show that before clinical medication the patient was diagnosed with gastric cancer. At that time the cfDNA copy number was severely abnormal (Fig. 3, S1), the whole-genome randomness was 999.84, and the tumor load in the blood was relatively heavy.

With medication, by the 4th cycle the cfDNA copy number had become normal and the whole-genome randomness was 728.80, close to the value of 729.86 for normal white blood cells.

Using the same method as in this embodiment, the whole-genome randomness of the above 100 normal individuals was calculated. The normal range is 722.87-739.89, with an arithmetic mean of 733.22. The whole-genome randomness values of the 4th medication cycle and of the white blood cells in this embodiment fall within the normal range, indicating that the tumor load in the blood is very small, which corresponds to the clinical efficacy evaluation result of PR (partial response).
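The comparison against the normal range can be illustrated with the values reported above. Treating the observed range of the 100 normal individuals as a simple interval check is an assumption made for illustration; the source does not define a formal cutoff rule, and the function name is hypothetical.

```python
# Range of whole-genome randomness reported for the 100 normal individuals.
NORMAL_GAS_RANGE = (722.87, 739.89)


def within_normal_range(gas, normal_range=NORMAL_GAS_RANGE):
    """Return True if a GAS value lies inside the reported normal range."""
    low, high = normal_range
    return low <= gas <= high


before_medication = within_normal_range(999.84)  # value before medication
fourth_cycle = within_normal_range(728.80)       # value at the 4th cycle
white_blood_cells = within_normal_range(729.86)  # normal white blood cells
```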

With further medication, the tumor developed drug resistance, the cfDNA copy number abnormalities became severe again, the whole-genome randomness score increased, and the tumor load in the blood became heavier. By the 7th medication cycle the whole-genome randomness was at its highest, which corresponds to the clinical efficacy evaluation result of PD (progressive disease).

The results show that genome randomness can effectively identify the tumor load in body fluid.

All documents mentioned in the present invention are incorporated by reference in this application, as if each document were individually incorporated by reference. In addition, it should be understood that after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Claims

1. A method for non-diagnostically identifying tumor load in a sample, characterized in that it comprises the steps of:

(i) providing a sample to be tested;

(ii) sequencing the sample to be tested, thereby obtaining the genome sequence of the sample;

(iii) aligning the genome sequence obtained in step (ii) with a reference genome, thereby obtaining position information of the genome sequence on the reference genome;

(iv) dividing the reference genome into M region segments, wherein each region segment is a window b, and calculating the copy number of each window b;

(v) performing a Z test on each window b of step (iv), thereby calculating the Z value of each window b;

(vi) calculating the genome randomness (GAS) according to the Z values obtained in step (v), and identifying the tumor load in the sample to be tested based on the value of the genome randomness.

2. the method for claim 1 is it is characterised in that described reference gene group includes full-length genome.

3. method as claimed in claim 1 or 2 is it is characterised in that the coverage rate of described reference gene group reaches full-length genome More than 50%, it is preferred that more than 60%, more preferably, more than 70%, more preferably, more than 80%, most preferably, more than 95%.

4. the method for claim 1 is it is characterised in that described sample is selected from the group: blood, blood plasma, interstitial fluid, Lymph fluid, cerebrospinal fluid, urine, saliva, aqueous humor, seminal fluid or a combination thereof.

5. the method for claim 1 is it is characterised in that described step (iv) also includes correcting the copy of each window b Number, the step calculating the copy number after each window b correction.

6. the method for claim 1 it is characterised in that calculate the z value of each window b with following formula:

z_{i} = \frac{x_{i} - μ_{i}}{σ_{i}};

Wherein, i is any positive integer from 1 to M; M is the total number of windows into which the reference genome is divided, where M is a positive integer ≥ 50, preferably 50 ≤ M ≤ 10⁵, more preferably 100 ≤ M ≤ 10⁵, most preferably 200 ≤ M ≤ 10⁵; x_i is the copy number value detected in the i-th window b_i of the sample to be tested; b_i is the i-th window; μ_i is the arithmetic mean of the copy numbers of the normal control samples in window b_i, calculated with the following formula:

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

Wherein, j is any positive integer from 1 to n; n is the total number of normal control samples, where n is a positive integer ≥ 30, preferably 30 ≤ n ≤ 10⁸, more preferably 50 ≤ n ≤ 10⁷, most preferably 100 ≤ n ≤ 10⁴; x_j is the copy number value detected in window b_i of the j-th normal control sample; σ_i is the standard deviation of the copy numbers of the normal control samples in window b_i, calculated with the following formula:

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

In the formula, n, j, x_j, and μ_i are as defined above.

7. The method of claim 1, characterized in that the genome randomness is calculated with the following formula:

GAS = \sum_{i = m_{b}}^{p_{b}} | z_{i} |;

Wherein, m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile; m is 30-98, preferably 40-97, more preferably 60-96, most preferably 80-95, most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, most preferably 100; and p-m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, more preferably ≥ 15, most preferably ≥ 20).

8. The method of claim 1, characterized in that it further comprises the following steps before step (v):

(iv1) calculating, from the copy number of each window b in step (iv), the coefficient of variation cv_i of each window b in the normal control samples;

(iv2) sorting the cv_i values from smallest to largest and removing the windows with the largest n% of values, where n is any number greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2, or 5.

9. The method of claim 8, characterized in that the coefficient of variation cv_i is calculated with the following formulas:

{cv}_{i} = \frac{σ_{i}}{μ_{i}};

μ_{i} = \frac{\sum_{j = 1}^{n} x_{j}}{n};

σ_{i} = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} {(x_{j} - μ_{i})}^{2}};

In the formulas, n, j, x_j, μ_i, and σ_i are as defined above.

10. A system for identifying tumor load in a sample, characterized in that it comprises:

a sequencing unit for performing nucleic acid sequencing on a sample to be tested, thereby obtaining the genome sequence of the sample;

an alignment unit connected to the sequencing unit, for aligning the obtained genome sequence of the sample with a reference genome, thereby obtaining position information of the genome sequence on the reference genome;

a calculation and test unit connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z test on each window, thereby calculating the Z value of each window b; and

an identification unit connected to the calculation and test unit, for calculating the genome randomness (GAS) from the obtained Z values and identifying the tumor load in the sample based on the value of the genome randomness.