CN106367512A - Method and system for identifying tumor loads in samples - Google Patents


Info

Publication number
CN106367512A
Authority
CN
China
Prior art keywords
window
sample
genome
value
reference gene
Prior art date
Legal status
Pending
Application number
CN201610842333.8A
Other languages
Chinese (zh)
Inventor
薄世平
梁覃斯
任军
陆思嘉
Current Assignee
Shanghai Yikang Medical Laboratory Co Ltd
Original Assignee
Shanghai Xukang Medical Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xukang Medical Technology Co Ltd
Priority to CN201610842333.8A
Publication of CN106367512A
Priority to PCT/CN2017/101573 (WO2018054254A1)
Priority to TW106131581A (TWI670495B)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and a device for identifying tumor load in a sample, and in particular a non-diagnostic method for identifying tumor load in a sample. The method comprises the following steps: (1) providing a sample to be tested; (2) sequencing the sample to obtain its genome sequences; (3) aligning the genome sequences obtained in step (2) to a reference genome to obtain their positions on the reference genome; (4) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b; (5) performing a Z-test on each window b from step (4) to calculate the Z value of each window b; and (6) calculating the GAS from the Z values obtained in step (5) and identifying the tumor load in the sample on the basis of the GAS value. The method and system can improve the sensitivity and universality of tumor detection.

Description

Method and system for identifying tumor load in a sample
Technical field
The present disclosure relates to the field of biotechnology, and in particular to a method and system for identifying tumor load in a sample.
Background art
In biomedical research and clinical practice, the tumor cells of cancer patients often carry extensive genomic copy number variation. Copy number variations can be present in tumor tissue and in body fluids (such as blood, interstitial fluid, lymph, cerebrospinal fluid, urine and saliva), specifically in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes and the like in body fluids. The status of genomic copy number variation in body fluids is an important indicator for identifying tumor load, and tumor load identification can be applied to early tumor screening, diagnosis, disease monitoring, prognosis and treatment of patients.
At present, the main methods for detecting copy number variation in tumor genomes are comparative genomic hybridization (CGH), real-time fluorescence quantitative PCR (RTFQ-PCR), fluorescence in situ hybridization (FISH) and multiplex ligation-dependent probe amplification (MLPA).
However, CGH has low resolution (at the Mb level), low throughput and high cost; fluorescence quantitative PCR likewise has low throughput and high cost and can detect only one copy number variation at a time; FISH targets only specific loci, has low resolution, and its probe hybridization efficiency is unstable; MLPA is complicated to perform, has low throughput, high cost and small coverage, and easily causes PCR contamination. Besides these technical defects, most of the above techniques detect only specific regions of the genome, whereas tumors are highly heterogeneous, so one or a few specific sites cannot provide an effective overall assessment of tumor load in body fluids.
Therefore, there is an urgent need in the art for a method and device that can assess overall tumor load in body fluids more effectively and improve the sensitivity and universality of tumor detection.
Summary of the invention
The present invention provides a method and device that can assess overall tumor load in body fluids more effectively and improve the sensitivity and universality of tumor detection.
A first aspect of the present invention provides a non-diagnostic method for identifying tumor load in a sample, comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested to obtain the genome sequences of the sample;
(iii) aligning the genome sequences obtained in step (ii) to a reference genome to obtain the positions of the genome sequences on the reference genome;
(iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
(v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and
(vi) calculating the genomic abnormality score (GAS) from the Z values obtained in step (v), and identifying the tumor load in the sample to be tested on the basis of the GAS value.
In another preferred embodiment, the reference genome may be continuous or discontinuous.
In another preferred embodiment, the reference genome comprises the whole genome.
In another preferred embodiment, the reference genome refers to the full length of all chromosomes of the species (e.g., human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
In another preferred embodiment, the reference genome covers more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and most preferably more than 95%.
In another preferred embodiment, the sample is derived from an individual to be tested.
In another preferred embodiment, the individual to be tested is a human or a non-human mammal.
In another preferred embodiment, the sample is a solid sample or a liquid sample.
In another preferred embodiment, the sample comprises a body fluid sample.
In another preferred embodiment, the sample is selected from the group consisting of blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, and combinations thereof.
In another preferred embodiment, the sample is selected from the group consisting of circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes, and combinations thereof.
In another preferred embodiment, the sequencing is selected from the group consisting of single-end sequencing, paired-end sequencing, and combinations thereof.
In another preferred embodiment, step (iv) further comprises correcting the copy number of each window b and calculating the corrected copy number of each window b.
In another preferred embodiment, the correction method is selected from the group consisting of LOESS correction, a weighting method, a residual method, and combinations thereof.
In another preferred embodiment, the number of sequences falling into each window b, their base composition, and the base composition of the reference genome are counted according to the positions of the genome sequences on the reference genome.
In another preferred embodiment, the copy number of each window b is corrected according to the sequences and base content of each window b.
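A minimal Python sketch of the per-window counting described above. It assumes 0-based read start positions on a single linear coordinate (for example one chromosome) and fixed-width windows; the function names are hypothetical and not part of the original disclosure. Each read is assigned to a window by its start coordinate, and the reference GC fraction is computed over A/C/G/T bases only, so a window consisting entirely of N gets NaN.

```python
import math


def count_reads_per_window(positions, window_size, genome_length):
    """Count aligned reads whose 0-based start position falls in each fixed-width window."""
    n_windows = (genome_length + window_size - 1) // window_size  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < genome_length:  # ignore positions outside the reference
            counts[pos // window_size] += 1
    return counts


def window_gc_fraction(reference_seq, window_size):
    """GC fraction of each reference window, ignoring N and other non-ACGT symbols."""
    fractions = []
    for start in range(0, len(reference_seq), window_size):
        chunk = reference_seq[start:start + window_size].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / acgt if acgt else math.nan)  # NaN for all-N windows
    return fractions
```

For example, `count_reads_per_window([0, 5, 10, 25, 29], 10, 30)` returns `[2, 1, 2]`. These two outputs are the inputs a GC-based copy number correction would work from.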
In another preferred embodiment, the Z value of each window b is calculated as follows:

$$z_i = \frac{x_i - \mu_i}{\sigma_i}$$

where $i$ is any positive integer from 1 to $M$; $M$ is the total number of windows into which the reference genome is divided and is a positive integer ≥ 50, preferably $50 \le M \le 10^5$, more preferably $100 \le M \le 10^5$, most preferably $200 \le M \le 10^5$; $x_i$ is the copy number detected for the sample to be tested in the $i$-th window $b_i$; $b_i$ is the $i$-th window; and $\mu_i$ is the arithmetic mean of the copy numbers of the normal control samples in window $b_i$, calculated as:

$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$

where $j$ is any positive integer from 1 to $n$; $n$ is the total number of normal control samples and is a positive integer ≥ 30, preferably $30 \le n \le 10^8$, more preferably $50 \le n \le 10^7$, most preferably $100 \le n \le 10^4$; $x_j$ is the copy number detected for the $j$-th normal control sample in window $b_i$; and $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples in window $b_i$, calculated as:

$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_j - \mu_i)^2}$$

where $n$, $j$, $x_j$ and $\mu_i$ are as defined above.
In another preferred embodiment, the normal control samples are samples of the same type taken from normal individuals of the same species.
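The Z-test above can be sketched in Python as follows. This is a minimal illustration that assumes copy numbers are already corrected and stored as plain lists, with the same window order in every sample; `window_z_scores` is a hypothetical name. The population standard deviation (`statistics.pstdev`) is used because the formula divides by n rather than n - 1.

```python
import math
import statistics


def window_z_scores(test_copy, control_copies):
    """Per-window Z values of a test sample against n normal control samples.

    test_copy: copy numbers x_i of the sample to be tested, one per window.
    control_copies: list of n lists, each holding one control sample's copy numbers per window.
    """
    z_values = []
    for i, x_i in enumerate(test_copy):
        values = [sample[i] for sample in control_copies]  # the n control values x_j for window b_i
        mu_i = statistics.fmean(values)
        sigma_i = statistics.pstdev(values)  # population SD (1/n), matching the formula above
        # Assumption: a zero-variance window has no defined Z value; return NaN so it can be masked.
        z_values.append((x_i - mu_i) / sigma_i if sigma_i > 0 else math.nan)
    return z_values
```

Windows that come back as NaN have zero variance across the controls and are natural candidates for removal before the GAS step.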
In another preferred embodiment, the genomic abnormality score is calculated as follows:

$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} |z_i|$$

where the windows are ranked by $|z_i|$ in ascending order, $m_b$ is the window ranked at the $m$-th percentile and $p_b$ is the window ranked at the $p$-th percentile; $m$ is 30-98, preferably 40-97, more preferably 60-96, particularly preferably 80-95, and most preferably 95; $p$ is 80-100, preferably 85-100, more preferably 90-100, and most preferably 100; and $p - m \ge 2$ (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20).
In another preferred embodiment, the following step is performed before calculating the genomic abnormality score:
(a) according to the sequence features of the reference genome, removing regions of the genome that high-throughput sequencing fails to detect, such as centromeres, telomeres, satellites and heterochromatin, and removing regions of length l adjacent to centromeres, telomeres, satellites and heterochromatin, where l is any length less than 3 Mb; or
(b) according to the copy number features of the samples, removing regions of the genome that high-throughput sequencing fails to detect, such as centromeres, telomeres, satellites and heterochromatin.
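Option (a) amounts to an interval-masking step. The sketch below assumes the problematic regions are already available as half-open coordinates on the same axis as the windows (how they are obtained, for example from a genome annotation, is not specified here) and flags every window that overlaps a region extended by l on each side. The function name is hypothetical.

```python
def mask_windows(n_windows, window_size, regions, flank):
    """Return keep-flags for fixed-width windows, dropping any window that overlaps a
    problematic region (centromere, telomere, satellite, heterochromatin) or its flanks.

    regions: iterable of half-open (start, end) intervals, 0-based.
    flank: the extra length l removed on each side of every region.
    """
    if not 0 <= flank < 3_000_000:
        raise ValueError("flank length l must be non-negative and less than 3 Mb")
    keep = [True] * n_windows
    for start, end in regions:
        lo = max(0, start - flank)
        hi = end + flank  # exclusive end of the extended region
        if hi <= lo:
            continue  # empty region, nothing to mask
        first = lo // window_size
        last = min((hi - 1) // window_size, n_windows - 1)
        for w in range(first, last + 1):
            keep[w] = False
    return keep
```

With 100 bp windows, a region at 250-300 and l = 60, the masked span becomes 190-360, so windows 1 to 3 are dropped.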
In another preferred embodiment, the following steps are performed before step (v):
(iv1) calculating the coefficient of variation $cv_i$ of each window b in the normal control samples from the copy numbers of each window b obtained in step (iv); and
(iv2) sorting the $cv_i$ values in ascending order and removing the windows with the largest n% of values, where n is any number greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
In another preferred embodiment, the coefficient of variation $cv_i$ is calculated as follows:

$$cv_i = \frac{\sigma_i}{\mu_i}$$

where $\mu_i$ is the arithmetic mean of the copy numbers of the normal control samples, calculated as:

$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$

and $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples, calculated as:

$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_j - \mu_i)^2}$$

where $n$, $j$, $x_j$, $\mu_i$ and $\sigma_i$ are as defined above.
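Steps (iv1) and (iv2) can be sketched as a filter over the normal-control matrix. This is a hedged illustration: the function name is hypothetical, the number of windows removed is rounded down, and windows with a non-positive mean are treated as maximally variable; none of these choices is specified in the text.

```python
import math
import statistics


def cv_filter(control_copies, n_percent):
    """Indices of windows kept after removing the n% of windows with the largest
    coefficient of variation (cv_i = sigma_i / mu_i) across the normal controls."""
    if not 0 < n_percent <= 5:
        raise ValueError("n must satisfy 0 < n <= 5")
    n_windows = len(control_copies[0])
    cvs = []
    for i in range(n_windows):
        values = [sample[i] for sample in control_copies]
        mu_i = statistics.fmean(values)
        sigma_i = statistics.pstdev(values)
        # Assumption: windows with a non-positive mean are treated as maximally variable.
        cvs.append(sigma_i / mu_i if mu_i > 0 else math.inf)
    order = sorted(range(n_windows), key=lambda i: cvs[i])  # ascending cv
    n_remove = int(n_windows * n_percent / 100)  # rounding down is an assumption
    return sorted(order[:n_windows - n_remove])
```

The returned indices can then be used to subset both the test sample and the controls before the Z-test.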
A second aspect of the present invention provides a system (device) for identifying tumor load in a sample, comprising:
A sequencing unit for performing nucleic acid sequencing on a sample to be tested to obtain the genome sequences of the sample;
An alignment unit, connected to the sequencing unit, for aligning the obtained genome sequences of the sample to a reference genome to obtain the positions of the genome sequences on the reference genome;
A calculation and testing unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window to calculate the Z value of each window b; and
An identification unit, connected to the calculation and testing unit, for calculating the genomic abnormality score (GAS) from the obtained Z values and identifying the tumor load in the sample on the basis of the GAS value.
In another preferred embodiment, the system further comprises a correction unit, connected to the calculation and testing unit, for correcting the copy number of each window b of the reference genome and calculating the corrected copy number of each window b.
In another preferred embodiment, in the calculation and testing unit, before the Z-test is performed on each window b, the coefficient of variation $cv_i$ of each window b may be calculated from its copy number, the $cv_i$ values sorted in ascending order, and the windows with the largest n% of values removed, where n is any number greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
It should be understood that, within the scope of the present invention, the technical features described above and those described in detail below (e.g., in the examples) can be combined with one another to form new or preferred technical solutions. For reasons of space, these combinations are not enumerated here.
Brief description of the drawings
Fig. 1 shows a flow chart of the analysis method for identifying tumor load in body fluids.
Fig. 2 shows the tumor load test results of a patient at different clinical medication cycles.
Fig. 3 shows the whole-genome copy number variation of samples S1-S7 and the corresponding GAS values.
Detailed description of embodiments
Through extensive and in-depth research, the present inventors have established, for the first time, an effective method for identifying tumor load in a sample that improves the sensitivity and universality of tumor detection. Specifically, the genomic abnormality score (GAS) is calculated, and the tumor load in the sample is identified on the basis of the GAS value.
In addition, the present invention provides a system (device) for identifying tumor load in a sample, comprising a sequencing unit, an alignment unit, a calculation and testing unit, and an identification unit. In a preferred embodiment of the present invention, the system further comprises a correction unit. The present invention was completed on this basis.
Terms
As used herein, the term "copy number variation (CNV)" refers to an abnormal copy number of a chromosome or chromosome segment in a sample genome, including but not limited to chromosomal aneuploidy, deletions, duplications, and microdeletions and microduplications larger than 1000 bp.
As used herein, the term "genomic abnormality score (GAS)" refers to a score calculated from the abnormal copy numbers of chromosomes or chromosome segments in a sample genome; the scope of the score includes but is not limited to the whole genome, specific chromosomes, chromosome segments and specific genes.
As used herein, the term "Z value (Z-score)", also called the standard score, is the difference between a value and the mean divided by the standard deviation. It is expressed as:

$$Z = \frac{x - \mu}{\sigma}$$

where x is a specific value, μ is the arithmetic mean and σ is the standard deviation. The Z value expresses the distance between the raw value and the reference mean in units of standard deviation.
As used herein, the term "partial response (PR)" refers to a decrease of at least 30% in the sum of the longest diameters of the target lesions, maintained for at least 4 weeks.
As used herein, the term "progressive disease (PD)" refers to an increase of at least 20% in the sum of the longest diameters of the target lesions, or the appearance of new lesions.
As used herein, the terms "system" and "device" have the same meaning.
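The two response definitions can be written as a small rule, shown here as a hedged sketch with a hypothetical function name. It assumes both sums are in millimetres and, as a simplification, compares the current sum with a single reference sum (`baseline_sum_mm`). Clinical criteria such as RECIST 1.1 measure PR against the baseline sum and PD against the smallest sum recorded during follow-up, which this sketch does not model.

```python
def classify_response(baseline_sum_mm, current_sum_mm, new_lesions=False, weeks_maintained=0):
    """Return "PD", "PR" or None from the sum of the longest diameters of the target lesions.

    Implements only the two definitions given above; any other outcome returns None.
    """
    change = (current_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if new_lesions or change >= 0.20:
        return "PD"  # at least 20% increase, or new lesions
    if change <= -0.30 and weeks_maintained >= 4:
        return "PR"  # at least 30% decrease, maintained for at least 4 weeks
    return None
```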
Reference genome
In the present invention, taking humans as an example, the reference genome may be the whole genome or a partial genome, and may be continuous or discontinuous. When the reference genome is a partial genome, its total coverage (f) is more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and most preferably more than 95%, where the total coverage (f) is the percentage of the whole genome occupied by the reference genome.
In a preferred embodiment, the reference genome is the whole genome.
In a preferred embodiment, the reference genome is the full length of all chromosomes of the species (e.g., human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
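As a small illustration of the total coverage f, the following Python sketch merges the intervals that make up a partial reference genome and divides their combined length by the whole-genome length. It assumes all intervals are expressed on one linear coordinate axis (for example concatenated chromosome coordinates); the function name is hypothetical.

```python
def total_coverage(intervals, whole_genome_length):
    """Fraction f of the whole genome covered by a (possibly discontinuous) reference genome.

    intervals: half-open (start, end) pairs on one linear coordinate axis; overlapping
    intervals are merged so that no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:  # disjoint: close the previous block
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / whole_genome_length
```

A result above 0.5 satisfies the minimum coverage stated above; the preferred embodiments raise this threshold up to 0.95.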
Tumor load
In the present invention, "tumor load" refers to the degree of harm a tumor causes to the body, for example the size of the tumor, its activity, its metastasis, and the danger that tumors at different sites pose to the body. Indicators for evaluating tumor load include but are not limited to tumor size, tumor marker levels, clinical symptoms (shortness of breath, pain, etc.), related complications (superior vena cava syndrome, etc.) and wasting status (anemia, hypoproteinemia, etc.).
Sequencing
In the present invention, sequencing can be performed with conventional sequencing technologies and platforms, and the sequencing platform is not particularly limited. Second-generation sequencing platforms include but are not limited to: Illumina GA, GAII, GAIIx, HiSeq 1000/2000/2500/3000/4000, X Ten, X Five, NextSeq 500/550, MiSeq, MiSeqDx, MiSeq FGx and MiniSeq; Applied Biosystems SOLiD; Roche 454 FLX; Thermo Fisher Scientific (Life Technologies) Ion Torrent, Ion PGM and Ion Proton I/II; BGI BGISEQ-1000, BGISEQ-500 and BGISEQ-100; CapitalBio BioelectronSeq 4000; Da An Gene Co., Ltd. of Sun Yat-sen University DA8600; Berry Genomics NextSeq CN500; BIGIS from Zhongke Zixin, a subsidiary of Zixin Pharmaceutical; and Hua Yinkang Gene HYK-PSTAR-IIA.
Third-generation single-molecule sequencing platforms include but are not limited to: the HeliScope system of Helicos BioSciences, the SMRT system of Pacific Biosciences, and the GridION and MinION of Oxford Nanopore Technologies. The sequencing type may be single-end or paired-end sequencing; the read length may be any length of 30 bp or more, such as 30 bp, 40 bp, 50 bp, 100 bp or 300 bp; and the sequencing depth may be any multiple of the genome of 0.01× or more, such as 0.01×, 0.02×, 0.1×, 1×, 5×, 10× or 30×.
In the present invention, the Illumina HiSeq 2500 high-throughput sequencing platform is preferred, with single-end sequencing, a read length of 41 bp and a sequencing data volume of 5M.
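The preferred configuration corresponds to a very shallow whole-genome depth. Assuming that "5M" means 5 million reads of 41 bp, and taking the human genome as about 3 Gb (the approximation used elsewhere in this document), the mean depth is 5,000,000 × 41 / 3,000,000,000 ≈ 0.068×, which lies within the ≥ 0.01× range given above. A short Python check of this arithmetic (the function name is illustrative):

```python
def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Mean sequencing depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


depth = mean_depth(5_000_000, 41, 3_000_000_000)  # assumption: 5M reads, ~3 Gb genome
print(f"{depth:.4f}x")  # 0.0683x
```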
Data processing
In the present invention, data processing generally comprises the following steps:
(a) extracting and sequencing nucleic acids from the genome of the sample to be tested to obtain genome sequences;
(b) aligning the genome sequences of the sample to the reference genome to obtain the positions of the sequences on the reference genome;
(c) dividing the reference genome into windows of a certain length and calculating the copy number of each window b;
(d) performing a Z-test on each window b and calculating the Z value of each window; and
(e) calculating the genomic abnormality score (GAS).
In step (a), specifically: the sample to be tested is a body fluid, which may be blood, interstitial fluid (also called tissue fluid or intercellular fluid), lymph, cerebrospinal fluid, urine or saliva. The detection target is the DNA contained in the body fluid, specifically DNA present in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes and the like. Methods for extracting DNA from the sample include but are not limited to column extraction and magnetic-bead extraction. A library is constructed from the sample, and the sample is sequenced on a high-throughput sequencing platform.
In step (b), specifically: adapters and low-quality data are removed from the sequencing results, which are then aligned to the reference genome. The reference genome may be the whole genome, any chromosome, or part of a chromosome. A well-established reference sequence is usually chosen; for the human genome, for example, hg18 (NCBI Build 36), hg19 (GRCh37) or hg38 (GRCh38) from NCBI or UCSC, or any chromosome or part of a chromosome thereof. Any free or commercial alignment software can be used, such as BWA (Burrows-Wheeler Aligner), SOAPaligner/SOAP2 (Short Oligonucleotide Analysis Package) or Bowtie/Bowtie2. Aligning the sequences to the reference genome yields their positions on the genome. Sequences that align uniquely to the genome can be retained and sequences that align to multiple locations removed, eliminating the error that repetitive sequences introduce into the copy number calculation.
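A minimal sketch of the uniqueness filter, assuming the aligner output has already been parsed into one `(read_id, chromosome, position)` tuple per reported alignment (a hypothetical intermediate format; the function name is also illustrative). Note that many aligners, BWA included, usually report a single primary alignment per read and flag ambiguous placement through a low mapping quality (MAPQ); with such output, a MAPQ threshold is the usual practical substitute for counting hits.

```python
from collections import defaultdict


def unique_alignments(hits):
    """Keep only reads reported at exactly one genomic position.

    hits: iterable of (read_id, chromosome, position) tuples, one per reported alignment.
    Returns {read_id: (chromosome, position)} for uniquely aligned reads.
    """
    locations = defaultdict(list)
    for read_id, chrom, pos in hits:
        locations[read_id].append((chrom, pos))
    return {read_id: locs[0] for read_id, locs in locations.items() if len(locs) == 1}
```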
In step (c), specifically: the genome is divided into windows of a certain length. Depending on the amount of sequencing data, the window lengths may be identical or different integers in the range of 100 bp to 3,000,000 bp (3 Mb), and the number of windows may be any integer in the range of 1,000 to 30,000,000. Based on the genomic positions of the sequenced reads, the number of sequences falling into each window, their base composition and the base composition of the reference genome are counted. The copy number of each window is then corrected according to the sequences and GC content of each window, using correction methods including but not limited to LOESS correction, to obtain the corrected copy number of each window.
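LOESS is named above as one correction method. The sketch below deliberately swaps in a simpler technique, median normalisation within GC-content bins, to show the idea without a smoothing library: windows with similar GC content are expected to have similar counts, so each count is rescaled by how far its GC bin's median deviates from the global median. It is not LOESS; the bin count of 20 and the function name are assumptions.

```python
import math
import statistics
from collections import defaultdict


def gc_bin_correct(counts, gc_fractions, n_bins=20):
    """Scale each window count by (global median / median count of its GC bin).

    A simplified stand-in for LOESS GC correction. Windows with NaN GC (for example an
    all-N reference window) are returned as NaN.
    """
    def gc_bin(g):
        return min(int(g * n_bins), n_bins - 1)  # GC fraction 1.0 falls in the last bin

    valid = [i for i, g in enumerate(gc_fractions) if not math.isnan(g)]
    global_median = statistics.median(counts[i] for i in valid)
    by_bin = defaultdict(list)
    for i in valid:
        by_bin[gc_bin(gc_fractions[i])].append(counts[i])
    bin_median = {b: statistics.median(v) for b, v in by_bin.items()}

    corrected = []
    for count, g in zip(counts, gc_fractions):
        if math.isnan(g):
            corrected.append(math.nan)
            continue
        m = bin_median[gc_bin(g)]
        corrected.append(count * global_median / m if m > 0 else math.nan)
    return corrected
```

In this scheme, two low-GC windows with 100 reads and two high-GC windows with 50 reads are all rescaled to the global median of 75, which removes the GC-driven difference.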
In step (d), specifically: samples from n normal individuals (n is a natural number not less than 30) are processed under the same extraction, library construction and sequencing conditions as the test sample, repeating steps (a)-(c) to form a reference data set. Each window $b_i$ thus has n normal copy number values.
The arithmetic mean $\mu_i$ of the copy numbers of the normal control samples is calculated as:

$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$

The standard deviation $\sigma_i$ of the copy numbers of the normal control samples is calculated as:

$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_j - \mu_i)^2}$$

where $x_1, x_2, x_3, \ldots, x_n$ are the copy numbers of the normal samples.
The Z value of each window $b_i$ of the sample to be tested is calculated as:

$$z_i = \frac{x_i - \mu_i}{\sigma_i}$$

where $x_i$ is the copy number detected in window $b_i$.
In step (e), specifically: highly repetitive regions exist across the whole genome and around certain chromosomes, chromosome segments or genes, such as regions near centromeres, telomeres, satellites and heterochromatin. These highly repetitive regions are removed first to eliminate their influence on the GAS calculation.
In a preferred embodiment, removal methods include but are not limited to:
A. Removal based on sequence features of the reference genome
Regions of the genome that high-throughput sequencing fails to detect, such as centromeres, telomeres, satellites and heterochromatin, are removed, together with regions of length l adjacent to centromeres, telomeres, satellites and heterochromatin, where l may be any length less than 3 Mb; or
B. Removal based on copy number features of normal samples
For each window $b_i$, the coefficient of variation $cv_i$ of the normal control samples in that window is calculated as:

$$cv_i = \frac{\sigma_i}{\mu_i}$$

where $\mu_i$ is the arithmetic mean and $\sigma_i$ the standard deviation of the copy numbers of the normal control samples.
The cv values are sorted in ascending order and the windows with the largest n% of values are removed, where n may be any value greater than 0 and less than or equal to 5.
In step (e), the genomic abnormality score (GAS) is calculated as follows:
First, the scope of the score is determined. The scope includes but is not limited to the whole genome, a specific chromosome, a specific chromosome segment or a specific gene, and may be any length from 1 Mb up to the genome length (about 3 Gb for the human genome). Within this scope, the absolute Z values of the windows that remain after removal of repetitive-sequence effects are sorted in ascending order and distributed evenly over the range 0%-100%, with the smallest absolute Z value assigned to 0% and the largest to 100%. The absolute Z values of the windows ranked from the m-th to the p-th percentile are summed, where m is 30-98, preferably 40-97, more preferably 60-96, particularly preferably 80-95, and most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, and most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20). This sum is the genomic abnormality score (GAS):

$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} |z_i|$$

where $m_b$ is the window ranked at the m-th percentile and $p_b$ the window ranked at the p-th percentile. The GAS value is used to identify tumor load in body fluids.
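A hedged Python sketch of the GAS calculation described above, with defaults m = 95 and p = 100 taken from the most preferred values. It assumes the Z values of removed windows are passed as NaN; the function name and the inclusive treatment of both percentile bounds are assumptions, since the text does not say how boundary ranks are handled.

```python
import math


def genomic_abnormality_score(z_values, m_pct=95.0, p_pct=100.0):
    """Sum of |z| over windows whose rank falls between the m-th and p-th percentile.

    |z| values are sorted in ascending order and spread evenly over 0%-100%, so the
    smallest value sits at 0% and the largest at 100%. NaN Z values (masked windows)
    are skipped. Both percentile bounds are treated as inclusive (an assumption).
    """
    if not (0 <= m_pct < p_pct <= 100 and p_pct - m_pct >= 2):
        raise ValueError("require 0 <= m < p <= 100 and p - m >= 2")
    abs_z = sorted(abs(z) for z in z_values if not math.isnan(z))
    k = len(abs_z)
    if k < 2:
        raise ValueError("at least two windows are needed to assign percentiles")
    total = 0.0
    for rank, value in enumerate(abs_z):
        percentile = 100.0 * rank / (k - 1)  # rank 0 -> 0%, rank k-1 -> 100%
        if m_pct <= percentile <= p_pct:
            total += value
    return total
```

Summing only the top tail of |z| makes the score respond to widespread large deviations while ignoring the bulk of near-normal windows.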
Method for identifying tumor load in a sample
The present invention provides an effective method for identifying tumor load in a sample that improves the sensitivity and universality of tumor detection, comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested to obtain the genome sequences of the sample;
(iii) aligning the genome sequences obtained in step (ii) to a reference genome to obtain the positions of the genome sequences on the reference genome;
(iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
(v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and
(vi) calculating the genomic abnormality score (GAS) from the Z values obtained in step (v), and identifying the tumor load in the sample to be tested on the basis of the GAS value.
In a preferred embodiment of the present invention, the method comprises the steps of:
(a) extracting and sequencing nucleic acids from the sample genome to obtain genome sequences;
(b) aligning the sequences to the reference genome to obtain their positions on the genome;
(c) dividing the reference genome into windows b of a certain length and calculating the copy number of each window b; and
(d) performing a Z-test on each window b to calculate the Z value of each window b, and calculating the genomic abnormality score (GAS), so as to identify the tumor load in the sample on the basis of the GAS value.
System (device) for identifying tumor load in a sample
The present invention further provides a system (device) for identifying tumor load in a sample, comprising:
A sequencing unit for performing nucleic acid sequencing on a sample to be tested to obtain the genome sequences of the sample;
An alignment unit, connected to the sequencing unit, for aligning the obtained genome sequences of the sample to a reference genome to obtain the positions of the genome sequences on the reference genome;
A calculation and testing unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window to calculate the Z value of each window b; and
An identification unit, connected to the calculation and testing unit, for calculating the genomic abnormality score (GAS) from the obtained Z values and identifying the tumor load in the sample on the basis of the GAS value.
In a preferred embodiment, the system further comprises a correction unit, connected to the calculation and testing unit, for correcting the copy number of each window b of the reference genome and calculating the corrected copy number of each window b.
The main advantages of the present invention include:
(1) The present invention establishes, for the first time, a method and system for identifying tumor load in a sample, which can identify tumor load in a sample accurately and effectively.
(2) The method and system of the present invention can improve the sensitivity and universality of tumor detection.
(3) The method and system of the present invention reduce the discomfort caused by sampling when testing tumor patients, enabling non-invasive detection.
(4) The method and system of the present invention can effectively test patients from whom conventional tests cannot obtain samples;
(5) The method and system of the present invention allow real-time testing of tumor patients and monitoring of drug efficacy, providing guidance for physicians' medication and treatment decisions.
The present invention is further illustrated below with reference to specific examples. It should be understood that these examples are intended only to illustrate the present invention and not to limit its scope. Experimental methods in the following examples for which specific conditions are not indicated are generally performed under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or according to the conditions recommended by the manufacturer. Unless otherwise indicated, percentages and parts are by weight.
Unless otherwise specified, the materials used in the examples are commercially available products.
Embodiment 1
The present invention has been applied to 15 cases with good results. To make the use and effect of the present invention easier to understand and master, a practical example is described in detail below. The general flowchart of the implementation is shown in Fig. 1, and the detailed procedure is as follows:
1. Nucleic acid extraction and sequencing of the sample genome
In this example, the test samples came from a gastric cancer patient; cell-free DNA (cfDNA) and leukocytes were isolated from the blood. Nucleic acids were extracted with the CW2603 nucleic acid extraction kit from CWBIO (CoWin Biosciences), following the manufacturer's instructions.
Libraries were constructed with the CWBIO CW2185 library preparation kit and then sequenced on the Illumina HiSeq 2500 high-throughput sequencing platform according to Illumina's instructions. Sequencing was single-end, with a read length of 41 bp and a data volume of 5M.
2. Aligning reads to the reference genome to obtain their genomic positions
Adapters and low-quality data were removed from the sequencing results, and the reads were aligned to the reference genome. The reference genome was the UCSC human genome hg19 (GRCh37), and the aligner was BWA (Burrows-Wheeler Aligner) with default parameters. Aligning the reads to the reference genome gives their genomic positions; only reads that align uniquely to the genome were retained.
3. Dividing the reference genome into windows of fixed length and calculating the copy number of each window
The genome was divided into 15,489 windows b (regions), each 200 kb long. Based on the genomic positions of the reads, the number of reads falling into each window b was counted, together with the base composition of the reads and of the reference genome. The copy number of each window b was then corrected according to its reads and GC content using LOESS, yielding the corrected copy number of each window b.
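The read counting and LOESS GC correction in step 3 can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, only per-window GC content is used as the covariate, and the LOESS is a simple degree-1 local fit with tricube weights and an arbitrarily chosen span (`frac`), since the text gives no LOESS parameters. The "copy number" returned is assumed to be the ratio of observed to GC-expected read count.

```python
import numpy as np

WINDOW_SIZE = 200_000  # 200 kb windows, as in this embodiment


def count_reads_per_window(positions, chrom_length, window_size=WINDOW_SIZE):
    """Count uniquely aligned reads per window of one chromosome.

    positions: 0-based alignment start coordinates (non-negative, < chrom_length).
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    bins = np.asarray(positions, dtype=np.int64) // window_size
    return np.bincount(bins, minlength=n_windows)


def loess_fit(x, y, frac=0.3):
    """Local linear (degree-1) LOESS with tricube weights, no robustness iterations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # points used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]  # k nearest neighbours in GC space
        h = dist[idx].max()
        w = np.ones(k) if h == 0 else (1 - (dist[idx] / h) ** 3) ** 3
        xs, ys = x[idx], y[idx]
        xm = np.sum(w * xs) / np.sum(w)  # weighted means
        ym = np.sum(w * ys) / np.sum(w)
        sxx = np.sum(w * (xs - xm) ** 2)
        slope = np.sum(w * (xs - xm) * (ys - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)  # local weighted least-squares line
    return fitted


def gc_correct(counts, gc, frac=0.3):
    """Observed / GC-expected read count per window (about 1.0 for a normal window)."""
    counts = np.asarray(counts, dtype=float)
    expected = loess_fit(gc, counts, frac)
    ratio = np.full_like(counts, np.nan)  # NaN where the fit is not positive
    ok = expected > 0
    ratio[ok] = counts[ok] / expected[ok]
    return ratio
```

For genome-scale data, `statsmodels.nonparametric.smoothers_lowess.lowess` provides a faster, robustified LOWESS that could replace the hypothetical `loess_fit`.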
4. Calculating the CV value of each window
Samples from 100 healthy individuals were processed under the same extraction, library construction and sequencing conditions, and steps 1 to 3 were repeated to obtain normal control data. These data serve as the reference dataset from which the CV of each window $b_i$ is calculated.
Each window $b_i$ thus has n normal copy-number values (n = 100 in this example).
The arithmetic mean $\mu_i$ of the normal control copy numbers is calculated as:
$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$
The standard deviation $\sigma_i$ of the normal control copy numbers is calculated as:
$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(x_j - \mu_i\right)^2}$$
where $x_j$ ($j = 1, 2, \ldots, n$) is the copy-number value of the j-th normal sample in window $b_i$.
The CV of each window $b_i$ is calculated as:
$$\mathrm{CV}_i = \frac{\sigma_i}{\mu_i}$$
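The per-window statistics of step 4 vectorise naturally over a matrix of normal control copy numbers. A minimal sketch, assuming the controls are stored as a hypothetical array with one row per normal sample and one column per window; `ddof=0` matches the 1/n inside the square root of the standard deviation formula.

```python
import numpy as np


def window_stats(normal_cn):
    """Per-window mean, standard deviation and CV across normal controls.

    normal_cn: array of shape (n_samples, n_windows); row j holds the
    copy numbers of normal control j (n = 100 in this embodiment).
    """
    normal_cn = np.asarray(normal_cn, dtype=float)
    mu = normal_cn.mean(axis=0)            # mu_i = (1/n) * sum_j x_j
    sigma = normal_cn.std(axis=0, ddof=0)  # population SD: 1/n inside the root
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = sigma / mu                    # CV_i = sigma_i / mu_i
    return mu, sigma, cv
```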
5. Performing a z-test on each window and calculating the z value of each window
The z value of each window $b_i$ of the sample to be tested is calculated as:
$$z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where $x_i$ is the copy-number value detected in window $b_i$, $\mu_i$ is the arithmetic mean of the normal control copy numbers, and $\sigma_i$ is their standard deviation, both calculated as in step 4.
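The z-test of step 5 is an element-wise standardisation against the normal controls. A minimal sketch with a hypothetical function name; returning NaN for windows whose normal standard deviation is zero is an assumption, since the text does not say how such windows are handled.

```python
import numpy as np


def z_scores(test_cn, mu, sigma):
    """z_i = (x_i - mu_i) / sigma_i for each window; NaN where sigma_i == 0."""
    x = np.asarray(test_cn, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    safe = np.where(sigma > 0, sigma, 1.0)  # avoid division by zero
    return np.where(sigma > 0, (x - mu) / safe, np.nan)
```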
6. Calculating the genome randomness (GAS)
In this example, the windows are sorted by CV in ascending order, and the 5% of windows with the largest CV are removed and excluded from the randomness calculation below. The randomness is calculated over the whole genome. The z values are converted to absolute values and sorted in ascending order, and the absolute z values of the windows ranked from m% to p% are summed; this cumulative value is the genome randomness (GAS), calculated as:
$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} \lvert z_i \rvert$$
where $m_b$ is the window ranked at the m% position and $p_b$ is the window ranked at the p% position; here m = 95 and p = 100. The GAS value is used to identify the tumor load in the body fluid.
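Step 6 combines the CV filter with a partial sum over the sorted |z| values. A minimal sketch, assuming integer percentages and that the m% and p% rank positions are mapped to array indices by floor and ceiling respectively (the text does not specify the rounding); windows with an undefined z are skipped, which is also an assumption. The function name is hypothetical.

```python
import numpy as np


def genome_randomness(z, cv, drop_pct=5, m=95, p=100):
    """GAS = sum of |z| over windows ranked from m% to p% after the CV filter.

    drop_pct, m and p are integer percentages (5, 95 and 100 in this embodiment).
    """
    z = np.asarray(z, dtype=float)
    cv = np.asarray(cv, dtype=float)
    k = len(cv)
    n_drop = (k * drop_pct) // 100       # number of highest-CV windows to drop
    keep = np.argsort(cv)[: k - n_drop]  # ascending CV, largest removed
    abs_z = np.sort(np.abs(z[keep]))     # |z| in ascending order
    abs_z = abs_z[~np.isnan(abs_z)]      # skip windows with undefined z
    kk = len(abs_z)
    start = (kk * m) // 100              # first rank included (m_b)
    end = -(-(kk * p) // 100)            # one past the last rank included (p_b)
    return float(abs_z[start:end].sum())
```

With m = 95 and p = 100, the score is the sum of the largest 5% of |z| values among the retained windows, so it grows when many windows deviate strongly from the normal controls.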
7. Detection results
More than ten samples were tested. A typical case is described below.
The detection results are shown in Table 1, Fig. 2 and Fig. 3.
Table 1. Tumor load detection results of Embodiment 1 in clinical application to a gastric cancer patient
The results show that before treatment the patient was diagnosed with gastric cancer; at this point the cfDNA copy number was severely abnormal (Fig. 3, S1) and the genome-wide randomness was 999.84, indicating a high tumor load in the blood.
With treatment, by the 4th cycle the cfDNA copy number had returned to normal and the genome-wide randomness was 728.80, close to the value of 729.86 for normal leukocytes.
Using the same method as in this example, the genome-wide randomness of the 100 healthy individuals described above was calculated: the normal range was 722.87 to 739.89, with an arithmetic mean of 733.22. The genome-wide randomness values of the 4th treatment cycle and of the leukocytes in this example both fall within the normal range, indicating a very low tumor load in the blood, consistent with the clinical efficacy evaluation of PR (partial response).
With continued treatment, the tumor developed drug resistance, the cfDNA copy-number abnormalities became severe again, the genome-wide randomness score increased, and the tumor load in the blood became high again. By the 7th treatment cycle, the genome-wide randomness reached its highest value, consistent with the clinical efficacy evaluation of PD (progressive disease).
These results show that genome randomness can effectively identify the tumor load in body fluids.
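Comparing a patient's GAS with the normal reference range is a simple interval check. A minimal sketch, assuming the normal range is the minimum to maximum GAS of the normal controls (the text does not say how the range of 722.87 to 739.89 was derived); the function names are hypothetical, and the values checked are the ones reported in this embodiment.

```python
import numpy as np


def gas_reference_range(normal_gas):
    """Return (min, max, mean) of GAS over normal controls (min-max range is an assumption)."""
    g = np.asarray(normal_gas, dtype=float)
    return float(g.min()), float(g.max()), float(g.mean())


def within_normal_range(gas, low, high):
    """True if a GAS value lies inside the normal reference range (inclusive)."""
    return low <= gas <= high


# Normal range and patient values reported in this embodiment
LOW, HIGH = 722.87, 739.89
for label, gas in [("before treatment", 999.84), ("cycle 4", 728.80), ("leukocytes", 729.86)]:
    print(label, gas, within_normal_range(gas, LOW, HIGH))
```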
All documents mentioned in the present invention are incorporated herein by reference, as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalents likewise fall within the scope defined by the appended claims of this application.

Claims (10)

1. A non-diagnostic method for identifying tumor load in a sample, characterized by comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested to obtain the genome sequence of the sample;
(iii) aligning the genome sequence obtained in step (ii) to a reference genome to obtain positional information of the genome sequence in the reference genome;
(iv) dividing the reference genome into m region segments, each region segment being a window b, and calculating the copy number of each window b;
(v) performing a z-test on each window b of step (iv) to calculate the z value of each window b; and
(vi) calculating the genome randomness (GAS) from the z values obtained in step (v), and identifying the tumor load in the sample to be tested based on the GAS value.
2. The method of claim 1, characterized in that the reference genome comprises the whole genome.
3. The method of claim 1 or 2, characterized in that the reference genome covers more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, even more preferably more than 80%, and most preferably more than 95%.
4. The method of claim 1, characterized in that the sample is selected from the group consisting of: blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, or a combination thereof.
5. The method of claim 1, characterized in that step (iv) further comprises correcting the copy number of each window b and calculating the corrected copy number of each window b.
6. The method of claim 1, characterized in that the z value of each window b is calculated by the following formula:
$$z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where i is any positive integer from 1 to m; m is the total number of windows into which the reference genome is divided, m being a positive integer ≥ 50, preferably 50 ≤ m ≤ 10^5, more preferably 100 ≤ m ≤ 10^5, most preferably 200 ≤ m ≤ 10^5; $x_i$ is the copy-number value detected in the i-th window $b_i$ of the sample to be tested; $b_i$ is the i-th window; $\mu_i$ is the arithmetic mean of the copy numbers of the normal control samples in window $b_i$, calculated by the following formula:
$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$
where j is any positive integer from 1 to n; n is the total number of normal control samples, n being a positive integer ≥ 30, preferably 30 ≤ n ≤ 10^8, more preferably 50 ≤ n ≤ 10^7, most preferably 100 ≤ n ≤ 10^4; $x_j$ is the copy-number value detected in window $b_i$ of the j-th normal control sample; $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples in window $b_i$, calculated by the following formula:
$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(x_j - \mu_i\right)^2}$$
where n, j, $x_j$ and $\mu_i$ are as defined above.
7. The method of claim 1, characterized in that the genome randomness is calculated by the following formula:
$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} \lvert z_i \rvert$$
where $m_b$ is the window ranked at the m% position and $p_b$ is the window ranked at the p% position; m is 30 to 98, preferably 40 to 97, more preferably 60 to 96, still more preferably 80 to 95, most preferably 95; p is 80 to 100, preferably 85 to 100, more preferably 90 to 100, most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20).
8. The method of claim 1, characterized by further comprising the following steps before step (v):
(iv1) based on the copy number of each window b from step (iv), calculating the coefficient of variation $\mathrm{CV}_i$ of each window b in the normal control samples;
(iv2) sorting the $\mathrm{CV}_i$ values in ascending order and removing the n% of windows with the largest values, where n is any number greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
9. The method of claim 8, characterized in that the coefficient of variation $\mathrm{CV}_i$ is calculated by the following formula:
$$\mathrm{CV}_i = \frac{\sigma_i}{\mu_i}$$
where $\mu_i$ is the arithmetic mean of the normal control copy numbers, calculated by the following formula:
$$\mu_i = \frac{\sum_{j=1}^{n} x_j}{n}$$
$\sigma_i$ is the standard deviation of the normal control copy numbers, calculated by the following formula:
$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(x_j - \mu_i\right)^2}$$
where n, j, $x_j$, $\mu_i$ and $\sigma_i$ are as defined above.
10. A system for identifying tumor load in a sample, characterized by comprising:
Sequencing unit, configured to perform nucleic acid sequencing on the sample to be tested to obtain the genome sequence of the sample;
Alignment unit, connected to the sequencing unit and configured to align the obtained genome sequence of the sample to a reference genome to obtain positional information of the genome sequence in the reference genome;
Calculation and testing unit, connected to the alignment unit and configured to calculate the copy number of each window b of the reference genome and to perform a z-test on each window to calculate the z value of each window b; and
Identification unit, connected to the calculation and testing unit and configured to calculate the genome randomness (GAS) from the obtained z values and to identify the tumor load in the sample based on the GAS value.
CN201610842333.8A 2016-09-22 2016-09-22 Method and system for identifying tumor loads in samples Pending CN106367512A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610842333.8A CN106367512A (en) 2016-09-22 2016-09-22 Method and system for identifying tumor loads in samples
PCT/CN2017/101573 WO2018054254A1 (en) 2016-09-22 2017-09-13 Method and system for identifying tumor load in sample
TW106131581A TWI670495B (en) 2016-09-22 2017-09-14 Method and system for identifying tumor burden in a sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610842333.8A CN106367512A (en) 2016-09-22 2016-09-22 Method and system for identifying tumor loads in samples

Publications (1)

Publication Number Publication Date
CN106367512A true CN106367512A (en) 2017-02-01

Family

ID=57898089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610842333.8A Pending CN106367512A (en) 2016-09-22 2016-09-22 Method and system for identifying tumor loads in samples

Country Status (3)

Country Link
CN (1) CN106367512A (en)
TW (1) TWI670495B (en)
WO (1) WO2018054254A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106755547A (en) * 2017-03-15 2017-05-31 上海亿康医学检验所有限公司 The Non-invasive detection and its recurrence monitoring method of a kind of carcinoma of urinary bladder
WO2018054254A1 (en) * 2016-09-22 2018-03-29 上海亿康医学检验所有限公司 Method and system for identifying tumor load in sample
CN108229103A (en) * 2018-01-15 2018-06-29 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN108319817A (en) * 2018-01-15 2018-07-24 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
WO2018148903A1 (en) * 2017-02-16 2018-08-23 上海亿康医学检验所有限公司 Auxiliary diagnosis method for urinary system tumours
CN108595918A (en) * 2018-01-15 2018-09-28 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN111583992A (en) * 2020-05-11 2020-08-25 广州金域医学检验中心有限公司 System and method for analyzing load of tumor caused by RNA level fusion gene mutation
CN114582427A (en) * 2022-03-22 2022-06-03 成都基因汇科技有限公司 Method for identifying introgression section and computer readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109182526A (en) * 2018-10-10 2019-01-11 杭州翱锐生物科技有限公司 Kit and its detection method for early liver cancer auxiliary diagnosis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104313136A (en) * 2014-09-30 2015-01-28 江苏亿康基因科技有限公司 Noninvasive human liver cancer early detection and differential diagnosis method and system
CN105574361A (en) * 2015-11-05 2016-05-11 上海序康医疗科技有限公司 Method for detecting variation of copy numbers of genomes

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100112590A1 (en) * 2007-07-23 2010-05-06 The Chinese University Of Hong Kong Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment
JP5993029B2 (en) * 2011-12-31 2016-09-14 ビージーアイ ダイアグノーシス カンパニー リミテッドBgi Diagnosis Co., Ltd. Detection method of gene mutation
EP2844771A4 (en) * 2012-05-04 2015-12-02 Complete Genomics Inc Methods for determining absolute genome-wide copy number variations of complex tumors
CN113337604A (en) * 2013-03-15 2021-09-03 莱兰斯坦福初级大学评议会 Identification and use of circulating nucleic acid tumor markers
CN105844116B (en) * 2016-03-18 2018-02-27 广州市锐博生物科技有限公司 The processing method and processing unit of sequencing data
CN106367512A (en) * 2016-09-22 2017-02-01 上海序康医疗科技有限公司 Method and system for identifying tumor loads in samples

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Rebecca J. Leary et al.: "Detection of Chromosomal Alterations in the Circulation of Cancer Patients with Whole-Genome Sequencing", Sci Transl Med *
Sarah-Jane Dawson et al.: "Analysis of Circulating Tumor DNA to Monitor Metastatic Breast Cancer", The New England Journal of Medicine *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018054254A1 (en) * 2016-09-22 2018-03-29 上海亿康医学检验所有限公司 Method and system for identifying tumor load in sample
WO2018148903A1 (en) * 2017-02-16 2018-08-23 上海亿康医学检验所有限公司 Auxiliary diagnosis method for urinary system tumours
WO2018166476A1 (en) * 2017-03-15 2018-09-20 上海亿康医学检验所有限公司 Method for detecting mutation site in sample
TWI679280B (en) * 2017-03-15 2019-12-11 大陸商上海億康醫學檢驗所有限公司 Non-invasive detection of bladder cancer and method for monitoring its recurrence
CN106755547A (en) * 2017-03-15 2017-05-31 上海亿康医学检验所有限公司 The Non-invasive detection and its recurrence monitoring method of a kind of carcinoma of urinary bladder
CN108595918A (en) * 2018-01-15 2018-09-28 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN108319817A (en) * 2018-01-15 2018-07-24 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN108229103A (en) * 2018-01-15 2018-06-29 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN108229103B (en) * 2018-01-15 2020-12-25 无锡臻和生物科技有限公司 Method and device for processing circulating tumor DNA repetitive sequence
CN108319817B (en) * 2018-01-15 2020-12-25 无锡臻和生物科技有限公司 Method and device for processing circulating tumor DNA repetitive sequence
CN108595918B (en) * 2018-01-15 2021-03-16 无锡臻和生物科技有限公司 Method and device for processing circulating tumor DNA repetitive sequence
CN111583992A (en) * 2020-05-11 2020-08-25 广州金域医学检验中心有限公司 System and method for analyzing load of tumor caused by RNA level fusion gene mutation
CN111583992B (en) * 2020-05-11 2023-08-29 广州金域医学检验中心有限公司 RNA level fusion gene mutation-caused tumor load analysis system and method
CN114582427A (en) * 2022-03-22 2022-06-03 成都基因汇科技有限公司 Method for identifying introgression section and computer readable storage medium

Also Published As

Publication number Publication date
WO2018054254A1 (en) 2018-03-29
TW201814290A (en) 2018-04-16
TWI670495B (en) 2019-09-01

Similar Documents

Publication Publication Date Title
CN106367512A (en) Method and system for identifying tumor loads in samples
CN109880910A (en) A kind of detection site combination, detection method, detection kit and the system of Tumor mutations load
KR102521842B1 (en) Mutational analysis of plasma dna for cancer detection
EP3692172B1 (en) Assessment of jak-stat3 cellular signaling pathway activity using mathematical modelling of target gene expression
EP3304093B1 (en) Validating biomarker measurement
TW201840853A (en) Diagnostic applications using nucleic acid fragments
CN107513565A (en) A kind of microsatellite instability Sites Combination, detection kit and its application
CN107077537A (en) With short reading sequencing data detection repeat amplification protcol
CN105986008A (en) CNV detection method and CNV detection apparatus
CN111440884A (en) Intestinal flora for diagnosing sarcopenia and application thereof
TWI679280B (en) Non-invasive detection of bladder cancer and method for monitoring its recurrence
HUE030510T2 (en) Diagnosing fetal chromosomal aneuploidy using genomic sequencing
CN107208155A (en) The analysis based on size and based on counting for the combination of the Maternal plasma that detects the sub- chromosome aberration of fetus
CN113168885B (en) Methods and systems for somatic mutation and uses thereof
CN110229897A (en) MED12 gene mutation detection kit and its application
CN108949979A (en) A method of judging that Lung neoplasm is good pernicious by blood sample
CN104428426B (en) The diagnosis miRNA overview of multiple sclerosis
CN104073499B (en) TMC1 gene mutation body and its application
CN105838720B (en) PTPRQ gene mutation body and its application
CN104099338B (en) MYO15A gene mutation body and its application
CN105177130B (en) It is used for assessing the mark of aids patient generation immune reconstitution inflammatory syndrome
WO2018186687A1 (en) Method for determining nucleic acid quality of biological sample
CN114540488B (en) Gene combination, detection device, detection kit and application for detecting tumor mutation load by high-throughput targeted sequencing
Cheng et al. Personalized circulating tumor DNA detection to monitor immunotherapy efficacy and predict outcome in locally advanced or metastatic non‐small cell lung cancer
CN105243294B (en) A kind of method for predicting the related protein pair of cancer patient prognosis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20170711

Address after: 201499 Shanghai Road, Fengxian District, Lane 1698, Lane 17, building 26

Applicant after: Shanghai Yikang medical laboratory Co., Ltd.

Address before: 201403 Shanghai, Fengxian District Jin Qi Road, room 868, No. 5232

Applicant before: SHANGHAI XUKANG MEDICAL TECHNOLOGY CO., LTD.

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1231138

Country of ref document: HK

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1231138

Country of ref document: HK