WO2018054254A1 - Method and system for identifying tumor burden in a sample - Google Patents

Method and system for identifying tumor burden in a sample

Info

Publication number
WO2018054254A1
Authority
WO
WIPO (PCT)
Prior art keywords
window
sample
value
genome
copy number
Application number
PCT/CN2017/101573
Other languages
English (en)
French (fr)
Inventor
薄世平
粱覃斯
任军
陆思嘉
Original Assignee
Shanghai Yikang Medical Laboratory Co., Ltd. (上海亿康医学检验所有限公司)
Application filed by Shanghai Yikang Medical Laboratory Co., Ltd.
Publication of WO2018054254A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present disclosure relates to the field of biotechnology and, in particular, to a method and system for identifying tumor burden in a sample.
  • the tumor cells of cancer patients often carry a large number of genomic copy number variations.
  • copy number variations can be present in tumor tissue and in body fluids (such as blood, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, etc.); in body fluids they are found specifically in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes, etc.
  • genomic copy number variation in body fluids is an important indicator for identifying tumor burden, and identifying tumor burden can be applied to early tumor screening and diagnosis, disease monitoring, prognosis and treatment.
  • the main methods currently used to detect tumor genome copy number variation are comparative genomic hybridization (CGH), real-time fluorescence quantitative PCR (RTFQ PCR), fluorescence in situ hybridization (FISH) and multiplex ligation-dependent probe amplification (MLPA).
  • however, comparative genomic hybridization has low (Mb-level) resolution, low throughput and high cost; real-time quantitative PCR likewise has low throughput and high cost and can measure only one copy number variation at a time; fluorescence in situ hybridization targets only specific loci, has low resolution and unstable probe hybridization efficiency; and multiplex ligation-dependent probe amplification is complex to perform, low-throughput, costly, has limited coverage and easily causes PCR contamination.
  • moreover, most of the above techniques examine only specific regions of the genome, whereas tumors are highly heterogeneous, so one or a few specific loci cannot effectively and comprehensively evaluate the tumor burden in body fluids.
  • the present invention provides a method and apparatus for evaluating tumor burden in body fluids more effectively and comprehensively and for improving the sensitivity and versatility of tumor detection.
  • a first aspect of the invention provides a method for non-diagnostically identifying tumor burden in a sample, comprising the steps of:
  • step (i) providing a sample to be tested;
  • step (ii) sequencing the sample to be tested, thereby obtaining the genome sequence of the sample;
  • step (iii) aligning the genomic sequence obtained in step (ii) with a reference genome to obtain positional information of the genomic sequence on the reference genome;
  • step (iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
  • step (v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and
  • step (vi) calculating the genomic disorder degree (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the value of the genomic disorder degree.
  • the reference genome may be continuous or discontinuous.
  • the reference genome comprises a whole genome.
  • the reference genome refers to the full length of all chromosomes of the species (eg, human), the full length of a single or multiple chromosomes, a portion of a single or multiple chromosomes, or a combination thereof.
  • the reference genome covers more than 50% of the whole genome, preferably 60% or more, more preferably 70% or more, still more preferably 80% or more, and optimally 95% or more.
  • the sample is from an individual to be detected.
  • the individual to be detected is a human or a non-human mammal.
  • the sample is a solid sample or a liquid sample.
  • the sample comprises a body fluid sample.
  • the sample is selected from the group consisting of blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, or a combination thereof.
  • the sample is selected from the group consisting of circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes, or a combination thereof.
  • the sequencing is selected from the group consisting of single-ended sequencing, double-ended sequencing, or a combination thereof.
  • step (iv) further comprises the step of correcting the copy number of each window b and calculating the corrected copy number of each window b.
  • the correction method is selected from the group consisting of Loess correction, weighting method, residual method, or a combination thereof.
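  • based on the positional information of the genomic sequences on the reference genome, the number of sequences falling into each window b, the base distribution, and the base distribution of the reference genome are counted.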
  • the number of copies of each window b is corrected based on the sequence and base content of each window b.
  • the Z value of each window b is calculated using the following formula: $$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
  • i is any positive integer from 1 to M;
  • M is the total number of windows into which the reference genome is divided, where M is a positive integer ≥ 50, preferably 50 ≤ M ≤ 10^5, more preferably 100 ≤ M ≤ 10^5, and optimally 200 ≤ M ≤ 10^5;
  • x_i is the copy number value detected in the test sample at the i-th window b_i;
  • b_i is the i-th window; μ_i is the arithmetic mean of the copy number of the normal control samples at window b_i, calculated using the following formula: $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
  • j is any positive integer from 1 to N; N is the total number of normal control samples, where N is a positive integer ≥ 30, preferably 30 ≤ N ≤ 10^8, more preferably 50 ≤ N ≤ 10^7, and optimally 100 ≤ N ≤ 10^4;
  • X_j is the copy number value detected in the j-th normal control sample at window b_i;
  • σ_i is the standard deviation of the copy number of the normal control samples at window b_i, computed from N, j, X_j and μ_i as defined above.
  • the normal control sample refers to a homogeneous sample of a normal person of the same species.
  • the genomic disorder degree is calculated using the following formula, with the absolute Z values sorted in ascending order: $$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$
  • m_b is the window ranked at the m-th percentile;
  • p_b is the window ranked at the p-th percentile;
  • m is 30-98, preferably 40-97, more preferably 60-96, still more preferably 80-95, and optimally 95;
  • p is 80-100, preferably 85-100, more preferably 90-100, and optimally 100;
  • p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, and optimally ≥ 20).
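  • before the genomic disorder degree is calculated, the method includes the following step: (a) according to the sequence features of the reference genome, removing regions that cannot be detected by high-throughput sequencing, such as centromeres, telomeres, satellites and heterochromatin, together with the regions of length L adjacent to them, where L is any length less than 3M; or (b) removing such regions according to the copy number features of the samples.
  • before step (v), the method further includes the following steps:
  • step (iv1) calculating, from the copy number of each window b in step (iv), the coefficient of variation CV_i of each window b in the normal control samples; and step (iv2) sorting the CV_i values in ascending order and removing the windows in the largest n%, where n is any value greater than 0 and less than or equal to 5;
  • the coefficient of variation CV_i is calculated using the following formula: $$CV_i = \frac{\sigma_i}{\mu_i}$$
  • μ_i is the arithmetic mean of the copy number of the normal control samples, calculated as $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
  • σ_i is the standard deviation of the copy number of the normal control samples;
  • N, j, X_j, μ_i and σ_i are as defined above.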
  • a second aspect of the invention provides a system (device) for identifying tumor burden in a sample, comprising:
  • a sequencing unit for performing nucleic acid sequencing on the sample to be tested, thereby obtaining the genome sequence of the sample;
  • an alignment unit, connected to the sequencing unit, for aligning the obtained genome sequence of the sample with a reference genome, thereby obtaining position information of the genome sequence on the reference genome;
  • a calculation and testing unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window, thereby calculating the Z value of each window b; and
  • an identification unit, connected to the calculation and testing unit, for calculating the genomic disorder degree (GAS) from the obtained Z values and identifying the tumor burden in the sample based on the value of the genomic disorder degree.
  • the system further includes a correction unit, connected to the calculation and testing unit, for correcting the copy number of each window b of the reference genome, thereby calculating the corrected copy number of each window b.
  • Figure 1 shows a flow chart of an analytical method for identifying tumor burden in body fluids.
  • Figure 2 shows the results of tumor burden testing for different clinical cycles of patients.
  • Figure 3 shows the S1-7 genome-wide copy number variation and the corresponding GAS.
  • the present inventors have for the first time established a method for identifying tumor burden in a sample that is effective and improves the sensitivity and versatility of tumor detection: specifically, the genomic disorder degree (GAS) is calculated, and the tumor burden in the sample is identified based on its value.
  • the present invention also provides a system (device) for identifying tumor burden in a sample, the system (device) comprising a sequencing unit, an alignment unit, a calculation and testing unit, and an identification unit.
  • in a preferred embodiment, a correction unit is further included. On this basis, the inventors completed the present invention.
  • CNV: Copy Number Variations
  • GAS: Genomic Abnormality Score
  • the Z-score, also called the standard score, is obtained by dividing the difference between a value and the mean by the standard deviation: $$Z = \frac{x - \mu}{\sigma}$$
  • x is a specific value
  • μ is the arithmetic mean
  • σ is the standard deviation
  • the Z value represents the distance between the raw value and the reference mean, measured in units of standard deviation.
  • partial response (PR) refers to a decrease of ≥ 30% in the sum of the maximum diameters of the target lesions, maintained for at least 4 weeks.
  • progressive disease (PD) refers to an increase of at least 20% in the sum of the maximum diameters of the target lesions, or the appearance of new lesions.
  • taking humans as an example, the reference genome may be a whole genome or a partial genome, and may be continuous or discontinuous.
  • the total coverage (F) of the reference genome is more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and optimally more than 95%, where the total coverage (F) is the percentage of the whole genome occupied by the reference genome.
  • the reference genome is a whole genome.
  • the reference genome is the full length of all chromosomes of the species (eg, human), the full length of a single or multiple chromosomes, a portion of a single or multiple chromosomes, or a combination thereof.
  • "tumor burden" refers to the degree of harm a tumor causes to the body, such as tumor size, tumor activity, tumor metastasis, and the danger posed to the body by tumors at different sites.
  • indicators for evaluating tumor burden include (but are not limited to): tumor size, tumor marker levels, clinical symptoms (dyspnea, pain, etc.), related complications (superior vena cava syndrome, etc.) and wasting (anemia, hypoproteinemia, etc.).
  • sequencing can be performed using conventional sequencing techniques and platforms.
  • the sequencing platform is not particularly limited; second-generation sequencing platforms include (but are not limited to): Illumina's GA, GAII, GAIIx, HiSeq1000/2000/2500/3000/4000, X Ten, X Five, NextSeq500/550, MiSeq, MiSeqDx, MiSeq FGx and MiniSeq; SOLiD from Applied Biosystems; 454FLX from Roche; Ion Torrent, Ion PGM and Ion Proton I/II from Thermo Fisher Scientific (Life Technologies); BGISEQ1000, BGISEQ500 and BGISEQ100 from BGI; BioelectronSeq 4000 from CapitalBio Group; DA8600 from Da An Gene Co., Ltd. of Sun Yat-sen University; NextSeq CN500 from Berry Genomics; BIGIS from Zhongke Zixin, a subsidiary of Zixin Pharmaceutical; and HYK-PSTAR-IIA from Huayinkang Gene.
  • third-generation single-molecule sequencing platforms include (but are not limited to) the HeliScope system from Helicos BioSciences, the SMRT system from Pacific Biosciences, and GridION and MinION from Oxford Nanopore Technologies.
  • the sequencing type can be Single End sequencing or Paired End sequencing.
  • the read length can be 30 bp, 40 bp, 50 bp, 100 bp, 300 bp or any other length greater than 30 bp, and the sequencing depth can be 0.01, 0.02, 0.1, 1, 5, 10 or 30 times the genome, or any other multiple greater than 0.01.
  • Illumina's HiSeq2500 high-throughput sequencing platform is preferred, and the sequencing type is single-end sequencing, the sequencing length is 41 bp, and the sequencing data amount is 5M.
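  • data processing generally includes the following steps: (a) nucleic acid extraction and sequencing of the sample genome to obtain genome sequences; (b) alignment of the sequences to the reference genome to obtain their positions; (c) division of the reference genome into windows of a given length and calculation of the copy number of each window b; (d) a Z-test on each window b to calculate its Z value; and (e) calculation of the genomic disorder degree (GAS).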
  • in step (a), the sample to be tested is a body fluid, which may be blood, interstitial fluid (tissue fluid or intercellular fluid for short), lymph, cerebrospinal fluid, urine or saliva.
  • the detection target is the DNA contained in the body fluid, specifically DNA present in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes and the like.
  • methods for extracting DNA from the sample include (but are not limited to) column extraction and magnetic bead extraction. A library is constructed from the sample, which is then sequenced on a high-throughput sequencing platform.
  • in step (b), adapters and low-quality data are removed from the sequencing results, and the reads are aligned to the reference genome.
  • the reference genome can be the whole genome, any chromosome, or part of a chromosome.
  • the reference genome is typically a generally recognized sequence; for example, the human genome can be hg18 (NCBI36), hg19 (GRCh37) or hg38 (GRCh38) from NCBI or UCSC, or any chromosome or part of a chromosome.
  • any free or commercial alignment software can be used, such as BWA (Burrows-Wheeler Alignment tool), SOAPaligner/soap2 (Short Oligonucleotide Analysis Package) or Bowtie/Bowtie2.
  • in step (c), the genome is divided into windows of a given length; depending on the amount of sequencing data, the window lengths may be identical or different integers in the range 100 bp-3,000,000 bp (3M).
  • the number of windows can be any integer in the range of 1,000-30,000,000. Based on the position of the sequence on the genome, the number of sequences falling into each window, the base distribution, and the base distribution of the reference genome were counted.
  • the copy number of each window is corrected according to the sequence of each window and the base GC content.
  • the correction methods include, but are not limited to, Loess correction, and the corrected copy number of each window is calculated.
  • step (d) specifically comprises: taking samples from N normal individuals (N being a natural number not less than 30), repeating steps (a)-(c) above under the same extraction, library construction and sequencing conditions, and using the results as a reference data set, so that each window b_i has N normal copy number values.
  • the arithmetic mean μ_i is calculated as $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
  • where X_1, X_2, X_3, ... X_j are the copy number values of the normal samples.
  • x_i is the copy number value detected at window b_i, and the Z value of window b_i is $$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
  • the method further takes into account highly repetitive regions in the whole genome, chromosomes, chromosome segments or genes, such as the regions near centromeres, telomeres, satellites and heterochromatin.
  • these highly repetitive regions are removed first, to eliminate their effect on the GAS calculation.
  • the removal methods include (but are not limited to):
  • removing regions of the genome that cannot be detected by high-throughput sequencing, such as centromeres, telomeres, satellites and heterochromatin, together with the region of length L adjacent to each of them, where L can be any length less than 3M; or
  • calculating the coefficient of variation CV_i = σ_i/μ_i of each window, where μ_i is the arithmetic mean of the copy number of the normal control samples
  • and σ_i is the standard deviation of the copy number of the normal control samples;
  • then sorting the CV values in ascending order and removing the windows in the largest n%, where n can be any value greater than 0 and less than or equal to 5.
  • step (e) specifically comprises calculating the genomic disorder degree (GAS):
  • the detection range of the GAS is first determined; it may be, without limitation, the whole genome, a specific chromosome, a specific chromosome segment or a specific gene, of any size from 1M up to the genome length (e.g., about 3G for the human genome).
  • the Z values of the windows remaining after removal of repetitive regions are converted to absolute values.
  • the absolute Z values are sorted in ascending order and spread evenly over the range 0%-100%:
  • the minimum absolute Z value is assigned to 0%;
  • the maximum absolute Z value is assigned to 100%.
  • GAS is then calculated as $$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$ where m_b is the window ranked at the m-th percentile
  • and p_b is the window ranked at the p-th percentile.
  • the tumor burden in the body fluid is identified using the value of GAS.
  • the present invention establishes for the first time a method and system for identifying tumor burden in a sample, and the method and system of the present invention can accurately and effectively identify tumor load in a sample.
  • the methods and systems of the present invention can increase the sensitivity and versatility of tumor detection.
  • the method and system of the present invention can reduce the pain caused by sampling during tumor patient detection and achieve non-invasive detection.
  • the method and system of the present invention can effectively test patients who cannot be sampled by conventional means;
  • the method and system of the invention can detect tumor patients in real time, monitor the efficacy of medication, and provide certain guidance for doctors' medication and treatment.
  • the sample is sourced from the blood of a patient with gastric cancer, and free DNA (cfDNA) and white blood cells are extracted from the blood.
  • the nucleic acid extraction was carried out using the CW2603 nucleic acid extraction kit of Kangwei Century Biotechnology Co., Ltd., and the extraction method was operated according to the product manual provided by Kangwei Century Biotechnology Co., Ltd.
  • the library was constructed using the CW2185 library kit of Kangwei Century Biotechnology Co., Ltd. and then sequenced.
  • sequencing was performed on Illumina's HiSeq2500 high-throughput sequencing platform according to the operating instructions provided by Illumina.
  • the sequencing type was single-end sequencing, the read length was 41 bp, and the sequencing data amount was 5M.
  • adapters and low-quality data were removed from the sequencing results, and the reads were aligned to the reference genome.
  • the reference genome was the UCSC human genome hg19 (GRCh37), and the alignment software was BWA (Burrows-Wheeler Alignment tool) with default parameters, yielding the position of each sequence on the genome; only sequences that aligned uniquely to the genome were selected.
  • the genome was divided into 15489 windows b (regions), each 200K in length, and the number of sequences falling into each window b, the base distribution, and the base distribution of the reference genome were counted according to the positions of the sequences on the genome.
  • the copy number of each window b was corrected according to its sequences and base GC content using Loess correction, and the corrected copy number of each window b was calculated.
  • the arithmetic mean μ_i is calculated as $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$ where X_1, X_2, X_3, ... X_j are the copy number values of the normal samples.
  • x_i is the copy number value detected at window b_i, and the Z value of window b_i is $$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
  • for each window, the coefficient of variation CV_i = σ_i/μ_i is calculated, where μ_i is the arithmetic mean and σ_i the standard deviation of the copy number of the normal control samples.
  • the CV values of all windows were sorted in ascending order, and the windows in the largest 5% were removed and excluded from the subsequent GAS calculation.
  • the detection range of the GAS was the whole genome; the Z values were converted to absolute values and sorted in ascending order, and the sum of the absolute Z values from the window at the m-th percentile to the window at the p-th percentile was calculated; this sum is the genomic disorder degree (GAS).
  • the calculation formula is $$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$ where m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile, with m = 95 and p = 100.
  • the tumor burden in the body fluid was identified using the value of GAS.
  • a typical case is as follows.
  • in the fourth medication cycle the cfDNA copy number was normal, with a whole-genome GAS of 728.80, close to the value of 729.86 for the patient's normal white blood cells.
  • the whole-genome GAS of the 100 normal persons mentioned above was also calculated: the normal range was 722.87-739.89, with an arithmetic mean of 733.22.
  • the whole-genome GAS values of the fourth medication cycle in this example and of the white blood cells were both within the normal range, indicating a low tumor burden in the blood, consistent with the clinical evaluation result of PR (partial response).
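The case above compares a sample's whole-genome GAS with the range observed in normal persons. A minimal sketch of that comparison follows, assuming the reference range is simply the minimum to maximum of the normal GAS values (the text reports the range but not how it was derived); the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def normal_gas_range(normal_gas: Sequence[float]) -> Tuple[float, float, float]:
    """(low, high, mean) of normal-person GAS values; assumes a min-max range."""
    return min(normal_gas), max(normal_gas), mean(normal_gas)


def within_normal_range(gas: float, low: float, high: float) -> bool:
    """True when a sample's GAS lies inside the normal reference range."""
    return low <= gas <= high


# Values reported above: normal range 722.87-739.89; cfDNA 728.80; white blood cells 729.86.
print(within_normal_range(728.80, 722.87, 739.89))
print(within_normal_range(729.86, 722.87, 739.89))
```

Both reported values fall inside the range, matching the conclusion drawn in the text.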

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

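A method and system for identifying tumor burden in a sample. The method for non-diagnostically identifying tumor burden in a sample comprises the steps of: (i) providing a sample to be tested; (ii) sequencing the sample to be tested, thereby obtaining the genome sequence of the sample; (iii) aligning the genome sequence obtained in step (ii) with a reference genome, thereby obtaining position information of the genome sequence on the reference genome; (iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b; (v) performing a Z-test on each window b of step (iv), thereby calculating the Z value of each window b; and (vi) calculating a genomic disorder degree (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the value of the genomic disorder degree. The method and system can improve the sensitivity and versatility of tumor detection.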

Description

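Method and system for identifying tumor burden in a sample
Technical Field
The present disclosure relates to the field of biotechnology, and in particular to a method and system for identifying tumor burden in a sample.
Background
In biomedical research and clinical applications, the tumor cells of cancer patients often carry a large number of genomic copy number variations. Copy number variations can be present in tumor tissue and in body fluids (such as blood, interstitial fluid, lymph, cerebrospinal fluid, urine and saliva); in body fluids they are found specifically in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes and the like. Genomic copy number variation in body fluids is an important indicator for identifying tumor burden, and identifying tumor burden can be applied to early tumor screening and diagnosis, disease monitoring, prognosis and treatment.
At present, the main methods for detecting tumor genome copy number variation are comparative genomic hybridization (CGH), real-time fluorescence quantitative PCR (RTFQ PCR), fluorescence in situ hybridization (FISH) and multiplex ligation-dependent probe amplification (MLPA).
However, comparative genomic hybridization has low (Mb-level) resolution, low throughput and high cost; real-time quantitative PCR likewise has low throughput and high cost and can measure only one copy number variation at a time; fluorescence in situ hybridization targets only specific loci, has low resolution and unstable probe hybridization efficiency; and multiplex ligation-dependent probe amplification is complex to perform, low-throughput, costly, has limited coverage and easily causes PCR contamination. Beyond these technical shortcomings, most of the above techniques examine only specific regions of the genome, whereas tumors are highly heterogeneous, so one or a few specific loci cannot effectively and comprehensively evaluate the tumor burden in body fluids.
Therefore, there is an urgent need in the art for a method and device that can evaluate tumor burden in body fluids more effectively and comprehensively and improve the sensitivity and versatility of tumor detection.
Summary of the Invention
The present invention provides a method and device that can evaluate tumor burden in body fluids more effectively and comprehensively and improve the sensitivity and versatility of tumor detection.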
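A first aspect of the present invention provides a method for non-diagnostically identifying tumor burden in a sample, comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested, thereby obtaining the genome sequence of the sample;
(iii) aligning the genome sequence obtained in step (ii) with a reference genome, thereby obtaining position information of the genome sequence on the reference genome;
(iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
(v) performing a Z-test on each window b of step (iv), thereby calculating the Z value of each window b; and
(vi) calculating a genomic disorder degree (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the value of the genomic disorder degree.
In another preferred embodiment, the reference genome may be continuous or discontinuous.
In another preferred embodiment, the reference genome comprises the whole genome.
In another preferred embodiment, the reference genome refers to the full length of all chromosomes of the species (e.g., human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
In another preferred embodiment, the reference genome covers more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and optimally more than 95%.
In another preferred embodiment, the sample is from an individual to be tested.
In another preferred embodiment, the individual to be tested is a human or a non-human mammal.
In another preferred embodiment, the sample is a solid sample or a liquid sample.
In another preferred embodiment, the sample comprises a body fluid sample.
In another preferred embodiment, the sample is selected from the group consisting of blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, or a combination thereof.
In another preferred embodiment, the sample is selected from the group consisting of circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes, or a combination thereof.
In another preferred embodiment, the sequencing is selected from the group consisting of single-end sequencing, paired-end sequencing, or a combination thereof.
In another preferred embodiment, step (iv) further comprises correcting the copy number of each window b and calculating the corrected copy number of each window b.
In another preferred embodiment, the correction method is selected from the group consisting of Loess correction, a weighting method, a residual method, or a combination thereof.
In another preferred embodiment, based on the position information of the genome sequences on the reference genome, the number of sequences falling into each window b, the base distribution, and the base distribution of the reference genome are counted.
In another preferred embodiment, the copy number of each window b is corrected according to the sequences and base content of each window b.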
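In another preferred embodiment, the Z value of each window b is calculated using the following formula:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where i is any positive integer from 1 to M; M is the total number of windows into which the reference genome is divided, M being a positive integer ≥ 50, preferably 50 ≤ M ≤ 10^5, more preferably 100 ≤ M ≤ 10^5, and optimally 200 ≤ M ≤ 10^5; x_i is the copy number value detected in the sample to be tested at the i-th window b_i; b_i is the i-th window; and μ_i is the arithmetic mean of the copy number of the normal control samples at window b_i, calculated using the following formula:
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
where j is any positive integer from 1 to N; N is the total number of normal control samples, N being a positive integer ≥ 30, preferably 30 ≤ N ≤ 10^8, more preferably 50 ≤ N ≤ 10^7, and optimally 100 ≤ N ≤ 10^4; X_j is the copy number value detected in the j-th normal control sample at window b_i; and σ_i is the standard deviation of the copy number of the normal control samples at window b_i, calculated using the following formula:
Figure PCTCN2017101573-appb-000003
where N, j, X_j and μ_i are as defined above.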
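The Z-test of step (v) can be sketched in Python with NumPy, assuming the per-window copy numbers are already available as arrays: one row per normal control sample and one column per window b_i. The function name `window_z_scores` is hypothetical, and because the σ_i formula appears only as an image in this publication, the sketch assumes the population standard deviation (divisor N); `ddof=1` would give the sample standard deviation instead.

```python
import numpy as np


def window_z_scores(test_copy, normal_copy) -> np.ndarray:
    """Z_i = (x_i - mu_i) / sigma_i for every window b_i (hypothetical helper).

    test_copy   -- shape (M,): copy number x_i of the test sample in each window
    normal_copy -- shape (N, M): copy number X_j of N normal controls in each window
    """
    test_copy = np.asarray(test_copy, dtype=float)
    normal_copy = np.asarray(normal_copy, dtype=float)
    n_controls, n_windows = normal_copy.shape
    if test_copy.shape != (n_windows,):
        raise ValueError("test sample and controls must cover the same windows")
    if n_controls < 30:
        raise ValueError("the text requires N >= 30 normal control samples")
    mu = normal_copy.mean(axis=0)            # arithmetic mean mu_i per window
    sigma = normal_copy.std(axis=0, ddof=0)  # assumption: population SD (divisor N)
    return (test_copy - mu) / sigma          # windows with sigma_i == 0 give inf/nan
```

A window whose copy number lies two control standard deviations above the control mean gets Z_i = 2; windows whose controls show no spread at all should be filtered out beforehand.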
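In another preferred embodiment, the normal control samples are samples of the same type from normal individuals of the same species.
In another preferred embodiment, the genomic disorder degree is calculated using the following formula:
$$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$
where the absolute Z values are sorted in ascending order, m_b is the window ranked at the m-th percentile and p_b is the window ranked at the p-th percentile; m is 30-98, preferably 40-97, more preferably 60-96, still more preferably 80-95, and optimally 95; p is 80-100, preferably 85-100, more preferably 90-100, and optimally 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, and optimally ≥ 20).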
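The GAS formula can be sketched as follows, assuming the Z values of the windows retained after filtering are supplied as a flat array. Following the description of the ranking (smallest |Z| at 0%, largest at 100%, ranks spread evenly in between), the sketch maps a percentile q to the sorted index round(q·(n-1)/100); that rounding rule, like the function name, is an assumption, since the text does not say how a percentile falling between two windows is resolved.

```python
import numpy as np


def genomic_abnormality_score(z_values, m: float = 95.0, p: float = 100.0) -> float:
    """Sum of |Z| over the windows ranked from the m-th to the p-th percentile."""
    if not 0 <= m < p <= 100:
        raise ValueError("require 0 <= m < p <= 100")
    abs_z = np.sort(np.abs(np.asarray(z_values, dtype=float)))  # ascending order
    n = abs_z.size
    if n == 0:
        raise ValueError("no windows to score")
    # Sorted ranks are spread evenly over 0%..100%: index 0 -> 0%, index n-1 -> 100%.
    # Assumption: a percentile is mapped to the nearest rank.
    lo = int(round(m * (n - 1) / 100))  # index of window m_b
    hi = int(round(p * (n - 1) / 100))  # index of window p_b
    return float(abs_z[lo:hi + 1].sum())
```

With the preferred m = 95 and p = 100, the score is the summed |Z| of the most deviant windows, so the sign of each deviation (gain or loss) does not matter.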
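In another preferred embodiment, before the genomic disorder degree is calculated, the method comprises the following step:
(a) according to the sequence features of the reference genome, removing regions of the genome that cannot be detected by high-throughput sequencing, such as centromeres, telomeres, satellites and heterochromatin, and removing the regions of length L adjacent to the centromeres, telomeres, satellites and heterochromatin, where L is any length less than 3M; or
(b) according to the copy number features of the samples, removing regions of the genome that cannot be detected by high-throughput sequencing, such as centromeres, telomeres, satellites and heterochromatin.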
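Option (a) amounts to masking every window that overlaps an excluded region widened by L base pairs on each side. The sketch below assumes windows and excluded regions are given as half-open [start, end) coordinates per chromosome; the source of the centromere, telomere, satellite and heterochromatin coordinates for the chosen reference build is left to the caller, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]     # half-open [start, end) in bp
Window = Tuple[str, int, int]  # (chromosome, start, end)


def windows_to_keep(windows: List[Window],
                    excluded: Dict[str, List[Interval]],
                    flank: int) -> List[bool]:
    """True for windows that avoid every excluded region widened by `flank` bp."""
    if not 0 <= flank < 3_000_000:
        raise ValueError("L must be shorter than 3M")
    keep = []
    for chrom, start, end in windows:
        # Overlap test between [start, end) and [r_start - flank, r_end + flank).
        hit = any(start < r_end + flank and r_start - flank < end
                  for r_start, r_end in excluded.get(chrom, []))
        keep.append(not hit)
    return keep
```

For example, with 200 kb windows and a 10 kb gap at 250,000-260,000 bp on chr1, L = 0 masks only the window containing the gap, while L = 100 kb also masks the window to its left.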
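In another preferred embodiment, before step (v), the method further comprises the following steps:
(iv1) calculating, from the copy number of each window b obtained in step (iv), the coefficient of variation CV_i of each window b in the normal control samples; and
(iv2) sorting the CV_i values in ascending order and removing the windows in the largest n%, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
In another preferred embodiment, the coefficient of variation CV_i is calculated using the following formula:
$$CV_i = \frac{\sigma_i}{\mu_i}$$
where μ_i is the arithmetic mean of the copy number of the normal control samples, calculated using the following formula:
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
and σ_i is the standard deviation of the copy number of the normal control samples, calculated using the following formula:
Figure PCTCN2017101573-appb-000007
where N, j, X_j, μ_i and σ_i are as defined above.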
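Steps (iv1) and (iv2) can be sketched as a mask over windows computed from the normal control matrix (one row per control, one column per window). As in the Z-score step, the population standard deviation is an assumption, and so is rounding the number of dropped windows down; the name `cv_filter` is hypothetical.

```python
import numpy as np


def cv_filter(normal_copy, n_percent: float = 5.0) -> np.ndarray:
    """Boolean mask: False for the n% of windows with the largest CV_i."""
    if not 0 < n_percent <= 5:
        raise ValueError("the text allows 0 < n <= 5")
    normal_copy = np.asarray(normal_copy, dtype=float)
    mu = normal_copy.mean(axis=0)              # mu_i per window
    sigma = normal_copy.std(axis=0, ddof=0)    # sigma_i per window (population SD)
    cv = sigma / mu                            # CV_i = sigma_i / mu_i
    n_windows = cv.size
    n_drop = int(n_windows * n_percent // 100)  # assumption: round down
    order = np.argsort(cv)                     # ascending CV
    keep = np.ones(n_windows, dtype=bool)
    keep[order[n_windows - n_drop:]] = False   # drop the largest n%
    return keep
```

Under this rounding rule, dropping the top 5% of 15489 windows removes 774 windows and leaves 14715 for the GAS calculation.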
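A second aspect of the present invention provides a system (device) for identifying tumor burden in a sample, comprising:
a sequencing unit for performing nucleic acid sequencing on the sample to be tested, thereby obtaining the genome sequence of the sample;
an alignment unit, connected to the sequencing unit, for aligning the obtained genome sequence of the sample with a reference genome, thereby obtaining position information of the genome sequence on the reference genome;
a calculation and testing unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window, thereby calculating the Z value of each window b; and
an identification unit, connected to the calculation and testing unit, for calculating the genomic disorder degree (GAS) from the obtained Z values and identifying the tumor burden in the sample based on the value of the genomic disorder degree.
In another preferred embodiment, the system further comprises a correction unit, connected to the calculation and testing unit, for correcting the copy number of each window b of the reference genome, thereby calculating the corrected copy number of each window b.
In another preferred embodiment, in the calculation and testing unit, before the Z-test is performed on each window b, the coefficient of variation CV_i of each window b may be calculated from the copy number of each window b, the CV_i values sorted in ascending order, and the windows in the largest n% removed, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.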
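It should be understood that, within the scope of the present invention, the technical features described above and those specifically described below (e.g., in the Examples) may be combined with one another to form new or preferred technical solutions, which, for brevity, are not enumerated here one by one.
Brief Description of the Drawings
Figure 1 shows a flow chart of the analysis method for identifying tumor burden in body fluids.
Figure 2 shows the tumor burden test results for a patient across different clinical medication cycles.
Figure 3 shows the genome-wide copy number variation of S1-7 and the corresponding GAS.
Detailed Description
Through extensive and in-depth research, the present inventors have for the first time established a method for identifying tumor burden in a sample that is effective and improves the sensitivity and versatility of tumor detection: specifically, the genomic disorder degree (GAS) is calculated, and the tumor burden in the sample is identified based on its value.
In addition, the present invention provides a system (device) for identifying tumor burden in a sample, comprising a sequencing unit, an alignment unit, a calculation and testing unit, and an identification unit. In a preferred embodiment of the present invention, a correction unit is further included. The present invention was completed on this basis.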
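Terms
As used herein, the term "copy number variation (Copy Number Variations, CNV)" refers to an abnormal copy number of chromosomes or chromosome segments in a sample genome, including but not limited to chromosomal aneuploidy, deletions, duplications, and microdeletions and microduplications larger than 1000 bp.
As used herein, the term "genomic disorder degree value (Genomic Abnormality Score, GAS)" is a score calculated from abnormal copy numbers of chromosomes or chromosome segments in a sample genome; the scoring range includes but is not limited to the whole genome, specific chromosomes, chromosome segments and specific genes.
As used herein, the term "Z value (Z-score)", also called the standard score, is obtained by subtracting the mean from a value and dividing the difference by the standard deviation. Expressed as a formula:
$$Z = \frac{x - \mu}{\sigma}$$
where x is a specific value, μ is the arithmetic mean and σ is the standard deviation; the Z value represents the distance between the raw value and the reference mean, measured in units of standard deviation.
As used herein, the term "partial response (PR)" refers to a decrease of ≥ 30% in the sum of the maximum diameters of the target lesions, maintained for at least 4 weeks.
As used herein, the term "progressive disease (PD)" refers to an increase of at least 20% in the sum of the maximum diameters of the target lesions, or the appearance of new lesions.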
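The PR and PD definitions are threshold rules on the relative change in the summed maximum diameters of the target lesions. A minimal sketch follows; the text does not say which earlier measurement the change is relative to, so the reference sum is left to the caller, and the 4-week persistence required for PR is not checked. The function name is hypothetical.

```python
def classify_response(reference_sum_mm: float, current_sum_mm: float,
                      new_lesion: bool = False) -> str:
    """Return "PD", "PR" or "neither" from summed target-lesion diameters.

    PR additionally requires the decrease to persist for at least 4 weeks,
    which this single-timepoint check does not verify.
    """
    if reference_sum_mm <= 0:
        raise ValueError("reference sum must be positive")
    change = (current_sum_mm - reference_sum_mm) / reference_sum_mm
    if new_lesion or change >= 0.20:  # increase of at least 20%, or a new lesion
        return "PD"
    if change <= -0.30:               # decrease of at least 30%
        return "PR"
    return "neither"
```

For instance, a drop from 100 mm to 70 mm is a 30% decrease and is labeled PR, while any new lesion is labeled PD regardless of the diameters.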
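As used herein, the terms "system" and "device" have the same meaning.
Reference Genome
In the present invention, taking humans as an example, the reference genome may be a whole genome or a partial genome, and may be continuous or discontinuous. When the reference genome is a partial genome, its total coverage (F) is more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, still more preferably more than 80%, and optimally more than 95%, where the total coverage (F) is the percentage of the whole genome occupied by the reference genome.
In a preferred embodiment, the reference genome is the whole genome.
In a preferred embodiment, the reference genome is the full length of all chromosomes of the species (e.g., human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
Tumor Burden
In the present invention, "tumor burden" refers to the degree of harm a tumor causes to the body, such as tumor size, tumor activity, tumor metastasis, and the danger posed to the body by tumors at different sites. Indicators for evaluating tumor burden include (but are not limited to): tumor size, tumor marker levels, clinical symptoms (dyspnea, pain, etc.), related complications (superior vena cava syndrome, etc.) and wasting (anemia, hypoproteinemia, etc.).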
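Sequencing
In the present invention, sequencing can be performed with conventional sequencing technologies and platforms. The platform is not particularly limited. Second-generation sequencing platforms include (but are not limited to):
- Illumina: GA, GAII, GAIIx, HiSeq1000/2000/2500/3000/4000, X Ten, X Five, NextSeq500/550, MiSeq, MiSeqDx, MiSeq FGx and MiniSeq;
- Applied Biosystems: SOLiD;
- Roche: 454FLX;
- Thermo Fisher Scientific (Life Technologies): Ion Torrent, Ion PGM and Ion Proton I/II;
- BGI: BGISEQ1000, BGISEQ500 and BGISEQ100;
- CapitalBio: BioelectronSeq 4000;
- Da An Gene Co., Ltd. of Sun Yat-sen University: DA8600;
- Berry Genomics: NextSeq CN500;
- Zhongke Zixin, a subsidiary of Zixin Pharmaceutical: BIGIS;
- Huayinkang Gene: HYK-PSTAR-IIA.
Third-generation single-molecule sequencing platforms include (but are not limited to): the HeliScope system from Helicos BioSciences, the SMRT system from Pacific Biosciences, and GridION and MinION from Oxford Nanopore Technologies. The sequencing type may be single-end or paired-end. The read length may be 30 bp, 40 bp, 50 bp, 100 bp, 300 bp or any other length greater than 30 bp. The sequencing depth may be 0.01×, 0.02×, 0.1×, 1×, 5×, 10×, 30× or any other multiple of the genome greater than 0.01×.
In the present invention, the Illumina HiSeq2500 high-throughput sequencing platform is preferred, with single-end sequencing, a read length of 41 bp and a data volume of 5M.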
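Data processing
In the present invention, data processing generally comprises the following steps:
(a) extracting and sequencing nucleic acids from the genome of the sample to be tested to obtain genomic sequences;
(b) aligning the genomic sequences of the sample to a reference genome to obtain the positions of the sequences on the reference genome;
(c) dividing the reference genome into windows of a fixed length and calculating the copy number of each window b;
(d) performing a Z-test on each window b and calculating the Z value of each window; and
(e) calculating the genomic abnormality score (GAS).
In step (a), the sample to be tested is a body fluid. This may be blood, interstitial fluid (tissue fluid or intercellular fluid), lymph, cerebrospinal fluid, urine or saliva. The detection target is the DNA contained in the body fluid, found specifically in circulating tumor cells (CTCs), cell-free DNA (cfDNA), exosomes, etc. Methods for extracting DNA from the sample include (but are not limited to) column-based and magnetic-bead-based extraction. A library is constructed from the sample and sequenced on a high-throughput sequencing platform.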
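In step (b), adapters and low-quality data are first removed from the sequencing results, and the reads are aligned to the reference genome. The reference genome may be the whole genome, any chromosome, or part of a chromosome. A widely accepted reference sequence is usually chosen. For humans, this may be hg18 (NCBI36), hg19 (GRCh37) or hg38 (GRCh38) from NCBI or UCSC, or any single chromosome or part of a chromosome. Any free or commercial alignment software can be used, such as BWA (Burrows-Wheeler Alignment tool), SOAPaligner/soap2 (Short Oligonucleotide Analysis Package) or Bowtie/Bowtie2. Aligning the reads to the reference genome gives their positions on the genome. Reads that align uniquely can be retained and reads that align to multiple locations removed, eliminating errors that repeat sequences would otherwise introduce into the copy-number calculation.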
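Keeping only uniquely aligned reads, as described above, is commonly approximated by a mapping-quality (MAPQ) cut-off. The sketch below assumes reads have already been parsed into simple records. `AlignedRead`, `keep_unique` and the threshold of 30 are illustrative assumptions, not values given in this document. The approach relies on the fact that BWA assigns MAPQ 0 to reads with several equally good alignments.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    chrom: str   # reference name; "*" marks an unmapped read, as in SAM
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality reported by the aligner


def keep_unique(reads: Iterable[AlignedRead], min_mapq: int = 30) -> List[AlignedRead]:
    """Keep mapped reads whose MAPQ suggests a single confident placement.

    The threshold of 30 is a hypothetical choice; multi-mapping reads from
    BWA carry MAPQ 0 and are always discarded by any positive threshold.
    """
    return [r for r in reads if r.chrom != "*" and r.mapq >= min_mapq]


reads = [AlignedRead("chr1", 100, 60), AlignedRead("chr1", 200, 0), AlignedRead("*", 0, 0)]
print(keep_unique(reads))
```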
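In step (c), the genome is divided into windows of a fixed length. Depending on the amount of sequencing data, the window length may be any integer from 100 bp to 3,000,000 bp (3M), and windows may be of equal or different lengths. The number of windows may be any integer from 1,000 to 30,000,000. From the positions of the sequenced reads on the genome, the following are tallied for each window: the number of reads falling into it, their base composition, and the base composition of the reference genome. The copy number of each window is then corrected according to its reads and GC content, using methods including but not limited to LOESS correction, giving the corrected copy number of each window.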
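The window counting and GC correction of step (c) can be sketched in Python with NumPy. This illustration rests on several assumptions:

- `bin_counts` and `gc_correct` are hypothetical names.
- `gc_correct` swaps in a simpler technique, GC-stratified median normalization, in place of the LOESS correction named above.
- Both methods adjust each window's count relative to windows of similar GC content, but LOESS fits a smooth curve rather than using discrete strata.

```python
import numpy as np


def bin_counts(positions: np.ndarray, chrom_length: int, window: int) -> np.ndarray:
    """Count 0-based read start positions falling into fixed-length windows of one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division: the last window may be shorter
    positions = np.asarray(positions, dtype=np.int64)
    positions = positions[(positions >= 0) & (positions < chrom_length)]  # drop off-chromosome starts
    return np.bincount(positions // window, minlength=n_windows)


def gc_correct(counts: np.ndarray, gc: np.ndarray, n_strata: int = 50) -> np.ndarray:
    """GC-stratified median normalization (a simple stand-in for LOESS correction).

    Each window's count is divided by the median count of all windows whose GC
    fraction (0..1) falls into the same stratum, so 1.0 means "typical for its GC".
    Windows in strata with a zero median are returned as NaN.
    """
    counts = np.asarray(counts, dtype=float)
    strata = np.clip((np.asarray(gc) * n_strata).astype(int), 0, n_strata - 1)
    corrected = np.full(counts.shape, np.nan)
    for s in np.unique(strata):
        in_stratum = strata == s
        median = np.median(counts[in_stratum])
        if median > 0:
            corrected[in_stratum] = counts[in_stratum] / median
    return corrected


counts = bin_counts(np.array([0, 5, 10, 15, 25]), chrom_length=30, window=10)
print(counts, gc_correct(counts, np.array([0.40, 0.40, 0.55])))
```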
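In step (d), samples from N normal individuals (N being a natural number of at least 30) are processed under the same extraction, library construction and sequencing conditions by repeating steps (a)-(c) above, forming a reference data set. Each window b_i thus has N normal copy-number values.
The arithmetic mean μ_i of the copy numbers of the normal control samples is calculated as: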
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
The standard deviation σ_i of the copy numbers of the normal control samples is calculated as:
$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(X_j - \mu_i\right)^2}$$
where X_1, X_2, X_3, ..., X_j are the copy-number values of the normal samples.
The Z value of each window b_i of the test sample is calculated as:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where x_i is the copy-number value measured for window b_i.
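Step (d) vectorizes naturally over all windows. The sketch below is a NumPy illustration under stated assumptions:

- `window_z_scores` is a hypothetical name.
- It uses the population standard deviation (`ddof=0`). This is an assumption, because the exact form of σ_i in the source formula is not recoverable here.
- With N ≥ 30 controls, choosing N or N - 1 in the denominator changes σ_i by under 2%.

```python
import numpy as np


def window_z_scores(sample: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Per-window Z values, Z_i = (x_i - mu_i) / sigma_i.

    sample:    shape (M,)   corrected copy numbers x_i of the test sample
    reference: shape (N, M) corrected copy numbers X_j of N normal controls
    Windows with sigma_i == 0 yield inf or NaN and should be filtered downstream.
    """
    reference = np.asarray(reference, dtype=float)
    if reference.ndim != 2 or reference.shape[1] != np.shape(sample)[0]:
        raise ValueError("reference must have shape (N, M) matching sample of shape (M,)")
    mu = reference.mean(axis=0)            # arithmetic mean over the N controls
    sigma = reference.std(axis=0, ddof=0)  # population SD (assumption); ddof=1 gives the sample SD
    with np.errstate(divide="ignore", invalid="ignore"):
        return (np.asarray(sample, dtype=float) - mu) / sigma


ref = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy data: N = 2 controls, M = 2 windows
print(window_z_scores(np.array([4.0, 1.0]), ref))
```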
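In step (e), highly repetitive regions must be handled first. Such regions exist across the whole genome and around certain chromosomes, chromosome segments or genes, for example near centromeres, telomeres, satellites and heterochromatin. They are removed before scoring to eliminate their influence on the abnormality-score calculation.
In a preferred embodiment, removal methods include (but are not limited to):
a. Removal based on sequence features of the reference genome
Regions of the genome that high-throughput sequencing cannot detect, such as centromeres, telomeres, satellites and heterochromatin, are removed, together with regions of length L adjacent to them, where L may be any length less than 3M; or
b. Removal based on copy-number features of the normal samples
For each window b_i, the coefficient of variation CV_i of the normal control samples in that window is calculated as: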
$$CV_i = \frac{\sigma_i}{\mu_i}$$
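where μ_i is the arithmetic mean and σ_i the standard deviation of the copy numbers of the normal control samples.
The CV values are sorted in ascending order and the top n% of windows with the largest CV are removed, where n may be any value greater than 0 and less than or equal to 5.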
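Both removal methods yield a boolean keep-mask over windows, so the two can be combined with `&`. The sketch below is illustrative only, with these assumptions:

- `region_keep_mask` and `cv_keep_mask` are hypothetical names.
- Region coordinates must come from an annotation of the chosen reference build; none are given here.
- Windows with μ_i = 0 are treated as maximally variable.
- The number of dropped windows is rounded down, which the text above does not specify.

```python
import numpy as np


def region_keep_mask(n_windows: int, window: int, regions, flank: int) -> np.ndarray:
    """Method a: drop windows overlapping excluded regions padded by `flank` bp (the length L).

    regions: (start, end) half-open, 0-based intervals on one chromosome, e.g.
             centromere, telomeres, heterochromatin (coordinates are user-supplied).
    """
    starts = np.arange(n_windows, dtype=np.int64) * window
    ends = starts + window
    keep = np.ones(n_windows, dtype=bool)
    for start, end in regions:
        lo, hi = start - flank, end + flank
        keep &= ~((starts < hi) & (ends > lo))  # half-open interval overlap test
    return keep


def cv_keep_mask(reference: np.ndarray, n_percent: float) -> np.ndarray:
    """Method b: drop the n % of windows with the largest CV_i = sigma_i / mu_i.

    reference: shape (N, M) copy numbers of N normal controls over M windows.
    """
    if not 0 < n_percent <= 5:
        raise ValueError("n must satisfy 0 < n <= 5")
    reference = np.asarray(reference, dtype=float)
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = sigma / mu
    cv = np.where(np.isfinite(cv), cv, np.inf)  # mu_i == 0: treat as most variable
    n_drop = int(cv.size * n_percent // 100)    # round down (assumption)
    keep = np.ones(cv.size, dtype=bool)
    if n_drop:
        keep[np.argsort(cv, kind="stable")[-n_drop:]] = False  # largest CVs sort last
    return keep


ref = np.ones((2, 100))
ref[0, 7] = 3.0  # one highly variable window
keep = region_keep_mask(100, 100, [(250, 350)], flank=0) & cv_keep_mask(ref, 1)
print(np.flatnonzero(~keep))
```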
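In step (e), the genomic abnormality score (GAS) is then calculated as follows:
First, the scope of the score is determined. It includes, but is not limited to, the whole genome, a specific chromosome, a specific chromosome segment or a specific gene, and may be any size from 1M up to the length of the genome (about 3G for the human genome). Within this scope, the absolute Z values of the windows remaining after removal of repeat-affected windows are sorted in ascending order. The sorted values are evenly mapped onto the range 0%-100%, with the smallest absolute Z value assigned to 0% and the largest to 100%. The cumulative sum of the absolute Z values of the windows from the m-th to the p-th percentile is then calculated, where:
- m is 30-98, preferably 40-97, more preferably 60-96, even more preferably 80-95, and most preferably 95;
- p is 80-100, preferably 85-100, more preferably 90-100, and most preferably 100;
- p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20).
This cumulative sum is the genomic abnormality score (GAS), calculated as: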
$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} \left| Z_i \right|$$
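where the windows are indexed in ascending order of |Z|, m_b is the window ranked at the m-th percentile, and p_b is the window ranked at the p-th percentile. The GAS value is used to identify the tumor burden in the body fluid.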
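The percentile mapping and summation can be sketched as follows. This illustration makes two assumptions (`genomic_abnormality_score` is a hypothetical name):

- "Evenly distributed over 0%-100%" means the k-th smallest of n values (counting from 0) sits at 100·k/(n-1)%.
- Windows whose percentile falls in [m, p], inclusive, are summed.

```python
import numpy as np


def genomic_abnormality_score(z: np.ndarray, m: float = 95.0, p: float = 100.0) -> float:
    """Sum of |Z| over the windows ranked from the m-th to the p-th percentile.

    z: Z values of the windows left after region masking / CV filtering;
       non-finite values (e.g. from sigma_i == 0) are dropped first.
    """
    if not (0 <= m < p <= 100) or p - m < 2:
        raise ValueError("require 0 <= m < p <= 100 and p - m >= 2")
    abs_z = np.sort(np.abs(np.asarray(z, dtype=float)))
    abs_z = abs_z[np.isfinite(abs_z)]  # NaN/inf sort to the end; removing them keeps the order
    n = abs_z.size
    if n < 2:
        raise ValueError("need at least two finite Z values")
    # Spread the sorted values evenly over 0-100 %: smallest -> 0 %, largest -> 100 %.
    percentile = 100.0 * np.arange(n) / (n - 1)
    selected = (percentile >= m) & (percentile <= p)
    return float(abs_z[selected].sum())


print(genomic_abnormality_score(np.arange(-10, 11)))  # top 5 % of 21 values: |Z| = 10 and 10
```

With the Example 1 settings (m = 95, p = 100), this sums roughly the largest 5% of |Z| values.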
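Method for identifying tumor burden in a sample
The present invention provides an effective method for identifying tumor burden in a sample that improves the sensitivity and generality of tumor detection, comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested, thereby obtaining the genomic sequences of the sample;
(iii) aligning the genomic sequences obtained in step (ii) to a reference genome, thereby obtaining position information of the genomic sequences on the reference genome;
(iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
(v) performing a Z-test on each window b of step (iv), thereby calculating the Z value of each window b; and
(vi) calculating the genomic abnormality score (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the GAS value.
In a preferred embodiment of the present invention, the method comprises the steps of:
(a) extracting and sequencing nucleic acids from the sample genome to obtain genomic sequences;
(b) aligning the sequences to a reference genome to obtain their positions on the genome;
(c) dividing the reference genome into windows b of a fixed length and calculating the copy number of each window b; and
(d) performing a Z-test on each window b and calculating its Z value, then calculating the genomic abnormality score (GAS), so as to identify the tumor burden in the sample based on the GAS value.
System (device) for identifying tumor burden in a sample
The present invention further provides a system (device) for identifying tumor burden in a sample, comprising:
a sequencing unit, configured to perform nucleic acid sequencing on the sample to be tested, thereby obtaining the genomic sequences of the sample;
an alignment unit, connected to the sequencing unit and configured to align the obtained genomic sequences of the sample to a reference genome, thereby obtaining position information of the genomic sequences on the reference genome;
a calculation and testing unit, connected to the alignment unit and configured to calculate the copy number of each window b of the reference genome and to perform a Z-test on each window, thereby calculating the Z value of each window b; and
an identification unit, connected to the calculation and testing unit and configured to calculate the genomic abnormality score (GAS) from the obtained Z values and to identify the tumor burden in the sample based on the GAS value.
In a preferred embodiment, the system further comprises a correction unit, connected to the calculation and testing unit and configured to correct the copy number of each window b of the reference genome, thereby calculating the corrected copy number of each window b.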
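The units above form a linear pipeline. Below is a minimal Python sketch of that wiring, under these assumptions:

- Sequencing and alignment happen upstream.
- Each unit is a plain callable.
- `TumorBurdenSystem` and its field names are hypothetical.
- The toy lambdas stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np


@dataclass
class TumorBurdenSystem:
    """Hypothetical software wiring of the units described above.

    `run` starts from aligned read start positions and mirrors the unit order:
    calculation -> (optional correction) -> testing -> identification.
    """
    count_windows: Callable[[np.ndarray], np.ndarray]  # positions -> copy number per window
    z_test: Callable[[np.ndarray], np.ndarray]         # copy numbers -> Z per window
    identify: Callable[[np.ndarray], float]            # Z values -> score (e.g. GAS)
    correct: Optional[Callable[[np.ndarray], np.ndarray]] = None  # optional correction unit

    def run(self, aligned_positions: np.ndarray) -> float:
        copy_numbers = self.count_windows(aligned_positions)
        if self.correct is not None:
            copy_numbers = self.correct(copy_numbers)
        return self.identify(self.z_test(copy_numbers))


toy = TumorBurdenSystem(
    count_windows=lambda pos: np.bincount(pos // 10, minlength=3).astype(float),  # 10-bp toy windows
    z_test=lambda cn: (cn - 2.0) / 1.0,         # toy reference: mu = 2, sigma = 1 in every window
    identify=lambda z: float(np.abs(z).sum()),  # toy score: sum of |Z| over all windows
)
print(toy.run(np.array([0, 1, 12, 15, 25, 26])))  # prints 0.0 (two reads in every window)
```

Keeping each unit behind a callable lets the optional correction unit be switched on without changing the other units.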
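The main advantages of the present invention include:
(1) The present invention establishes, for the first time, a method and system for identifying tumor burden in a sample, which can identify that tumor burden accurately and effectively.
(2) The method and system of the present invention improve the sensitivity and generality of tumor detection.
(3) The method and system reduce the discomfort that sampling causes tumor patients, enabling non-invasive testing.
(4) The method and system can effectively test patients from whom samples cannot be obtained by conventional methods.
(5) The method and system allow real-time monitoring of tumor patients and of treatment efficacy, providing guidance for physicians' medication and treatment decisions.
The present invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the invention and do not limit its scope. Experimental methods in the following examples for which specific conditions are not indicated are generally performed under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Unless otherwise specified, the materials used in the examples are commercially available.
Example 1
The present invention has been applied to 15 cases with good results. To make its use and effect easier to understand, one case is described in further detail below. A brief flowchart of the implementation is shown in Figure 1; the detailed procedure is as follows:
1. Nucleic acid extraction and sequencing of the sample genome
In this example, the test sample was blood from a gastric cancer patient, from which cell-free DNA (cfDNA) and leukocytes were extracted. Nucleic acids were extracted with the CW2603 nucleic acid extraction kit from CoWin Biosciences (CWBIO), following the manufacturer's instructions.
Libraries were constructed with the CWBIO CW2185 library preparation kit and sequenced on the Illumina HiSeq2500 high-throughput sequencing platform according to Illumina's instructions. Sequencing was single-end, with a read length of 41 bp and a data volume of 5M.
2. Aligning the reads to the reference genome to obtain their positions on the genome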
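Adapters and low-quality data were removed from the sequencing results. The reads were then aligned to the human reference genome UCSC hg19 (GRCh37) with BWA (Burrows-Wheeler Alignment tool) using default parameters, giving the positions of the reads on the genome. Only reads aligning uniquely to the genome were retained.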
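3. Dividing the reference genome into windows of a fixed length and calculating the copy number of each window
The genome was divided into 15,489 windows b (regions), each 200 kb long. From the positions of the reads on the genome, the following were tallied for each window b: the number of reads falling into it, their base composition, and the base composition of the reference genome. The copy number of each window b was corrected by LOESS according to its reads and GC content, giving the corrected copy number of each window b.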
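A quick sanity check of the Example 1 window settings follows. The sketch assumes "5M" means 5 million reads, and idealizes them as uniquely mapped and uniformly spread. Under Poisson counting, the relative noise per window is 1/sqrt(expected count).

```python
import math

n_windows = 15_489     # windows in Example 1
window_bp = 200_000    # 200 kb per window
reads = 5_000_000      # "5M" data volume, read here as 5 million reads (assumption)

covered_bp = n_windows * window_bp            # 3,097,800,000 bp, about 3.1 Gb
reads_per_window = reads / n_windows          # about 322.8 reads per window (idealized)
poisson_cv = 1 / math.sqrt(reads_per_window)  # about 0.056, i.e. ~5.6 % counting noise

print(f"{covered_bp:,} bp, {reads_per_window:.1f} reads/window, Poisson CV {poisson_cv:.3f}")
```

Under these idealized assumptions, each 200-kb window collects a few hundred reads, so counting noise alone is on the order of 5-6% per window. This is one reason for scaling each window's deviation by the spread observed in the normal controls rather than judging raw counts.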
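4. Calculating the CV of each window
Samples from 100 normal individuals were processed under the same extraction, library construction and sequencing conditions by repeating steps 1, 2 and 3 above. The resulting normal control data served as the reference data set, from which the CV of each window b_i was calculated.
Each window b_i thus has N (N = 100 in this example) normal copy-number values.
The arithmetic mean μ_i of the copy numbers of the normal control samples is calculated as: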
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
The standard deviation σ_i of the copy numbers of the normal control samples is calculated as:
$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(X_j - \mu_i\right)^2}$$
where X_1, X_2, X_3, ..., X_j are the copy-number values of the normal samples.
The CV of each window b_i is calculated as:
$$CV_i = \frac{\sigma_i}{\mu_i}$$
5. Performing a Z-test on each window and calculating its Z value
The Z value of each window b_i of the test sample is calculated as:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
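where x_i is the copy-number value measured for window b_i, and μ_i and σ_i are the arithmetic mean and standard deviation of the copy numbers of the normal control samples, calculated as in step 4.
6. Calculating the genomic abnormality score (GAS)
In this example, the windows were sorted by CV in ascending order, and the top 5% with the largest CV were removed and excluded from the following abnormality-score calculation. The scope of the score was the whole genome. The absolute Z values were sorted in ascending order, and the cumulative sum of the absolute Z values of the windows from the m-th to the p-th percentile was calculated. This cumulative sum is the genomic abnormality score (GAS), calculated as: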
$$\mathrm{GAS} = \sum_{i=m_b}^{p_b} \left| Z_i \right|$$
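where m_b is the window ranked at the m-th percentile and p_b the window ranked at the p-th percentile, with m = 95 and p = 100. The GAS value is used to identify the tumor burden in the body fluid.
7. Test results
More than ten samples were tested. A representative case is described below.
The test results are shown in Table 1, Figure 2 and Figure 3.
Table 1. Tumor burden test results for the clinical treatment of a gastric cancer patient in Example 1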
Figure PCTCN2017101573-appb-000018
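Before treatment, the patient had just been diagnosed with gastric cancer. At this point the cfDNA copy number was severely abnormal (Figure 3, S1), the genome-wide GAS was 999.84, and the tumor burden in the blood was high.
With treatment, the cfDNA copy number had returned to normal by the fourth cycle, with a genome-wide GAS of 728.80, close to the 729.86 of the normal leukocytes.
Using the same method, the genome-wide GAS of the 100 normal individuals mentioned above was calculated, giving a normal range of 722.87-739.89 with an arithmetic mean of 733.22. The genome-wide GAS values of the fourth treatment cycle and of the leukocytes in this example lie within the normal range. This indicates a very low tumor burden in the blood, consistent with the clinical efficacy assessment of PR (partial response).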
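The comparison above can be expressed as a simple range check, under these assumptions:

- The reported minimum and maximum of the 100 controls (722.87-739.89) are treated as hard cut-offs. The document does not state a formal decision rule.
- These limits apply only to this pipeline's settings.
- `classify_gas` is a hypothetical name.

```python
NORMAL_LOW, NORMAL_HIGH = 722.87, 739.89  # min-max GAS of the 100 normal controls (Example 1)


def classify_gas(gas: float, low: float = NORMAL_LOW, high: float = NORMAL_HIGH) -> str:
    """Place a genome-wide GAS relative to the normal-control range (hard cut-offs assumed)."""
    if gas > high:
        return "above normal range"
    if gas < low:
        return "below normal range"
    return "within normal range"


# GAS values reported in Example 1
for label, gas in [("before treatment", 999.84), ("cycle 4", 728.80), ("leukocytes", 729.86)]:
    print(f"{label}: {gas} -> {classify_gas(gas)}")
```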
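With further treatment, the tumor developed drug resistance. The cfDNA copy-number abnormalities became severe again, the genome-wide GAS increased, and the tumor burden in the blood rose. The genome-wide GAS peaked at the seventh treatment cycle, consistent with the clinical efficacy assessment of PD (progressive disease).
These results indicate that the genomic abnormality score can effectively identify tumor burden in body fluids.
All documents mentioned in the present invention are incorporated herein by reference, as if each were individually incorporated by reference. It should further be understood that, after reading the above teachings, those skilled in the art may make various changes or modifications to the present invention, and such equivalents likewise fall within the scope defined by the appended claims of this application.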

Claims (10)

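  1. A non-diagnostic method for identifying tumor burden in a sample, characterized by comprising the steps of:
    (i) providing a sample to be tested;
    (ii) sequencing the sample to be tested, thereby obtaining the genomic sequences of the sample;
    (iii) aligning the genomic sequences obtained in step (ii) to a reference genome, thereby obtaining position information of the genomic sequences on the reference genome;
    (iv) dividing the reference genome into M region segments, each region segment being a window b, and calculating the copy number of each window b;
    (v) performing a Z-test on each window b of step (iv), thereby calculating the Z value of each window b; and
    (vi) calculating the genomic abnormality score (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the GAS value.
  2. The method according to claim 1, characterized in that the reference genome comprises the whole genome.
  3. The method according to claim 1 or 2, characterized in that the coverage of the reference genome is more than 50% of the whole genome, preferably more than 60%, more preferably more than 70%, even more preferably more than 80%, and most preferably more than 95%.
  4. The method according to claim 1, characterized in that the sample is selected from the group consisting of: blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, or a combination thereof.
  5. The method according to claim 1, characterized in that step (iv) further comprises correcting the copy number of each window b and calculating the corrected copy number of each window b.
  6. The method according to claim 1, characterized in that the Z value of each window b is calculated with the following formula: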
    $$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
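    where i is any positive integer from 1 to M; M is the total number of windows into which the reference genome is divided, M being a positive integer ≥ 50, preferably 50 ≤ M ≤ 10^5, more preferably 100 ≤ M ≤ 10^5, most preferably 200 ≤ M ≤ 10^5; x_i is the copy-number value measured for the test sample in the i-th window b_i; b_i is the i-th window; μ_i is the arithmetic mean of the copy numbers of the normal control samples in window b_i, calculated as: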
    $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
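    where j is any positive integer from 1 to N; N is the total number of normal control samples, N being a positive integer ≥ 30, preferably 30 ≤ N ≤ 10^8, more preferably 50 ≤ N ≤ 10^7, most preferably 100 ≤ N ≤ 10^4; X_j is the copy-number value measured for the j-th normal control sample in window b_i; σ_i is the standard deviation of the copy numbers of the normal control samples in window b_i, calculated as: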
    $$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(X_j - \mu_i\right)^2}$$
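    where N, j, X_j and μ_i are as defined above.
  7. The method according to claim 1, characterized in that the genomic abnormality score is calculated with the following formula: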
    $$\mathrm{GAS} = \sum_{i=m_b}^{p_b} \left| Z_i \right|$$
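    where m_b is the window ranked at the m-th percentile and p_b the window ranked at the p-th percentile; m is 30-98, preferably 40-97, more preferably 60-96, even more preferably 80-95, most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, still more preferably ≥ 15, most preferably ≥ 20).
  8. The method according to claim 1, characterized in that the following steps are further included before step (v):
    (iv1) based on the copy number of each window b from step (iv), calculating the coefficient of variation CV_i of each window b in the normal control samples; and
    (iv2) sorting the CV_i values in ascending order and removing the top n% of windows with the largest CV, where n is any value greater than 0 and less than or equal to 5; preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
  9. The method according to claim 8, characterized in that the coefficient of variation CV_i is calculated with the following formula: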
    $$CV_i = \frac{\sigma_i}{\mu_i}$$
    where μ_i is the arithmetic mean of the copy numbers of the normal control samples, calculated as:
    $$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
    σ_i is the standard deviation of the copy numbers of the normal control samples, calculated as:
    $$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(X_j - \mu_i\right)^2}$$
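    where N, j, X_j, μ_i and σ_i are as defined above.
  10. A system for identifying tumor burden in a sample, characterized by comprising:
    a sequencing unit, configured to perform nucleic acid sequencing on the sample to be tested, thereby obtaining the genomic sequences of the sample;
    an alignment unit, connected to the sequencing unit and configured to align the obtained genomic sequences of the sample to a reference genome, thereby obtaining position information of the genomic sequences on the reference genome;
    a calculation and testing unit, connected to the alignment unit and configured to calculate the copy number of each window b of the reference genome and to perform a Z-test on each window, thereby calculating the Z value of each window b; and
    an identification unit, connected to the calculation and testing unit and configured to calculate the genomic abnormality score (GAS) from the obtained Z values and to identify the tumor burden in the sample based on the GAS value.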
PCT/CN2017/101573 2016-09-22 2017-09-13 Method and system for identifying tumor burden in a sample WO2018054254A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610842333.8 2016-09-22
CN201610842333.8A CN106367512A (zh) 2016-09-22 2016-09-22 Method and system for identifying tumor burden in a sample

Publications (1)

Publication Number Publication Date
WO2018054254A1 (zh) 2018-03-29

Family

ID=57898089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/101573 WO2018054254A1 (zh) 2016-09-22 2017-09-13 Method and system for identifying tumor burden in a sample

Country Status (3)

Country Link
CN (1) CN106367512A (zh)
TW (1) TWI670495B (zh)
WO (1) WO2018054254A1 (zh)

Also Published As

Publication number Publication date
TWI670495B (zh) 2019-09-01
TW201814290A (zh) 2018-04-16
CN106367512A (zh) 2017-02-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17852322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17852322

Country of ref document: EP

Kind code of ref document: A1