TWI670495B - Method and system for identifying tumor burden in a sample - Google Patents
Method and system for identifying tumor burden in a sample
- Publication number
- TWI670495B (application TW106131581A)
- Authority
- TW
- Taiwan
- Prior art keywords
- window
- sample
- value
- copy number
- genome
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Immunology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention provides a method and system for identifying tumor burden in a sample. Specifically, it provides a non-diagnostic method for identifying tumor burden in a sample, comprising the steps of: (i) providing a sample to be tested; (ii) sequencing the sample to obtain its genomic sequences; (iii) aligning the genomic sequences obtained in step (ii) to a reference genome to obtain the position information of the sequences on the reference genome; (iv) dividing the reference genome into M region fragments, each region fragment being a window b, and calculating the copy number of each window b; (v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and (vi) calculating the Genomic Abnormality Score (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample based on the GAS value. The method and system of the invention can improve the sensitivity and general applicability of tumor detection.
Description
The present invention relates to the field of biotechnology, and in particular to a method and system for identifying tumor burden in a sample.
In biomedical research and clinical practice, the tumor cells of cancer patients often carry a large number of genomic copy number variations. Copy number variations can be found in tumor tissue and in body fluids (such as blood, interstitial fluid, lymph, cerebrospinal fluid, urine and saliva); in body fluids they are carried by circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes and the like. Genomic copy number variation in body fluids is an important indicator of tumor burden, and identifying tumor burden is useful for early tumor screening and diagnosis, disease monitoring, and prognosis and treatment.
The main methods currently used to detect copy number variations in tumor genomes are comparative genomic hybridization (CGH), real-time fluorescence quantitative PCR (RTFQ PCR), fluorescence in situ hybridization (FISH), and multiplex ligation-dependent probe amplification (MLPA).
However, CGH has low resolution (Mb level), low throughput and high cost. Fluorescence quantitative PCR likewise has low throughput and high cost, and can measure only one copy number variation at a time. FISH targets only specific locations, has low resolution, and suffers from unstable probe hybridization efficiency. MLPA is complicated to perform, has low throughput, high cost and limited coverage, and is prone to PCR contamination. Beyond these technical shortcomings, most of these techniques examine only specific regions of the genome, whereas tumors are highly heterogeneous, so one or a few specific sites cannot give an effective, comprehensive assessment of the tumor burden in body fluids.
There is therefore an urgent need in the art for a method and device that can assess the tumor burden in body fluids more effectively and comprehensively, and improve the sensitivity and general applicability of tumor detection.
The invention provides a method and device that can assess the tumor burden in body fluids more effectively and comprehensively, and improve the sensitivity and general applicability of tumor detection.
A first aspect of the invention provides a non-diagnostic method for identifying tumor burden in a sample, comprising the steps of:
(i) providing a sample to be tested;
(ii) sequencing the sample to be tested to obtain the genomic sequences of the sample;
(iii) aligning the genomic sequences obtained in step (ii) to a reference genome to obtain the position information of the sequences on the reference genome;
(iv) dividing the reference genome into M region fragments, each region fragment being a window b, and calculating the copy number of each window b;
(v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and
(vi) calculating the Genomic Abnormality Score (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the GAS value.
In another preferred example, the reference genome may be continuous or discontinuous.
In another preferred example, the reference genome includes the whole genome.
In another preferred example, the reference genome is the full length of all chromosomes of the species (such as human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
In another preferred example, the reference genome covers 50% or more of the whole genome, preferably 60% or more, more preferably 70% or more, more preferably 80% or more, and most preferably 95% or more.
In another preferred example, the sample is from an individual to be tested.
In another preferred example, the individual to be tested is a human or a non-human mammal.
In another preferred example, the sample is a solid sample or a liquid sample.
In another preferred example, the sample includes a body fluid sample.
In another preferred example, the sample is selected from the group consisting of blood, plasma, interstitial fluid, lymph, cerebrospinal fluid, urine, saliva, aqueous humor, semen, and combinations thereof.
In another preferred example, the sample is selected from the group consisting of circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes, and combinations thereof.
In another preferred example, the sequencing is selected from the group consisting of single-end sequencing, paired-end sequencing, and combinations thereof.
In another preferred example, step (iv) further includes correcting the copy number of each window b and calculating the corrected copy number of each window b.
In another preferred example, the correction method is selected from the group consisting of Loess correction, the weighting method, the residual method, and combinations thereof.
In another preferred example, the number of sequences falling in each window b, their base composition, and the base composition of the reference genome are counted according to the position information of the genomic sequences on the reference genome.
In another preferred example, the copy number of each window b is corrected according to the sequences and the base content of each window b.
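The correction step can be illustrated with a minimal sketch of a Loess-style GC correction. The patent names Loess correction but gives no parameters, so everything here is an assumption made for illustration: the function names `loess_fit` and `gc_correct`, the span `frac`, the use of a plain local linear fit with tricube weights (no robustness iterations), and the choice to rescale each window by the median count.

```python
from statistics import median


def loess_fit(xs, ys, frac=0.3):
    """Local linear regression with tricube weights (basic LOESS, no robustness passes)."""
    n = len(xs)
    k = min(n, max(2, round(frac * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself included).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(xs, ys):
            u = abs(x - x0) / h
            if u >= 1.0:
                continue  # outside the local neighbourhood
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # all neighbours share one x: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def gc_correct(counts, gc, frac=0.3):
    """Divide out the GC trend and rescale to the median count (assumed convention)."""
    if not counts:
        return []
    expected = loess_fit(gc, counts, frac)
    center = median(counts)
    return [c * center / e if e > 0 else 0.0 for c, e in zip(counts, expected)]
```

If read counts depend linearly on GC content, the fit reproduces the trend and every corrected window collapses to the median, which is the behaviour one would want before comparing windows with different GC content.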
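In another preferred example, the Z value of each window b is calculated using the following formula:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where $x_i$ is the copy number measured in window $b_i$ of the sample to be tested, and $\mu_i$ and $\sigma_i$ are the arithmetic mean and the standard deviation of the copy numbers of the normal control samples in that window.
In another preferred example, the normal control samples are samples of the same type taken from normal individuals of the same species.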
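A minimal sketch of the per-window Z value, following the definition Z = (x - μ)/σ used in this document. The function name `window_z_scores` and the data layout are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the standard deviation formula is not reproduced in this text.

```python
from statistics import mean, stdev


def window_z_scores(sample, controls):
    """sample: copy number per window; controls: one such list per normal control."""
    z_values = []
    for i, x in enumerate(sample):
        reference = [control[i] for control in controls]
        mu = mean(reference)
        sigma = stdev(reference)  # assumption: sample (N-1) standard deviation
        # A window with no spread in the controls carries no information here.
        z_values.append((x - mu) / sigma if sigma > 0 else 0.0)
    return z_values


controls = [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]]
z = window_z_scores([2.4, 2.0], controls)
```

In the usage lines, the first window of the test sample sits two control standard deviations above the control mean, and the second sits exactly on it.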
In another preferred example, the Genomic Abnormality Score is calculated using the following formula:
$$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$
where the windows are ranked by the absolute value of their Z value in ascending order, $m_b$ is the window ranked at m% and $p_b$ is the window ranked at p%.
In another preferred example, the following step is performed before the Genomic Abnormality Score is calculated: (a) based on the sequence features of the reference genome, removing regions of the genome that high-throughput sequencing cannot measure, such as centromeres, telomeres, satellites and heterochromatin, and removing regions of length L adjacent to centromeres, telomeres, satellites and heterochromatin, where L is any length less than 3 Mb; or (b) based on the copy number features of the samples, removing regions of the genome that high-throughput sequencing cannot measure, such as centromeres, telomeres, satellites and heterochromatin.
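Option (a) amounts to dropping every window that overlaps an excluded region or its flank of length L. The sketch below assumes half-open `(chrom, start, end)` intervals; the function name `mask_windows` and the coordinates in the usage lines are hypothetical and are not taken from any real annotation.

```python
def mask_windows(windows, excluded, flank=0):
    """Keep windows that do not overlap any excluded region widened by `flank` bp."""
    kept = []
    for chrom, start, end in windows:
        hit = any(
            chrom == ex_chrom and start < ex_end + flank and end > ex_start - flank
            for ex_chrom, ex_start, ex_end in excluded
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


# Four 200 kb windows and one made-up excluded region on the same chromosome.
windows = [("chr1", i * 200_000, (i + 1) * 200_000) for i in range(4)]
excluded = [("chr1", 250_000, 300_000)]
remaining = mask_windows(windows, excluded, flank=150_000)
```

With a 150 kb flank the excluded interval grows to 100,000 to 450,000, so only the last window survives; with `flank=0` only the second window is dropped.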
In another preferred example, the following steps are performed before step (v): (iv1) calculating, from the copy number of each window b in step (iv), the coefficient of variation $CV_i$ of each window b in the normal control samples; and (iv2) sorting the $CV_i$ values in ascending order and removing the windows with the largest n% of values, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
In another preferred example, the coefficient of variation $CV_i$ is calculated using the following formula:
$$CV_i = \frac{\sigma_i}{\mu_i}$$
where $\mu_i$ is the arithmetic mean and $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples in window $b_i$.
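A minimal sketch of the CV-based window filter, assuming the same data layout as a list of per-control copy number lists. The function name `cv_filter`, the sample standard deviation, and rounding the number of dropped windows up with `math.ceil` are all assumptions; the text only says that the largest n% of windows are removed.

```python
import math
from statistics import mean, stdev


def cv_filter(controls, n_percent=5.0):
    """Return (kept window indices, CV per window) after dropping the noisiest n%."""
    n_windows = len(controls[0])
    cvs = []
    for i in range(n_windows):
        reference = [control[i] for control in controls]
        mu = mean(reference)
        # CV = sigma / mu; windows with a zero mean are treated as maximally noisy.
        cvs.append(stdev(reference) / mu if mu > 0 else float("inf"))
    n_drop = math.ceil(n_windows * n_percent / 100.0)  # assumption: round up
    order = sorted(range(n_windows), key=lambda i: cvs[i])  # ascending CV
    kept = sorted(order[: n_windows - n_drop])
    return kept, cvs
```

The returned indices can then be used to select the windows that enter the Z-test, so that windows that are unstable even among normal controls do not inflate the score.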
A second aspect of the invention provides a system (device) for identifying tumor burden in a sample, comprising:
a sequencing unit, for performing nucleic acid sequencing on the sample to be tested to obtain the genomic sequences of the sample;
an alignment unit, connected to the sequencing unit, for aligning the obtained genomic sequences of the sample to a reference genome to obtain the position information of the sequences on the reference genome;
a calculation and test unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window to calculate the Z value of each window b; and
an identification unit, connected to the calculation and test unit, for calculating the Genomic Abnormality Score (GAS) from the obtained Z values and identifying the tumor burden in the sample based on the GAS value.
In another preferred example, the system further comprises a correction unit, connected to the calculation and test unit, for correcting the copy number of each window b of the reference genome so as to calculate the corrected copy number of each window b.
In another preferred example, in the calculation and test unit, before the Z-test is performed on each window b, the coefficient of variation $CV_i$ of each window b may be calculated from the copy number of each window b, the $CV_i$ values sorted in ascending order, and the windows with the largest n% of values removed, where n is any value greater than 0 and less than or equal to 5, preferably n = 1, 2, 2.5, 3, 3.1, 4, 4.2 or 5.
It should be understood that, within the scope of the invention, the technical features described above and those described in detail below (for example in the examples) may be combined with one another to form new or preferred technical solutions. For brevity, they are not listed one by one here.
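Through extensive and in-depth research, the inventors have for the first time established a method for identifying tumor burden in a sample that is effective and improves the sensitivity and general applicability of tumor detection. Specifically, the Genomic Abnormality Score (GAS) is calculated, and the tumor burden in the sample is identified based on its value.
The invention also provides a system (device) for identifying tumor burden in a sample, comprising a sequencing unit, an alignment unit, a calculation and test unit, and an identification unit. In a preferred example of the invention, the system further comprises a correction unit. On this basis, the inventors completed the present invention.
Terms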
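As used herein, the term "copy number variation (CNV)" refers to an abnormal copy number of a chromosome or chromosome segment in the sample genome, including but not limited to chromosomal aneuploidy, deletions and duplications, and microdeletions and microduplications larger than 1000 bp.
As used herein, the term "Genomic Abnormality Score (GAS)" is a score calculated from copy number abnormalities of chromosomes or chromosome segments in the sample genome. The scope of the score includes but is not limited to the whole genome, specific chromosomes, chromosome segments, and specific genes.
As used herein, the term "Z value (Z-score)", also called the standard score, is the difference between a value and the mean, divided by the standard deviation:
$$Z = \frac{x - \mu}{\sigma}$$
where $x$ is a specific value, $\mu$ is the arithmetic mean, and $\sigma$ is the standard deviation. The Z value represents the distance between the raw value and the reference mean, measured in units of standard deviation.
As used herein, the term "partial response (PR)" means that the sum of the longest diameters of the target lesions decreases by ≥30%, maintained for at least 4 weeks.
As used herein, the term "progressive disease (PD)" means that the sum of the longest diameters of the target lesions increases by ≥20%, or that new lesions appear.
As used herein, the terms "system" and "device" have the same meaning.
Reference genome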
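In the present invention, taking humans as an example, the reference genome may be the whole genome or a partial genome, and it may be continuous or discontinuous. When the reference genome is a partial genome, its total coverage (F) is 50% or more of the whole genome, preferably 60% or more, more preferably 70% or more, more preferably 80% or more, and most preferably 95% or more, where the total coverage (F) is the percentage of the whole genome accounted for by the reference genome.
In a preferred embodiment, the reference genome is the whole genome.
In a preferred embodiment, the reference genome is the full length of all chromosomes of the species (such as human), the full length of one or more chromosomes, a part of one or more chromosomes, or a combination thereof.
Tumor burden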
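In the present invention, "tumor burden" refers to the degree of harm a tumor causes to the body, such as the size of the tumor, its activity, its metastasis, and the danger that tumors at different sites pose to the body. Indicators for assessing tumor burden include (but are not limited to): tumor size, tumor marker levels, clinical symptoms (wheezing, pain, etc.), related complications (superior vena cava syndrome, etc.), and wasting (anemia, hypoproteinemia, etc.).
Sequencing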
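In the present invention, sequencing can be performed with conventional sequencing technologies and platforms. The sequencing platform is not particularly limited. Second-generation sequencing platforms include (but are not limited to): GA, GAII, GAIIx, HiSeq1000/2000/2500/3000/4000, X Ten, X Five, NextSeq500/550, MiSeq, MiSeqDx, MiSeq FGx and MiniSeq from Illumina; SOLiD from Applied Biosystems; 454 FLX from Roche; Ion Torrent, Ion PGM and Ion Proton I/II from Thermo Fisher Scientific (Life Technologies); BGISEQ1000, BGISEQ500 and BGISEQ100 from BGI (華大基因); BioelectronSeq 4000 from CapitalBio (博奧生物集團); DA8600 from Daan Gene Co., Ltd. of Sun Yat-sen University (中山大學達安基因股份有限公司); NextSeq CN500 from Berry Genomics (貝瑞和康); BIGIS from 中科紫鑫, a subsidiary of 紫鑫藥業; and HYK-PSTAR-IIA from 華因康基因.
Third-generation single-molecule sequencing platforms include (but are not limited to): the HeliScope system from Helicos BioSciences, the SMRT system from Pacific Biosciences, and GridION and MinION from Oxford Nanopore Technologies. The sequencing type may be single-end or paired-end. The read length may be 30 bp, 40 bp, 50 bp, 100 bp, 300 bp, or any other length greater than 30 bp. The sequencing depth may be 0.01, 0.02, 0.1, 1, 5, 10 or 30 times the genome, or any other multiple greater than 0.01.
In the present invention, the HiSeq2500 high-throughput sequencing platform from Illumina is preferred, with single-end sequencing, a read length of 41 bp, and a sequencing data volume of 5M.
Data processing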
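In the present invention, data processing generally includes the following steps:
(a) extracting nucleic acid from the sample to be tested and sequencing it to obtain genomic sequences;
(b) aligning the genomic sequences of the sample to a reference genome to obtain the positions of the sequences on the reference genome;
(c) dividing the reference genome into windows of a given length and calculating the copy number of each window b;
(d) performing a Z-test on each window b and calculating the Z value of each window; and
(e) calculating the Genomic Abnormality Score (GAS).
Step (a) further includes the following. The sample to be tested is a body fluid, which may be blood, interstitial fluid (also called tissue fluid or intercellular fluid), lymph, cerebrospinal fluid, urine or saliva. The detection target is the DNA contained in the body fluid, which is carried by circulating tumor cells (CTC), cell-free DNA (cfDNA), exosomes and the like. Methods for extracting the sample DNA include (but are not limited to) column extraction and magnetic bead extraction. A library is constructed from the sample and sequenced on a high-throughput sequencing platform.
Step (b) further includes the following. Adapters and low-quality data are removed from the sequencing results, which are then aligned to the reference genome. The reference genome may be the whole genome, any chromosome, or part of a chromosome. A recognized, established sequence is usually chosen as the reference genome; for humans this may be hg18 (NCBI36), hg19 (GRCh37) or hg38 (GRCh38) from NCBI or UCSC, or any single chromosome or part of a chromosome. Any free or commercial alignment software may be used, such as BWA (Burrows-Wheeler Alignment tool), SOAPaligner/soap2 (Short Oligonucleotide Analysis Package) or Bowtie/Bowtie2. Aligning the sequences to the reference genome gives their positions on the genome. Sequences that align uniquely to the genome may be selected and sequences that align at multiple locations discarded, to eliminate the error that repetitive sequences introduce into the copy number calculation.
Step (c) further includes the following. The genome is divided into windows of a given length. Depending on the amount of data sequenced, the window lengths may be the same or different integers in the range 100 bp to 3,000,000 bp (3 Mb). The number of windows may be any integer in the range 1,000 to 30,000,000. Based on the positions of the sequenced reads on the genome, the number of sequences falling in each window, their base composition, and the base composition of the reference genome are counted. The copy number of each window is corrected according to the sequences and GC content of the window, using methods that include but are not limited to Loess correction, and the corrected copy number of each window is calculated.
Step (d) further includes the following. Samples are taken from N normal individuals (N being a natural number not less than 30) and, under the same extraction, library construction and sequencing conditions, steps (a)-(c) are repeated to form a reference data set. Each window $b_i$ then has N normal copy number values.
The arithmetic mean $\mu_i$ of the copy numbers of the normal control samples is calculated as:
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
The standard deviation $\sigma_i$ of the copy numbers of the normal control samples is calculated from the same values, where $X_1, X_2, X_3, \ldots, X_N$ are the copy number values of the normal samples in window $b_i$.
The Z value of each window $b_i$ of the sample to be tested is calculated as:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where $x_i$ is the copy number measured in window $b_i$.
Step (e) further includes the following. Highly repetitive regions exist across the whole genome and around particular chromosomes, chromosome segments or genes, for example regions near centromeres, telomeres, satellites and heterochromatin. These highly repetitive regions are removed first so that they do not affect the GAS calculation.
In a preferred embodiment, the removal methods include (but are not limited to):
a. Removal based on the sequence features of the reference genome: regions of the genome that high-throughput sequencing cannot measure, such as centromeres, telomeres, satellites and heterochromatin, are removed, together with regions of length L adjacent to centromeres, telomeres, satellites and heterochromatin, where L may be any length less than 3 Mb; or
b. Removal based on the copy number features of the normal samples: for each window $b_i$, the coefficient of variation $CV_i$ of the normal control samples in that window is calculated as:
$$CV_i = \frac{\sigma_i}{\mu_i}$$
where $\mu_i$ is the arithmetic mean and $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples. The CV values are sorted in ascending order and the windows with the largest n% of values are removed, where n may be any value greater than 0 and less than or equal to 5.
Step (e) also includes the calculation of the Genomic Abnormality Score (GAS). First, the detection scope of the score is determined. The scope includes but is not limited to the whole genome, a specific chromosome, a specific chromosome segment or a specific gene, and may be of any size from 1 Mb up to the length of the genome (about 3 Gb for the human genome). Within the detection scope, the absolute value is taken of the Z value of each window that remains after the windows affected by repetitive sequences have been removed. The absolute Z values are sorted in ascending order and distributed evenly over the range 0% to 100%, with the smallest absolute Z value assigned to 0% and the largest assigned to 100%. The cumulative sum of the absolute Z values of the windows in the range from m% to p% is then calculated, where m is 30-98, preferably 40-97, more preferably 60-96, still more preferably 80-95, and most preferably 95; p is 80-100, preferably 85-100, more preferably 90-100, and most preferably 100; and p - m ≥ 2 (preferably ≥ 5, more preferably ≥ 10, more preferably ≥ 15, most preferably ≥ 20). This cumulative sum is the Genomic Abnormality Score (GAS), calculated as:
$$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$
where $m_b$ is the window ranked at m% and $p_b$ is the window ranked at p%.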
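The ranking-and-summing rule for the score can be sketched as follows. The function name `genomic_abnormality_score` is hypothetical, and the way ranks are mapped to percentages (rank r of n windows maps to 100·r/(n-1), so the smallest value sits at 0% and the largest at 100%) is one plausible reading of "distributed evenly", not a detail confirmed by the text.

```python
def genomic_abnormality_score(z_values, m=95.0, p=100.0):
    """Sum |Z| over the windows whose rank falls between the m% and p% positions."""
    abs_z = sorted(abs(z) for z in z_values)
    n = len(abs_z)
    if n == 0:
        return 0.0
    if n == 1:
        return abs_z[0]
    total = 0.0
    for rank, value in enumerate(abs_z):
        percent = 100.0 * rank / (n - 1)  # assumption: min -> 0%, max -> 100%
        if m <= percent <= p:
            total += value
    return total


# 21 windows with Z values -10..10: the two windows at 95% and 100% both have |Z| = 10.
gas = genomic_abnormality_score(list(range(-10, 11)), m=95, p=100)
```

Summing only the upper tail means that a sample whose windows all sit near the control mean scores low, while a sample with a minority of strongly deviating windows scores high, regardless of whether those windows are gains or losses.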
The present invention provides a method for identifying tumor burden in a sample that is effective and improves the sensitivity and general applicability of tumor detection, comprising the steps of: (i) providing a sample to be tested; (ii) sequencing the sample to be tested to obtain the genomic sequences of the sample; (iii) aligning the genomic sequences obtained in step (ii) to a reference genome to obtain the position information of the sequences on the reference genome; (iv) dividing the reference genome into M region fragments, each region fragment being a window b, and calculating the copy number of each window b; (v) performing a Z-test on each window b of step (iv) to calculate the Z value of each window b; and (vi) calculating the Genomic Abnormality Score (GAS) from the Z values obtained in step (v), and identifying the tumor burden in the sample to be tested based on the GAS value.
In a preferred example of the invention, the method comprises the steps of: (a) extracting nucleic acid from the sample genome and sequencing it to obtain genomic sequences; (b) aligning the sequences to a reference genome to obtain the positions of the sequences on the genome; (c) dividing the reference genome into windows b of a given length and calculating the copy number of each window b; and (d) performing a Z-test on each window b, calculating the Z value of each window b, and calculating the Genomic Abnormality Score (GAS), so that the tumor burden in the sample is identified based on the GAS value.
The present invention also provides a system (device) for identifying tumor burden in a sample, comprising:
a sequencing unit, for performing nucleic acid sequencing on the sample to be tested to obtain the genomic sequences of the sample;
an alignment unit, connected to the sequencing unit, for aligning the obtained genomic sequences of the sample to a reference genome to obtain the position information of the sequences on the reference genome;
a calculation and test unit, connected to the alignment unit, for calculating the copy number of each window b of the reference genome and performing a Z-test on each window to calculate the Z value of each window b; and
an identification unit, connected to the calculation and test unit, for calculating the Genomic Abnormality Score (GAS) from the obtained Z values and identifying the tumor burden in the sample based on the GAS value.
In a preferred embodiment, the system further comprises a correction unit, connected to the calculation and test unit, for correcting the copy number of each window b of the reference genome so as to calculate the corrected copy number of each window b.
The main advantages of the invention include:
(1) The invention establishes for the first time a method and system for identifying tumor burden in a sample, which can identify the tumor burden accurately and effectively.
(2) The method and system of the invention can improve the sensitivity and general applicability of tumor detection.
(3) The method and system of the invention can reduce the pain that sampling causes to tumor patients and enable non-invasive testing.
(4) The method and system of the invention can effectively test patients from whom samples cannot be taken for some conventional tests.
(5) The method and system of the invention allow tumor patients to be tested in real time and the efficacy of medication to be monitored, giving physicians guidance on medication and treatment.
The invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the invention and do not limit its scope. Experimental methods for which no detailed conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Unless otherwise specified, the materials used in the examples are commercially available products.
Example 1
The invention has been applied to 15 cases with good results. To make its use and effect easier to understand, one case is described in detail below. A brief flow chart of the implementation is shown in Figure 1, and the detailed procedure is as follows:
1. Nucleic acid extraction and sequencing of the sample genome
In this example, the test sample was blood from a gastric cancer patient, from which cell-free DNA (cfDNA) and white blood cells were extracted. Nucleic acid was extracted with the CW2603 nucleic acid extraction kit from Kangwei Century Biotechnology Co., Ltd. (康為世紀生物科技有限公司), following the product instructions provided by the manufacturer.
The library was constructed with the CW2185 library construction kit from Kangwei Century Biotechnology Co., Ltd. and then loaded on the sequencer. Sequencing was performed on the Illumina HiSeq2500 high-throughput sequencing platform according to the instructions provided by Illumina. The sequencing type was single-end, the read length was 41 bp, and the sequencing data volume was 5M.
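As a rough check on how shallow this sequencing is, the depth can be estimated from the figures above. The sketch assumes that "5M" means about 5 million reads and that the human genome is about 3 Gb, as stated elsewhere in this document; the function name `sequencing_depth` is hypothetical.

```python
def sequencing_depth(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is covered."""
    return n_reads * read_length_bp / genome_size_bp


# Assumed inputs: 5 million single-end reads of 41 bp over a ~3 Gb genome.
depth = sequencing_depth(5_000_000, 41, 3_000_000_000)
print(f"{depth:.3f}x")  # prints 0.068x
```

Under these assumptions the depth is well below 1x, which falls inside the 0.01x-and-above range given in the Sequencing section and is why copy number is estimated per large window rather than per base.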
Adapters and low-quality data were removed from the sequencing results, and the reads were aligned to the reference genome. The reference genome was the human genome hg19 (GRCh37) from UCSC, and the alignment software was BWA (Burrows-Wheeler Alignment tool) with default parameters. Aligning the sequences to the reference genome gave their positions on the genome, and only sequences that aligned uniquely to the genome were selected.
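The text does not say how uniquely aligned reads were selected. One common proxy is a mapping-quality threshold applied to the aligner's SAM output, which is what the sketch below assumes; the function name `unique_alignments` and the default `min_mapq=1` are hypothetical choices, not the patent's procedure.

```python
def unique_alignments(sam_lines, min_mapq=1):
    """Yield (chrom, pos) for mapped primary alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])  # 1-based leftmost mapping position
        mapq = int(fields[4])
        # 0x4 unmapped, 0x100 secondary alignment, 0x800 supplementary alignment
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if mapq < min_mapq:
            continue  # assumption: low MAPQ stands in for multi-mapping
        yield chrom, pos
```

The generator can be fed an open SAM file directly, for example `unique_alignments(open("sample.sam"))`, where the file name is a placeholder.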
The genome was divided into 15,489 windows b (regions), each 200 kb long. Based on the positions of the sequences on the genome, the number of sequences falling in each window b, their base composition, and the base composition of the reference genome were counted. The copy number of each window b was corrected according to the sequences and GC content of the window using Loess correction, and the corrected copy number of each window b was calculated.
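Counting the reads that fall in each fixed-size window can be sketched as below, assuming 1-based alignment positions and windows that tile each chromosome from its first base. The function name `count_reads_per_window` and the `(chrom, window index)` key are illustrative choices.

```python
from collections import Counter


def count_reads_per_window(positions, window_size=200_000):
    """Count reads per (chromosome, window index) from 1-based start positions."""
    counts = Counter()
    for chrom, pos in positions:
        # Positions 1..window_size fall in window 0, the next window_size in window 1, ...
        counts[(chrom, (pos - 1) // window_size)] += 1
    return counts


reads = [("chr1", 1), ("chr1", 200_000), ("chr1", 200_001), ("chr2", 5)]
per_window = count_reads_per_window(reads)
```

These raw counts are the input to the GC correction; windows that receive no reads simply do not appear in the counter and read as zero.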
Samples from 100 normal individuals were processed under the same extraction, library construction and sequencing conditions, repeating steps 1, 2 and 3 above, to obtain normal control data as the reference data set, from which the CV value of each window $b_i$ was calculated.
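Each window $b_i$ has N normal copy number values (N = 100 in this example).
The arithmetic mean $\mu_i$ of the copy numbers of the normal control samples is calculated as:
$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} X_j$$
where $X_1, X_2, \ldots, X_N$ are the copy number values of the normal samples in window $b_i$.
The CV value of each window $b_i$ is calculated from the normal control samples as:
$$CV_i = \frac{\sigma_i}{\mu_i}$$
where $\sigma_i$ is the standard deviation of the copy numbers of the normal control samples.
The Z value of each window $b_i$ of the sample to be tested is calculated as:
$$Z_i = \frac{x_i - \mu_i}{\sigma_i}$$
where $x_i$ is the copy number measured in window $b_i$.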
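In this example, the windows were sorted by CV in ascending order, and the windows with the largest 5% of CV values were removed and excluded from the score calculation below. The detection scope of the score was the whole genome. The absolute values of the Z values were taken and sorted in ascending order, and the cumulative sum of the absolute Z values from the m% window to the p% window was calculated. This cumulative sum is the Genomic Abnormality Score (GAS), calculated as:
$$\mathrm{GAS} = \sum_{b=m_b}^{p_b} \lvert Z_b \rvert$$
where $m_b$ is the window ranked at m% and $p_b$ is the window ranked at p%, with m = 95 and p = 100.
The GAS value was used to identify the tumor burden in the body fluid.
7. Test results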
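More than ten samples were tested. A typical case is shown below.
The test results are shown in Table 1, Figure 2 and Figure 3.
Table 1. Tumor burden test results for the clinical treatment response of a gastric cancer patient in Example 1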
Figure 1 shows a flow chart of the analysis method for identifying tumor burden in body fluids.
Figure 2 shows the tumor burden test results of the patient over different clinical treatment cycles.
Figure 3 shows the genome-wide copy number variation of S1-7 and the corresponding GAS.
Claims (10)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
201610842333.8 | 2016-09-22 | ||
CN201610842333.8A CN106367512A (en) | 2016-09-22 | 2016-09-22 | Method and system for identifying tumor loads in samples |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201814290A TW201814290A (en) | 2018-04-16 |
TWI670495B TWI670495B (en) | 2019-09-01 |
Family
ID=57898089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW106131581A TWI670495B (en) | 2016-09-22 | 2017-09-14 | Method and system for identifying tumor burden in a sample |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN106367512A (en) |
TW (1) | TWI670495B (en) |
WO (1) | WO2018054254A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106367512A (en) * | 2016-09-22 | 2017-02-01 | 上海序康医疗科技有限公司 | Method and system for identifying tumor loads in samples |
WO2018148903A1 (en) * | 2017-02-16 | 2018-08-23 | 上海亿康医学检验所有限公司 | Auxiliary diagnosis method for urinary system tumours |
CN106755547A (en) * | 2017-03-15 | 2017-05-31 | 上海亿康医学检验所有限公司 | The Non-invasive detection and its recurrence monitoring method of a kind of carcinoma of urinary bladder |
CN108319817B (en) * | 2018-01-15 | 2020-12-25 | 无锡臻和生物科技有限公司 | Method and device for processing circulating tumor DNA repetitive sequence |
CN108229103B (en) * | 2018-01-15 | 2020-12-25 | 无锡臻和生物科技有限公司 | Method and device for processing circulating tumor DNA repetitive sequence |
CN108595918B (en) * | 2018-01-15 | 2021-03-16 | 无锡臻和生物科技有限公司 | Method and device for processing circulating tumor DNA repetitive sequence |
CN109182526A (en) * | 2018-10-10 | 2019-01-11 | 杭州翱锐生物科技有限公司 | Kit and its detection method for early liver cancer auxiliary diagnosis |
CN111583992B (en) * | 2020-05-11 | 2023-08-29 | 广州金域医学检验中心有限公司 | RNA level fusion gene mutation-caused tumor load analysis system and method |
CN114582427B (en) * | 2022-03-22 | 2023-04-07 | 成都基因汇科技有限公司 | Method for identifying introgression section and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104428425A (en) * | 2012-05-04 | 2015-03-18 | 考利达基因组股份有限公司 | Methods for determining absolute genome-wide copy number variations of complex tumors |
US9121069B2 (en) * | 2007-07-23 | 2015-09-01 | The Chinese University Of Hong Kong | Diagnosing cancer using genomic sequencing |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104204220B (en) * | 2011-12-31 | 2017-06-06 | 深圳华大基因股份有限公司 | A kind of hereditary variation detection method |
CN113337604A (en) * | 2013-03-15 | 2021-09-03 | 莱兰斯坦福初级大学评议会 | Identification and use of circulating nucleic acid tumor markers |
CN104313136A (en) * | 2014-09-30 | 2015-01-28 | 江苏亿康基因科技有限公司 | Noninvasive human liver cancer early detection and differential diagnosis method and system |
CN105574361B (en) * | 2015-11-05 | 2018-11-02 | 上海序康医疗科技有限公司 | A method of detection genome copies number variation |
CN105844116B (en) * | 2016-03-18 | 2018-02-27 | 广州市锐博生物科技有限公司 | The processing method and processing unit of sequencing data |
CN106367512A (en) * | 2016-09-22 | 2017-02-01 | 上海序康医疗科技有限公司 | Method and system for identifying tumor loads in samples |
-
2016
- 2016-09-22 CN CN201610842333.8A patent/CN106367512A/en active Pending
-
2017
- 2017-09-13 WO PCT/CN2017/101573 patent/WO2018054254A1/en active Application Filing
- 2017-09-14 TW TW106131581A patent/TWI670495B/en not_active IP Right Cessation
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9121069B2 (en) * | 2007-07-23 | 2015-09-01 | The Chinese University Of Hong Kong | Diagnosing cancer using genomic sequencing |
CN104428425A (en) * | 2012-05-04 | 2015-03-18 | 考利达基因组股份有限公司 | Methods for determining absolute genome-wide copy number variations of complex tumors |
Non-Patent Citations (1)
Title |
---|
Leary, Rebecca J., et al. "Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing." Science translational medicine 4.162 (2012): 162ra154-162ra154. * |
Also Published As
Publication number | Publication date |
---|---|
WO2018054254A1 (en) | 2018-03-29 |
CN106367512A (en) | 2017-02-01 |
TW201814290A (en) | 2018-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI670495B (en) | Method and system for identifying tumor burden in a sample | |
US11031100B2 (en) | Size-based sequencing analysis of cell-free tumor DNA for classifying level of cancer | |
TWI636255B (en) | Mutational analysis of plasma dna for cancer detection | |
CN112805563A (en) | Cell-free DNA for assessing and/or treating cancer | |
TW201833329A (en) | Methods and systems for tumor detection | |
JP2019531700A5 (en) | ||
TWI679280B (en) | Non-invasive detection of bladder cancer and method for monitoring its recurrence | |
CN110800063A (en) | Detection of tumor-associated variants using cell-free DNA fragment size | |
TWI727938B (en) | Applications of plasma mitochondrial dna analysis | |
CN107849569B (en) | Lung adenocarcinoma biomarker and application thereof | |
WO2021139716A1 (en) | Biterminal dna fragment types in cell-free samples and uses thereof | |
CN107760688A (en) | A kind of BRCA2 gene mutation bodies and its application | |
CN107723370A (en) | A kind of fluorescence quantitative PCR detection system and its application for nasopharyngeal carcinoma gene screening | |
JP7170711B2 (en) | Use of off-target sequences for DNA analysis | |
WO2018186687A1 (en) | Method for determining nucleic acid quality of biological sample | |
US20230103637A1 (en) | Sequencing of viral dna for predicting disease relapse | |
WO2018148903A1 (en) | Auxiliary diagnosis method for urinary system tumours | |
WO2024118500A2 (en) | Methods for detecting and treating ovarian cancer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |