WO2015006932A1 - Method and device for detecting chromosomal aneuploidy - Google Patents
Method and device for detecting chromosomal aneuploidy
- Publication number
- WO2015006932A1 (PCT/CN2013/079495)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- window
- sample
- chromosome
- target
- sequencing
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2535/00—Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
- C12Q2535/101—Sanger sequencing method, i.e. oligonucleotide sequencing using primer elongation and dideoxynucleotides as chain terminators
Definitions
- The invention relates to the technical field of genomics and bioinformatics, and in particular to a method and a device for detecting chromosomal aneuploidy.
- Chromosomes are the basic substances that make up the nucleus.
- A normal human somatic cell has 46 chromosomes, each with a characteristic shape and structure.
- Karyotype usually refers to the phenotypic characteristics of chromosomes, such as number, length, etc.
- Karyotype detection can reveal chromosomal abnormalities on a large scale, such as aneuploidy (deletion or duplication) of a chromosome, and plays an important role in genetic studies; for example, karyotyping of fetal chromosomes helps to reduce birth risks.
- Non-invasive prenatal techniques include: 1) testing maternal serum and urine components with serum markers such as alpha-fetoprotein (AFP), free estriol (β-HCG) and pregnancy-associated protein (PAPP-A) to calculate the risk of Down syndrome; 2) visual screening of the fetus by physical means such as B-mode ultrasound, X-ray, CT and magnetic resonance; 3) preimplantation genetic diagnosis (PGD), in which gametes, or embryos before transfer into the uterine cavity, are analysed genetically.
- Invasive prenatal techniques include chorionic villus biopsy in the first trimester, fetal cord blood sampling in the second trimester, amniocentesis, fetoscopy and embryo biopsy.
- The results of non-invasive prenatal techniques are not reliable enough, with high false-positive and false-negative rates, whereas invasive prenatal techniques are accurate but pose risks to the pregnant woman and the fetus, such as miscarriage or amnionitis.
- A method for detecting chromosomal aneuploidy includes the steps of: obtaining the distribution of the sequencing results of test samples on a reference sequence, the test samples comprising target samples from M target individuals and control samples from N normal individuals
- M and N are positive integers
- the sequencing results include multiple reads
- the reference sequence is divided into multiple windows, and the distribution of the sequencing results on the reference sequence is expressed as the number of reads falling in each window.
- A chromosomal aneuploidy detection device includes: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and
- a processor, data-connected to the data input unit, the data output unit and the storage unit, for executing the executable program stored in the storage unit, execution of the program including carrying out the above method for detecting chromosomal aneuploidy.
- a computer readable storage medium for storing a program for execution by a computer
- the storage medium may include: a read only memory, a random access memory, a magnetic disk or an optical disk, and the like.
- Through the design of the deviation statistic, the method according to the present invention reflects the deviation of the test sample from the reference and judges chromosomal deletion or duplication accordingly, providing a sequencing-based means of detecting chromosomal aneuploidy that can sensitively detect numerical abnormalities of any chromosome.
- FIG. 1 is a schematic flow chart of a detecting method according to an embodiment of the present invention
- FIG. 2 is a schematic flow chart of a window dividing method according to another embodiment of the present invention
- Fig. 3 is a flow chart showing a GC correction method in another embodiment of the present invention.
- a method for detecting chromosome aneuploidy includes the following steps:
- the test sample contains a target sample from M target individuals and a control sample from N normal individuals, and M and N are positive integers.
- The target individual is an individual who needs to be tested, such as a pregnant woman requiring prenatal testing; a normal individual is an individual determined in advance to be normal.
- The target individual and the normal individual are of the same species and preferably have a similar baseline condition.
- For example, the normal individual may be a normal pregnant woman at a similar gestational age carrying a normal fetus.
- The source of the target sample and the control sample is not particularly limited and may, for example, be selected from maternal peripheral blood, maternal urine, exfoliated fetal trophoblast cells from the maternal cervix, maternal cervical mucus, fetal nucleated red blood cells and the like, as long as a nucleic acid sample containing fetal genetic information can be extracted from it.
- The target sample and the control sample preferably have the same source, for example maternal peripheral blood, so that the fetus can be tested non-invasively and the sample is simple to obtain.
- Samples obtained by an invasive method may also be used; for example, the sample may come from fetal cord blood, placental or chorionic tissue, uncultured or cultured amniotic fluid cells, villus cells and the like. Nucleic acid extraction can be carried out by existing means, such as a commercially available nucleic acid extraction kit.
- If there are two or more target individuals, each target individual can be grouped with the N normal individuals into one set of test samples, so that each set contains N+1 test samples and M sets are obtained in total, each set being tested and calculated separately according to the provided method; alternatively, the M target individuals and the N normal individuals can be combined into a single set of N+M test samples.
- The scheme in which the total number of test samples is N+1 is preferred.
- The sequencing results of the test samples include multiple reads (also called "read segments").
- Because the normal individuals are selected in advance, any detection or calculation data for the control samples can be generated and saved beforehand.
- The preset control-sample data are read and used as needed, and this is not repeated below wherever control-sample data are mentioned. In other embodiments, the control samples can also be tested and calculated at the same time as the target samples.
- A third-generation sequencing platform (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) may be used, including but not limited to Helicos' True Single Molecule DNA sequencing, Pacific Biosciences' single molecule real-time (SMRT™) sequencing, and Life Technologies' semiconductor sequencing technology.
- This embodiment preferably uses a semiconductor sequencing platform from Life Technologies.
- each sample can be tagged with different barcodes for sample differentiation during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 2008, March, Vol. 5 No. 3 ), thereby enabling simultaneous sequencing of multiple samples.
- the tag sequence is used to distinguish between different samples, but does not affect the other functions of the DNA molecule to which the tag sequence is added.
- the tag sequence length can be 4-12 bp.
- The sequencing depth used to obtain the sequencing results of the test samples is preferably 0.2X, and a small-fragment library is used, preferably 100 to 300 bp in size. In other embodiments, the sequencing depth may preferably be 0.1X to 0.3X and, additionally or alternatively, the library size is preferably 50 to 500 bp.
- Low sequencing depth and a small-fragment library reduce the amount of sequencing data, saving cost and shortening detection and analysis time, while still ensuring the reliability and accuracy of the results.
- For example, with a sequencing depth of 0.2X and a library of about 100 bp, the sequencing data to be analysed amount to about 5 M, which greatly reduces the cost of data generation and the difficulty of the analysis,
- making it possible to complete the analysis within 24 hours and shortening the time to report results.
- The reference sequence is divided into a plurality of windows, and the distribution of the sequencing results of a test sample on the reference sequence is expressed as the number of reads falling in each window.
- The number of reads in each window is denoted r(i,j), where i is the window index, j is the test sample index, and i and j are positive integers.
- The reference sequence is a known sequence and may be any reference template, obtained in advance, for the biological category to which the target individual belongs.
- If the target individual is human, the reference sequence may be a human genome reference sequence from the National Center for Biotechnology Information (NCBI) database.
- Here the reference sequence is version 37.3 of the human genome reference sequence (hg19; NCBI Build 37.3) in the NCBI database.
- The reference sequence may be divided into windows in any way that allows the sequencing results to be counted effectively.
- Here the windows are divided according to a fixed window length and a fixed window spacing; the fixed window length is preferably 100 kb and
- the fixed window spacing is preferably 10 kb or 20 kb.
- Different fixed window lengths and spacings may also be selected.
- For example, the fixed window length is preferably 1 kb to 1 Mb and, additionally or alternatively, the fixed window spacing is preferably 1 kb to 100 kb.
- The window length and spacing can be set according to the abundance of fetal DNA in the sample. The underlying principle is that each window corresponds to one statistic and one chromosomal position, so the spacing of the windows determines the resolution of the detection.
- Various alignment software can be used to align the sequencing results to the reference sequence, such as Tmap, BWA (Burrows-Wheeler Aligner), SOAP (Short Oligonucleotide Analysis Package) and samtools; this embodiment places no limitation on the choice.
- Depending on the alignment software, either mismatch-tolerant alignment (allowing several base mismatches) or exact alignment can be used.
- With mismatch-tolerant alignment, 1 to 3 mismatches per 100 bp are generally allowed on average.
- Mismatch-tolerant alignment is generally used when sequencing on the Proton platform.
- I is the number of all windows on the reference sequence.
- Subsequent analysis uses the relative read number obtained by this normalization, in order to bring out the statistical meaning of the data itself. In other embodiments, merely using un-normalized values for the analysis, calculation and comparison should be regarded as equivalent.
- sd(i) is the standard deviation of R(i,j) in the i-th window.
- An optional calculation method is:
- The deviation statistic Z(i,j) indicates, in statistical terms, whether the j-th sample shows a deletion or duplication in the i-th window.
- Z(i,j) > 0 suggests duplication
- Z(i,j) < 0 suggests deletion
- The Z(i,j) of each window has relatively independent statistical significance.
- The deviation statistics Z(i,j) are analysed by chromosome: the mean Zp(c,j) of Z(i,j) on chromosome c of the target sample is compared with the deviation threshold of chromosome c,
- c1 is the index of the first window on chromosome c of the reference sequence
- cI is the number of windows on chromosome c of the reference sequence.
- A cumulative sum, for example, is equally usable, provided the numerical level of the threshold is adjusted accordingly.
- The variation detection according to embodiments of the present invention can objectively be used to determine chromosomal aneuploidy, and in turn to detect the resulting genetic diseases, such as fetal Down syndrome and Edwards syndrome.
- However, it is not necessarily used for disease diagnosis or related purposes. For example, the presence of some chromosomal variations does not indicate disease risk or health status, and the detection may also be used purely for scientific research on genetic polymorphism.
- the deviation threshold is set according to the deviation statistics of all normal individuals on chromosome c. As mentioned above, since the deviation threshold is obtained from the control sample, it can be pre-calculated and saved. When the target individual is detected later, the same threshold setting can be used as long as the set of the control samples does not change. Of course, if you reduce, replace or add a control sample, you need to update the corresponding deviation threshold.
- a preferred threshold setting method used in the present embodiment includes the following steps:
- N is preferably not less than 30.
- The test rule can be selected according to the number of control samples and the required detection accuracy, and the corresponding confidence level set, following existing statistical data processing practice. In the present embodiment a U test with a confidence level of 95% is preferred, which has the advantage of "no false negatives". In other embodiments, other test rules such as the t-test may be selected and, additionally or alternatively, the confidence level may be chosen from 90% to 99.9%, for example 99%, 99.5% or 99.9%.
- a set of deviation thresholds obtained according to the above setting manner is as follows, wherein the format of the recorded data is (chromosome number; lower limit of the threshold; upper limit of the threshold):
- A method for detecting chromosomal aneuploidy is provided whose basic steps are the same as in Embodiment 1, except that Embodiment 1 divides the windows according to a fixed window length and a fixed window spacing, whereas this embodiment divides the windows so that each window contains the same number of uniquely aligned sequences.
- A uniquely aligned sequence is a sequence that maps to a single position on the reference sequence.
- When the windows are divided so that each contains the same number of uniquely aligned sequences,
- only uniquely aligned reads need be counted when the sequencing results of the test samples are aligned to the reference sequence, and reads that cannot be uniquely aligned are discarded.
- This type of window reduces the influence of repetitive sequences and N regions on the detection results and improves the reliability of detection.
- the method for dividing a window according to the same number of unique alignment sequences included in each window provided by this embodiment, referring to FIG. 2, includes the following steps:
- This set of base sequences can be obtained either by whole-genome sequencing of a known sample (such as one of the above control samples) or by cutting the reference sequence into fragments of a chosen cut length.
- When actual sequencing is used, the selected known sample can be sequenced deeply and the resulting reads used as the set of
- known base sequences. Preferably, the library construction and sequencing methods are chosen so that the length of the obtained base sequences is comparable to the read length obtained by sequencing the test samples.
- When the set of known base sequences is simulated by cutting the reference sequence, the cut length is determined first, usually from the read length obtained by sequencing the test samples.
- For example, the cut length can be a fixed value close to the read length of the test samples.
- If the reads of the test samples are about 250 bp, a cut length of 200 to 300 bp can be chosen.
- The reference sequence, for example HG18 or HG19 as selected, is then cut according to the cut length.
- The set of known base sequences is aligned back to the reference sequence to obtain the distribution of uniquely aligned sequences.
- K adjacent uniquely aligned sequences are placed in one group, and a window covering each group is thereby defined; K is a positive integer.
- A method for detecting chromosomal aneuploidy is provided whose basic steps are the same as in Embodiment 1 or 2, except that Embodiment 1 or 2 uses the uncorrected relative read number R(i,j) to calculate the deviation statistic Z(i,j), whereas in the present embodiment R(i,j) is corrected before Z(i,j) is calculated.
- The corrected R(i,j) is denoted Ra(i,j) below.
- R(i,j) is preferably corrected according to the GC (guanine and cytosine) content of each test sample in each window, so that the resulting Ra(i,j) is normally or approximately normally distributed.
- R(i,j) can be normalized on the basis of the GC content of the test sample in each window, so that Ra(i,j) follows, for example, an approximately normal distribution.
- The distribution of R(i,j) (or Ra(i,j)) means the distribution described by taking the value of R(i,j) as the abscissa and the number of windows having the same R(i,j) value as the ordinate.
- "The same value" means that the values fall in the same bin.
- A method for correcting R(i,j) according to GC content, provided by this embodiment and shown in FIG. 3, includes the following steps:
- The GC content of a test sample in each window can be calculated from the sequencing results. Both target samples and normal samples can be corrected on the basis of GC content. As described above, the data for the normal samples can be obtained and processed in advance.
- The same GC content means that the GC content values fall in the same bin.
- The bin width is preferably 0.001.
- The bin width may also preferably be 0.0005 to 0.005.
- The ratio of the median to the target value is used as the correction coefficient ε(GC) for the corresponding GC content. The target value is generally chosen to represent the average level; in the present embodiment it is preferably the mean of R(i,j) of the sequenced sample over all windows (including all chromosomes).
- A chromosomal aneuploidy detection device includes: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, data-connected to the data input unit, the data output unit and the storage unit, for executing the executable program stored in the storage unit, execution of which carries out all or some of the steps of the methods in the above embodiments.
- Reference sequence: version 37.3 of the human genome reference sequence (hg19; NCBI Build 37.3) in the NCBI database,
- the window length is 100 kb and the window spacing is 20 kb.
- Target samples: plasma from 4 pregnant women
- Control samples: the set of control samples used to determine the deviation thresholds listed in Example 1.
- the detection process is:
- the Snova DNA extraction kit (SnoMag Circulating DNA Kit) was used to extract the DNA of the above 4 plasma samples (the target individual number is shown in the attached table), and the extracted DNA was tested according to the stable proton database.
- A sequencing adapter is added to the 170 bp DNA molecules of the main fragment band, and a different tag sequence is added at the adapter junction for each target sample so that the samples can be distinguished.
- The constructed library (main band about 250 bp) was amplified by emulsion (water-in-oil) PCR to form single-molecule particles.
- Sequencing: the DNA samples obtained from the above 4 samples were processed according to the Ion Proton instructions published by Life Technologies and sequenced on the instrument. The samples were distinguished by their tag sequences. Using the alignment software Tmap (obtained from the Life Technologies website), the sequencing results were aligned to the reference sequence with mismatch tolerance, locating each read on the reference sequence.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Theoretical Computer Science (AREA)
- Genetics & Genomics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Microbiology (AREA)
- Pathology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Immunology (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
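A method and device for detecting chromosomal aneuploidy, wherein the method comprises: obtaining the distribution of the sequencing results of test samples on a reference sequence, namely the number of reads falling in each window into which the reference sequence is divided, the test samples comprising target samples from target individuals and control samples from normal individuals; calculating the deviation statistic of each target sample in each window; and comparing the mean of the deviation statistics on a given chromosome of a target sample with the corresponding deviation threshold and determining from the comparison whether that chromosome is deleted or duplicated, the deviation threshold being set according to the deviation statistics of all normal individuals on that chromosome.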
Description
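Method and Device for Detecting Chromosomal Aneuploidy
Technical Field
The present invention relates to the technical field of genomics and bioinformatics, and in particular to a method and device for detecting chromosomal aneuploidy.
Background Art
Chromosomes are the basic material making up the cell nucleus. A normal human somatic cell has 46 chromosomes with characteristic shapes and structures. Karyotype usually refers to the phenotypic features of the chromosomes, such as number and length. Karyotype analysis can reveal chromosomal abnormalities on a large scale, such as aneuploidy (deletion or duplication) of a chromosome, and plays an important role in genetic research; for example, karyotyping of fetal chromosomes helps to reduce birth risks.
The fetal prenatal testing techniques in common use are divided into non-invasive and invasive prenatal techniques. Non-invasive prenatal techniques include: 1) testing maternal serum and urine components with serum markers such as alpha-fetoprotein (AFP), free estriol (β-HCG) and pregnancy-associated protein (PAPP-A) to calculate the risk of Down syndrome; 2) visual screening of the fetus by physical means such as B-mode ultrasound, X-ray, CT and magnetic resonance; 3) preimplantation genetic diagnosis (PGD), in which gametes, or embryos before transfer into the uterine cavity, are analysed genetically. Invasive prenatal techniques include chorionic villus biopsy in the first trimester, fetal cord blood sampling in the second trimester, amniocentesis, fetoscopy and embryo biopsy.
At present the results of non-invasive prenatal techniques are not reliable enough, with rather high false-positive and false-negative rates, whereas invasive prenatal techniques are accurate but pose risks to the pregnant woman and the fetus, such as miscarriage or amnionitis.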
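Summary of the Invention
According to one aspect of the present invention, a method for detecting chromosomal aneuploidy is provided, comprising the following steps: obtaining the distribution of the sequencing results of test samples on a reference sequence, the test samples comprising target samples from M target individuals and control samples from N normal individuals, M and N being positive integers, the sequencing results comprising multiple reads, the reference sequence being divided into multiple windows, and the distribution of the sequencing results on the reference sequence being expressed as the number r(i,j) of reads falling in each window, where i is the window index, j is the test sample index, and i and j are positive integers; calculating the relative read number of each test sample in each window, R(i,j) = r(i,j) / rp(j), where rp(j) is the mean of r(i,j) for the j-th sample; calculating the deviation statistic of each target sample in each window, Z(i,j) = [R(i,j) - mean(i)] / sd(i), where mean(i) is the mean of R(i,j) in the i-th window and sd(i) is the standard deviation of R(i,j) in the i-th window; and comparing the mean Zp(c,j) of Z(i,j) on chromosome c of the target sample with the deviation threshold of chromosome c and determining from the comparison whether chromosome c of the target sample is deleted or duplicated, the deviation threshold being set according to the deviation statistics of all normal individuals on chromosome c.
According to another aspect of the present invention, a chromosomal aneuploidy detection device is provided, comprising: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, data-connected to the data input unit, the data output unit and the storage unit, for executing the executable program stored in the storage unit, execution of the program including carrying out the above method for detecting chromosomal aneuploidy.
According to yet another aspect of the present invention, a computer-readable storage medium is provided for storing a program for execution by a computer. Those of ordinary skill in the art will understand that, when the program is executed, all or some of the steps of the above method for detecting chromosomal aneuploidy can be completed by instructing the relevant hardware. The storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disc and the like.
Through the design of the deviation statistic, the method according to the present invention reflects the deviation of the test sample from the reference and judges chromosomal deletion or duplication accordingly, providing a sequencing-based means of detecting chromosomal aneuploidy that can sensitively detect numerical abnormalities of any chromosome.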
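Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken together with the drawings, in which:
FIG. 1 is a schematic flow chart of a detection method according to one embodiment of the present invention;
FIG. 2 is a schematic flow chart of a window division method according to another embodiment of the present invention;
FIG. 3 is a schematic flow chart of a GC correction method according to another embodiment of the present invention.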
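Detailed Description
Example 1
According to one embodiment of the present invention, a method for detecting chromosomal aneuploidy is provided which, referring to FIG. 1, includes the following steps:
101. Obtain the distribution of the sequencing results of the test samples on the reference sequence.
(1) The test samples comprise target samples from M target individuals and control samples from N normal individuals, M and N being positive integers.
A target individual is an individual who needs to be tested, such as a pregnant woman requiring prenatal testing; a normal individual is an individual determined in advance to be normal. Generally, the target individual and the normal individual are of the same species and preferably have a similar baseline condition; for example, if the target individual is a pregnant woman, the normal individual may be a normal pregnant woman at a similar gestational age carrying a normal fetus.
In this embodiment, the source of the target sample and the control sample is not particularly limited and may, for example, be selected from maternal peripheral blood, maternal urine, exfoliated fetal trophoblast cells from the maternal cervix, maternal cervical mucus, fetal nucleated red blood cells and the like, as long as a nucleic acid sample containing fetal genetic information can be extracted from it. In this embodiment, the target sample and the control sample preferably have the same source, for example maternal peripheral blood, so that the fetus can be tested non-invasively and the sample is simple to obtain. Because the sample contains the pregnant woman's own nucleic acid in addition to fetal nucleic acid, the pregnant woman herself should be free of chromosomal aneuploidy to avoid interfering with the result; this is, of course, usually quite evident. In other embodiments, samples obtained by invasive methods may also be used; for example, the sample may come from fetal cord blood, placental or chorionic tissue, uncultured or cultured amniotic fluid cells, villus cells and the like. Nucleic acid can be extracted from the sample by any of various existing means, for example a commercial nucleic acid extraction kit.
It should be noted that if there are two or more target individuals, that is M ≥ 2, each target individual can be grouped with the N normal individuals into one set of test samples, so that each set contains N+1 test samples and M sets are obtained in total, each set being tested and calculated separately according to the provided method; alternatively, the M target individuals and the N normal individuals can be combined into a single set of test samples, so that the total number of test samples is N+M. This embodiment preferably uses the scheme in which the total number of test samples is N+1.
(2) The sequencing results of the test samples include multiple reads (also called "read segments").
Because the normal individuals are selected and determined in advance, any detection or calculation data for the control samples can be generated and saved beforehand. This embodiment uses such preset control-sample data, reading them when needed; this is not repeated below wherever control-sample data are involved. In other embodiments, the control samples may also be tested and calculated at the same time as the target samples.
The nucleic acid extracted from a sample is usually fragmented, a library is prepared according to the chosen sequencing method, and sequencing is then performed. For example, a third-generation sequencing platform (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) may be used, including but not limited to Helicos' True Single Molecule DNA sequencing, Pacific Biosciences' single molecule real-time (SMRT™) sequencing, and Life Technologies' semiconductor sequencing technology. This embodiment preferably uses the semiconductor sequencing platform of Life Technologies. When several target samples need to be tested at the same time, each sample can be given a different tag sequence (barcode) so that the samples can be distinguished during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 2008, March, Vol. 5 No. 3), allowing multiple samples to be sequenced simultaneously. The tag sequence distinguishes different samples without affecting other functions of the DNA molecule to which it is added. The tag sequence may be 4 to 12 bp long.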
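In this embodiment, the sequencing depth used to obtain the sequencing results of the test samples is preferably 0.2X, and a small-fragment library is used, preferably 100 to 300 bp in size. In other embodiments, the sequencing depth may preferably be 0.1X to 0.3X and, additionally or alternatively, the library size is preferably 50 to 500 bp. Using these low sequencing depths and small-fragment libraries reduces the amount of sequencing data, saving cost and shortening detection and analysis time, while still ensuring the reliability and accuracy of the results. For example, in one embodiment, a sequencing depth of 0.2X and a library of about 100 bp bring the sequencing data to be analysed to about 5 M, which greatly reduces the cost of data generation and the difficulty of the analysis, makes it possible to complete the analysis within 24 hours, and helps shorten the time to report results.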
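As a rough check on the data volume quoted for 0.2X depth, the number of reads can be estimated from the depth, the genome size G and the read length L. The genome size of about 3 × 10^9 bp is an assumption not stated in the text, and the "5 M" figure is read here as a number of reads, so this is an order-of-magnitude sketch only:

```latex
\[
n_{\text{reads}} \approx \frac{\text{depth} \times G}{L}
  = \frac{0.2 \times 3 \times 10^{9}\ \text{bp}}{100\ \text{bp}}
  = 6 \times 10^{6}
\]
```

Under these assumptions the estimate is of the same order as the figure of about 5 M given in the text.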
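(3) The reference sequence is divided into multiple windows, and the distribution of the sequencing results of a test sample on the reference sequence is expressed as the number of reads falling in each window.
For brevity, the number of reads in each window is denoted r(i,j), where i is the window index, j is the test sample index, and i and j are positive integers. As noted above, for the control samples r(i,j) can be measured and saved in advance.
The reference sequence is a known sequence and may be any reference template, obtained in advance, for the biological category to which the target individual belongs. For example, if the target individual is human, the reference sequence may be a human genome reference sequence from the database of the US National Center for Biotechnology Information (NCBI). In this embodiment, the reference sequence is version 37.3 of the human genome reference sequence (hg19; NCBI Build 37.3) in the NCBI database.
The reference sequence may be divided into windows in any way that allows the sequencing results to be counted effectively. In this embodiment, for example, the windows are divided according to a fixed window length and a fixed window spacing; the fixed window length is preferably 100 kb and the fixed window spacing is preferably 10 kb or 20 kb. In other embodiments, different fixed window lengths and spacings may be selected; for example, the fixed window length is preferably 1 kb to 1 Mb and, additionally or alternatively, the fixed window spacing is preferably 1 kb to 100 kb. The window length and spacing can be set according to the abundance of fetal DNA in the sample. The underlying principle is that each window corresponds to one statistic and one chromosomal position, so the spacing of the windows determines the resolution of the detection.
Various alignment software can be used to align the sequencing results to the reference sequence, such as Tmap, BWA (Burrows-Wheeler Aligner), SOAP (Short Oligonucleotide Analysis Package) and samtools; this embodiment places no limitation on the choice. Depending on the alignment software, either mismatch-tolerant alignment (allowing several base mismatches) or exact alignment can be used; with mismatch-tolerant alignment, 1 to 3 mismatches per 100 bp are generally allowed on average. Mismatch-tolerant alignment is generally used when sequencing on the Proton platform.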
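The window counting of step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that "window spacing" means the distance between the start positions of consecutive windows (so that 100 kb windows spaced 20 kb apart overlap), that a read is assigned by its aligned start position, and that positions are 0-based. The function names `make_windows` and `count_reads` are hypothetical.

```python
def make_windows(chrom_length, window_length=100_000, window_step=20_000):
    """Return half-open (start, end) windows of fixed length.

    Assumption: consecutive windows start window_step bases apart,
    so windows overlap when window_step < window_length.
    """
    windows = []
    start = 0
    while start + window_length <= chrom_length:
        windows.append((start, start + window_length))
        start += window_step
    return windows


def count_reads(read_starts, windows):
    """r(i, j) for one sample j: reads whose start falls in window i.

    A plain double loop, adequate for a sketch; a real pipeline would
    compute the covering window indices arithmetically.
    """
    counts = [0] * len(windows)
    for pos in read_starts:
        for i, (start, end) in enumerate(windows):
            if start <= pos < end:
                counts[i] += 1
    return counts
```

With overlapping windows a single read contributes to several r(i,j) values, which is one reason the spacing, rather than the window length, sets the positional resolution.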
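102. Calculate the relative read number of each test sample in each window.
For brevity, the relative read number of each test sample in each window is denoted R(i,j),
R(i,j) = r(i,j) / rp(j)
where rp(j) is the mean of r(i,j) for the j-th sample, which can be expressed, for example, as
rp(j) = [r(1,j) + ... + r(I,j)] / I
where I is the number of all windows on the reference sequence.
In this embodiment, subsequent analysis uses the relative read number obtained by this normalization, in order to bring out the statistical meaning of the data itself. In other embodiments, if the normalized relative read number is not used for the data analysis and un-normalized values are merely used for the analysis, calculation and comparison, this should be regarded as equivalent to the present embodiment. The calculations described below may likewise be varied by mathematically or statistically equivalent or approximate methods; such variations should also be regarded as equivalent and are not discussed further. This embodiment is not limited to the particular form in which the formulas are written.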
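The normalization of step 102 can be sketched in a few lines. The nested-list layout `r[j][i]` (sample j, window i) and the function name `relative_counts` are assumptions made for illustration; the sketch also assumes every sample has at least one read, so that rp(j) is non-zero.

```python
def relative_counts(r):
    """Compute R(i, j) = r(i, j) / rp(j) for every sample and window.

    r[j][i] is the read count of sample j in window i (hypothetical layout).
    rp(j) is the mean read count of sample j over all I windows.
    """
    R = []
    for sample_counts in r:
        rp = sum(sample_counts) / len(sample_counts)
        R.append([count / rp for count in sample_counts])
    return R
```

Dividing by the per-sample mean removes differences in total sequencing yield between samples, so a value of 1.0 means "as many reads as an average window of that sample".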
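103. Calculate the deviation statistic of each target sample in each window.
For brevity, the deviation statistic of each target sample in each window is denoted Z(i,j),
Z(i,j) = [R(i,j) - mean(i)] / sd(i)
where mean(i) is the mean of R(i,j) in the i-th window, which can be expressed, for example, as
mean(i) = [R(i,1) + ... + R(i,J)] / J
where J is the number of all test samples. In this embodiment, J = 1 + N. In other embodiments, if the test samples contain M target samples at the same time, J = M + N.
The deviation statistic Z(i,j) indicates, in statistical terms, whether the j-th sample shows a deletion or duplication in the i-th window. In the present form of the formula, Z(i,j) > 0 suggests duplication and Z(i,j) < 0 suggests deletion, and the Z(i,j) of each window has relatively independent statistical significance.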
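The deviation statistic of step 103 is a per-window standard score across samples. The sketch below assumes the same hypothetical layout `R[j][i]` and uses the sample standard deviation (`statistics.stdev`, divisor J - 1); the text does not say which form of the standard deviation is intended, so that choice is an assumption. A window in which all samples have identical R would give sd(i) = 0 and is not handled here.

```python
from statistics import mean, stdev


def deviation_statistics(R):
    """Compute Z(i, j) = [R(i, j) - mean(i)] / sd(i).

    R[j][i] is the relative read number of sample j in window i.
    mean(i) and sd(i) are taken over all J test samples in window i.
    """
    n_windows = len(R[0])
    Z = [[0.0] * n_windows for _ in R]
    for i in range(n_windows):
        column = [sample[i] for sample in R]
        m = mean(column)
        s = stdev(column)  # assumption: sample standard deviation
        for j, sample in enumerate(R):
            Z[j][i] = (sample[i] - m) / s
    return Z
```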
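104. Compare the mean of the deviation statistics on a given chromosome of the target sample with the corresponding deviation threshold.
(1) The deviation statistics Z(i,j) are analysed and compared by chromosome: the mean Zp(c,j) of Z(i,j) on chromosome c of the target sample is compared with the deviation threshold of chromosome c,
Zp(c,j) = [Z(c1,j) + ... + Z(c1+cI-1,j)] / cI
where c1 is the index of the first window on chromosome c of the reference sequence and cI is the number of windows on chromosome c of the reference sequence. A cumulative sum, for example, is equally usable, provided the numerical level of the threshold is adjusted accordingly.
(2) Determine from the comparison whether chromosome c of the target sample is deleted or duplicated. For example, if Zp(c,j) exceeds the upper deviation threshold, chromosome c of target sample j can be considered duplicated (e.g. trisomy); if Zp(c,j) is below the lower deviation threshold, chromosome c of target sample j can be considered deleted (e.g. monosomy). A digital karyotype result can thus be given for the target sample, for example "trisomy 21", "trisomy 18", "trisomy 13", "X chromosome deletion" or "Y chromosome deletion".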
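The chromosome-level comparison of step 104 can be sketched as below. The function names are hypothetical, window indices are 0-based here (the text counts from 1), and the three result labels are illustrative wording rather than terms defined by the patent.

```python
def chromosome_mean(z_sample, first_window, n_windows):
    """Zp(c, j): mean of Z(i, j) over the windows of chromosome c.

    z_sample     : Z values of one sample j over all windows
    first_window : 0-based index of the first window on chromosome c
    n_windows    : number of windows on chromosome c (cI in the text)
    """
    values = z_sample[first_window:first_window + n_windows]
    return sum(values) / n_windows


def call_chromosome(zp, lower, upper):
    """Compare Zp(c, j) with the deviation threshold (lower, upper)."""
    if zp > upper:
        return "duplication"
    if zp < lower:
        return "deletion"
    return "normal"
```

For instance, a sample whose chromosome-level mean lies above the upper bound for that chromosome would be reported as a duplication, which for an autosome corresponds to a trisomy call.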
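It should be noted that, although the results of the variation detection according to embodiments of the present invention can objectively be used to determine chromosomal aneuploidy, and in turn to detect the resulting genetic diseases, such as fetal Down syndrome and Edwards syndrome, the variation detection is not necessarily used for disease diagnosis or related purposes; for example, the presence of some chromosomal variations does not indicate disease risk or health status, and the detection may also be used purely for scientific research on genetic polymorphism.
(3) The deviation threshold is set according to the deviation statistics of all normal individuals on chromosome c. As noted above, because the deviation threshold is obtained from the control samples, it can be calculated and saved in advance; when target individuals are tested later, the same threshold settings can be used as long as the set of control samples is unchanged. If control samples are removed, replaced or added, the corresponding deviation thresholds need to be updated. A preferred way of setting the threshold in this embodiment includes the following steps:
(3.1) Take the control samples of the N normal individuals as the full set of test samples and calculate Zp(c,j) for each control sample. The calculation follows the steps described above, except that the test samples no longer include a target sample, so the number of test samples when setting the deviation threshold is N. To make the resulting deviation threshold more reliable, N is preferably not less than 30 in this embodiment.
(3.2) Calculate, according to the chosen test rule and confidence level, the boundary of the Zp(c,j) values judged to be normal, and use it as the deviation threshold of chromosome c. The test rule can be selected according to the number of control samples and the required detection accuracy, and the corresponding confidence level set, following existing statistical data processing practice. In this embodiment a U test with a confidence level of 95% is preferred, which has the advantage of "no false negatives". In other embodiments, other test rules such as the t-test may be selected and, additionally or alternatively, the confidence level may be chosen from 90% to 99.9%, for example 99%, 99.5% or 99.9%.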
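The text does not give the formula behind step (3.2), so the following is only one plausible reading, offered as a sketch. It assumes the U test is a two-sided normal (z) test, that the control Zp(c,j) values are centred on zero (which holds when Z is computed over the N controls alone, since each window's Z values then sum to zero), and that the bound is the critical value times their spread about zero. The function name `deviation_threshold` is hypothetical.

```python
import math
from statistics import NormalDist


def deviation_threshold(control_zp, confidence=0.95):
    """Symmetric (lower, upper) bound for Zp(c, j) on one chromosome.

    control_zp : Zp(c, j) of the N control samples for chromosome c
    confidence : two-sided confidence level, e.g. 0.95

    Assumption: bound = z_crit * sigma, with sigma measured about zero.
    """
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    sigma = math.sqrt(sum(x * x for x in control_zp) / len(control_zp))
    bound = z_crit * sigma
    return -bound, bound
```

A reading of this kind would at least be consistent with thresholds that are symmetric about zero, although the exact values obtained in the patent cannot be reproduced without the control data.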
In this embodiment, a set of deviation thresholds obtained with the above procedure is listed below, each record having the format (chromosome; lower threshold; upper threshold):
(1; -0.1417365; 0.1417365)
(2; -0.09237466; 0.09237466)
(3; -0.1250404; 0.1250404)
(4; -0.1265542; 0.1265542)
(5; -0.08148388; 0.08148388)
(6; -0.119122; 0.119122)
(7; -0.1061317; 0.1061317)
(8; -0.1155915; 0.1155915)
(9; -0.1004392; 0.1004392)
(10; -0.1106214; 0.1106214)
(11; -0.09819914; 0.09819914)
(12; -0.09005814; 0.09005814)
(13; -0.1779642; 0.1779642)
(14; -0.1436377; 0.1436377)
(15; -0.1478246; 0.1478246)
(16; -0.1764641; 0.1764641)
(17; -0.147383; 0.147383)
(18; -0.1891044; 0.1891044)
(19; -0.3332986; 0.3332986)
(20; -0.206487; 0.206487)
(21; -0.2573099; 0.2573099)
(22; -0.2096556; 0.2096556)
(X, male fetus; -0.823347; 0.823347)
(X, female fetus; -0.285388; 0.285388)
(Y, male fetus; -1.228768; 1.228768)
(Y, female fetus; -1.217151; 1.217151)
Example 2
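According to another embodiment of the invention, a method for detecting chromosomal aneuploidy is provided. Its basic steps are the same as in Example 1; the difference is that Example 1 divides windows by a fixed window length and a fixed window spacing, whereas this embodiment divides windows so that each window contains the same number of uniquely aligned sequences.
A uniquely aligned sequence is a sequence that maps to a unique position on the reference sequence. When windows are divided so that each contains the same number of uniquely aligned sequences, it is likewise possible, when aligning the sequencing results of the test samples to the reference sequence, to count only uniquely aligned reads and to discard reads that cannot be uniquely aligned. Windows of this type reduce the influence of repetitive sequences, N regions, and the like on the detection result and improve the reliability of detection.
This embodiment provides a method of dividing windows so that each window contains the same number of uniquely aligned sequences. Referring to Figure 2, it comprises the following steps: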
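201. Obtain a set of known base sequences.
This set of base sequences can be obtained either by whole-genome sequencing of a known sample (for example, one of the control samples described above) or by cutting the reference sequence according to a cutting length.
When the set of known base sequences is obtained by actual sequencing, the selected known sample can be deeply sequenced in order to obtain enough base sequences, and the resulting reads are used as the set of known base sequences. Preferably, the library construction and sequencing methods are chosen so that the length of the base sequences obtained is comparable to the length of the reads obtained by sequencing the test samples.
When the set of known base sequences is simulated by cutting the reference sequence, the cutting length is determined first, usually according to the length of the reads obtained by sequencing the test samples. For example, the cutting length may be a fixed value close to the read length of the test samples: if the reads of the test samples are about 250 bp, a cutting length of 200 to 300 bp may be chosen. The reference sequence is then cut according to the cutting length, for example HG18 or HG19, depending on the reference sequence selected.
202. Align the set of known base sequences back to the reference sequence to obtain the distribution of uniquely aligned sequences.
203. Divide the windows.
For example, every K adjacent uniquely aligned sequences are grouped together, and a window is defined to cover each group of uniquely aligned sequences, where K is a positive integer.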
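Step 203 can be sketched as follows in Python. The function `windows_by_unique_reads` is hypothetical, and two details are assumptions that the source leaves open: each window runs from the start coordinate of the first sequence in a group to the start coordinate of the last one, and a trailing group with fewer than K sequences is dropped.

```python
def windows_by_unique_reads(positions, k):
    """Group every k adjacent uniquely aligned sequences into one window.

    positions: start coordinates of uniquely aligned sequences on one chromosome.
    Returns a list of (window_start, window_end) tuples.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    ordered = sorted(positions)
    windows = []
    # Step through the sorted coordinates k at a time; an incomplete
    # final group is ignored (an assumption, see the lead-in).
    for first in range(0, len(ordered) - k + 1, k):
        group = ordered[first:first + k]
        windows.append((group[0], group[-1]))
    return windows


# Made-up coordinates, used only to show the grouping.
print(windows_by_unique_reads([5, 10, 20, 40, 41, 90, 100], k=3))
```

Windows built this way are short where unique alignments are dense and long across repeats or N regions, which is the behaviour the embodiment relies on.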
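Example 3
According to another embodiment of the invention, a method for detecting chromosomal aneuploidy is provided. Its basic steps are the same as in Example 1 or 2; the difference is that Examples 1 and 2 use the uncorrected relative sequence number R(i,j) to calculate the deviation statistic Z(i,j), whereas this embodiment corrects R(i,j) before calculating Z(i,j). For brevity, the corrected R(i,j) is denoted Ra(i,j) below.
In this embodiment, R(i,j) is preferably corrected according to the GC (guanine and cytosine) content of each test sample in each window, so that the resulting Ra(i,j) has, or approximately has, a normal distribution; Ra(i,j) is then used when calculating Z(i,j). The reason is that, objectively, chromosomal aneuploidy (deletion or duplication) should affect all windows within the affected range uniformly, so the measured statistic R(i,j) should follow a common statistical distribution, such as a normal or standard normal distribution. According to existing research, however, GC content affects actual sequencing results: for example, regions of high or low GC content yield fewer reads than regions of intermediate GC content, mainly because of the library construction method used in the sequencing process. Therefore, to make the detection result more reliable, R(i,j) can be normalized according to the GC content of the test sample in each window, so that Ra(i,j) follows, for example, an approximately normal distribution. The distribution of R(i,j) (or Ra(i,j)) here means the distribution of R(i,j) values obtained by taking the value of R(i,j) as the abscissa and the number of windows having the same R(i,j) value as the ordinate. "The same value" means that the values fall within the same bin interval.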
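This embodiment provides a method of correcting R(i,j) according to GC content. Referring to Figure 3, it comprises the following steps:
301. Calculate the GC content of the test sample.
For a given test sample, the GC content of that sample in each window can be calculated from the sequencing results. Both target samples and normal samples can undergo GC-content-based correction; as noted above, the data for normal samples can be obtained and processed in advance.
302. Determine the median of R(i,j) over windows with the same GC content.
"The same GC content" means that the GC content values fall within the same bin interval; in this embodiment, for example, the bin width is preferably 0.001. In other embodiments, the bin width may also preferably be 0.0005 to 0.005.
303. Calculate the correction coefficient ε(GC).
Usually, the ratio between the median and a target value is used as the correction coefficient ε(GC) for the corresponding GC content. The target value is usually chosen to represent the average count level; in this embodiment, for example, it is preferably the mean of R(i,j) of the sequenced sample over all windows (covering all chromosomes).
304. Multiply R(i,j) by ε(GC) to obtain the corrected R(i,j). This can be expressed, for example, as Ra(i,j) = ε(GC) × R(i,j).
Clearly, GC correction may also be applied directly to r(i,j); this is equivalent to the correction procedure described above.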
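Steps 301 to 304 can be sketched for a single test sample as below. The function `gc_correct` is hypothetical. The text defines ε(GC) as the ratio between the bin median and the target value; the sketch takes that ratio as target / median so that multiplying by ε(GC) pulls every GC bin towards the target, which is an interpretation rather than something the source states. Leaving bins whose median is zero unchanged is a further assumption.

```python
from collections import defaultdict
from statistics import mean, median


def gc_correct(r_values, gc_values, bin_width=0.001):
    """Return Ra(i,j) for one sample, given R(i,j) and GC content per window."""
    if len(r_values) != len(gc_values):
        raise ValueError("one GC value is needed per window")
    # Step 302: assign each window to a GC bin (nearest multiple of bin_width).
    keys = [round(gc / bin_width) for gc in gc_values]
    by_bin = defaultdict(list)
    for key, r in zip(keys, r_values):
        by_bin[key].append(r)
    # Target value: mean of R(i,j) over all windows of this sample.
    target = mean(r_values)
    # Step 303: correction coefficient per GC bin (assumed target / median).
    factor = {}
    for key, values in by_bin.items():
        med = median(values)
        factor[key] = target / med if med > 0 else 1.0
    # Step 304: Ra(i,j) = eps(GC) * R(i,j).
    return [r * factor[key] for key, r in zip(keys, r_values)]


# Made-up values: two windows at GC 0.400 and two at GC 0.500.
print(gc_correct([2.0, 2.0, 1.0, 1.0], [0.400, 0.400, 0.500, 0.500]))
```

With real data, sparsely populated GC bins give noisy medians, so a wider bin within the stated 0.0005 to 0.005 range may be a reasonable trade-off.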
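A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory, random-access memory, a magnetic disk, an optical disc, and the like.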
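According to another aspect of the invention, a device for detecting chromosomal aneuploidy is also provided, comprising: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, in data connection with the data input unit, the data output unit, and the storage unit, for executing the program stored in the storage unit, the execution of the program including completing all or some of the steps of the methods in the above embodiments.
The invention is described in detail below by way of a specific example. The specific parameter settings used in the following detection procedure are: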
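1. The detection method of Example 3 is used, with windows set as in Example 1,
2. Reference sequence: the human genome reference sequence, build 37.3 in the NCBI database (hg19; NCBI Build 37.3),
3. Window length 100 kb, window spacing 20 kb,
4. Target samples: plasma from 4 pregnant women; control samples: the set of control samples used to determine the deviation thresholds listed in Example 1.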
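As a rough illustration of the window settings above, the Python sketch below builds fixed windows and counts reads per window, giving r(i,j) for one sample. It assumes that "window spacing" means the distance between the start coordinates of consecutive windows, so windows of 100 kb at 20 kb spacing overlap; this part of the text does not define the term, and the function names are hypothetical.

```python
def fixed_windows(chrom_length, window_length=100_000, spacing=20_000):
    """Return (start, end) pairs; end is exclusive.

    Assumption: consecutive window starts are `spacing` bases apart.
    """
    last_start = chrom_length - window_length
    return [(s, s + window_length) for s in range(0, last_start + 1, spacing)]


def count_reads(read_starts, windows):
    """Count reads whose start coordinate falls inside each window."""
    return [sum(1 for p in read_starts if a <= p < b) for a, b in windows]


# Toy chromosome of 200 kb with three made-up read positions.
windows = fixed_windows(200_000)
print(len(windows), count_reads([10, 50_000, 150_000], windows))
```

For genome-scale data a sorted array with binary search would replace the nested scan, but the counting rule stays the same.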
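The detection procedure was as follows:
1. DNA extraction and library construction: DNA was extracted from the 4 plasma samples above (target individual IDs are given in the attached table) using the Snova DNA extraction kit (SnoMag Circulating DNA Kit). Libraries were constructed from the extracted DNA following a Proton library construction protocol that had been tested and found stable: sequencing adapters were added to both ends of DNA molecules whose main fragment band was centred on 170 bp, and each target sample received a different tag sequence during adapter ligation so that samples could be distinguished. The finished library (main band about 250 bp) was subjected to emulsion PCR in a water-in-oil state to form particles each encapsulating a single molecule.
2. Sequencing: the DNA samples obtained from the 4 plasma samples were run on the sequencer following the Ion Proton instructions officially published by Life Technologies, and the samples were distinguished by their tag sequences. Using the alignment software Tmap (obtained from the Life Technologies home page), the sequencing results were aligned to the reference sequence with no mismatches allowed, giving the positions of the sequencing results on the reference sequence.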
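3. Data analysis: Zp(c,j) was calculated for each target sample (each target sample forming one group of test samples together with the control sample set) and filtered using the corresponding deviation thresholds to obtain the detection results exceeding the thresholds.
4. Standard karyotype analysis (including amniocentesis, cell culture, staining, banding, and similar procedures) was carried out, and its results were compared with those of step 3, as shown in the table below: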
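Target individual ID | Chromosome | Standard karyotype analysis result | Detection result of the method of the invention | Conclusion |
---|---|---|---|---|
CQPT01 | 21 | 47,XY,+21 | 47,XY,+21 | Consistent |
CQPT02 | 18 | 47,XX,+18 | 47,XX,+18 | Consistent |
CQPT03 | 13 | 47,XY,+13 | 47,XY,+13 | Consistent |
CQPT04 | X | 45,XO | 45,XO | Consistent |
The above are merely preferred embodiments of the invention. It should be understood that these embodiments serve only to explain the invention and do not limit it. A person of ordinary skill in the art may vary the specific embodiments described above in accordance with the ideas of the invention.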
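The data-analysis step of the worked example (relative sequence number R(i,j), deviation statistic Z(i,j), and chromosome mean Zp(c,j)) can be sketched in plain Python as follows. The function names and the toy counts are hypothetical. The sample standard deviation (divisor J - 1) and a Z of 0 for windows with zero spread are assumptions, since the formula for sd(i) is not reproduced in this text.

```python
from statistics import mean, stdev


def relative_counts(r):
    """r[j][i] is the read count of sample j in window i; R = r / rp(j)."""
    out = []
    for sample in r:
        rp = mean(sample)  # rp(j): mean read count of sample j over windows
        out.append([count / rp for count in sample])
    return out


def deviation_statistics(R):
    """Z(i,j) = (R(i,j) - mean(i)) / sd(i), taken across all test samples."""
    n_windows = len(R[0])
    Z = [[0.0] * n_windows for _ in R]
    for i in range(n_windows):
        column = [sample[i] for sample in R]
        m, s = mean(column), stdev(column)  # assumed divisor J - 1
        for j, value in enumerate(column):
            Z[j][i] = (value - m) / s if s > 0 else 0.0
    return Z


def chromosome_mean(z_sample, window_chrom, chrom):
    """Zp(c,j): mean of Z(i,j) over the windows lying on chromosome c."""
    return mean(z for z, c in zip(z_sample, window_chrom) if c == chrom)


# Three made-up controls plus one made-up target with extra reads on "21".
counts = [[10, 10, 10, 10]] * 3 + [[10, 10, 16, 16]]
Z = deviation_statistics(relative_counts(counts))
print(chromosome_mean(Z[3], ["1", "1", "21", "21"], "21"))
```

The resulting Zp(c,j) is then compared with the stored threshold pair for that chromosome.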
Claims
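1. A method for detecting chromosomal aneuploidy, characterized by comprising the following steps:
obtaining the distribution of the sequencing results of test samples on a reference sequence, wherein the test samples comprise target samples from M target individuals and control samples from N normal individuals, M and N being positive integers; the sequencing results comprise a plurality of reads; the reference sequence is divided into a plurality of windows; and the distribution is expressed as the number of reads r(i,j) falling in each window, where i is the window index, j is the test sample index, and i and j are positive integers;
calculating the relative sequence number of each test sample in each window, R(i,j) = r(i,j) / rp(j), where rp(j) is the mean of r(i,j) for the j-th sample;
calculating the deviation statistic of each target sample in each window, Z(i,j) = [R(i,j) - mean(i)] / sd(i), where mean(i) is the mean of R(i,j) in the i-th window and sd(i) is the standard deviation of R(i,j) in the i-th window;
comparing Zp(c,j), the mean of Z(i,j) over chromosome c of the target sample, with the deviation threshold of chromosome c, and determining from the comparison result whether chromosome c of the target sample is deleted or duplicated, the deviation threshold being set according to the deviation statistics of all the normal individuals on chromosome c.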
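2. The method of claim 1, characterized in that the source of the target samples and control samples is selected from at least one of: maternal peripheral blood, maternal urine, fetal trophoblast cells shed from the maternal cervix, maternal cervical mucus, and fetal nucleated red blood cells;
the source of the target samples and control samples is preferably maternal peripheral blood.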
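3. The method of claim 1, characterized in that the windows are divided in a manner selected from: dividing windows by a fixed window length and a fixed window spacing, and dividing windows so that each window contains the same number of uniquely aligned sequences;
the fixed window length is preferably 1 kb to 1 Mb, more preferably 100 kb; and/or,
the fixed window spacing is preferably 1 kb to 100 kb, more preferably 5 kb to 20 kb, and most preferably 10 kb.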
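4. The method of claim 3, characterized in that dividing windows so that each window contains the same number of uniquely aligned sequences comprises the following steps:
obtaining a set of known base sequences, the known base sequences being obtained by sequencing a known sample or by cutting the reference sequence according to a cutting length, the cutting length [...];
aligning the known reads back to the reference sequence to obtain the distribution of uniquely aligned sequences;
grouping every K adjacent uniquely aligned sequences together and defining a window that covers each group of uniquely aligned sequences, K being a positive integer.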
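5. The method of claim 1, characterized in that, before Z(i,j) is calculated, the method further comprises the following step: correcting R(i,j) according to the GC content of each test sample in each window, so that the corrected R(i,j) has, or approximately has, a normal distribution, the corrected R(i,j) being used when calculating Z(i,j).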
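6. The method of claim 5, characterized in that correcting R(i,j) comprises the following steps:
for a given test sample, calculating the GC content of that test sample in each window from the sequencing results;
determining the median of R(i,j) over windows with the same GC content, "the same GC content" meaning that the GC content values fall within the same bin interval, the bin width being 0.0005 to 0.005, preferably 0.001;
using the ratio between the median and a target value as the correction coefficient ε(GC) for the corresponding GC content, the target value preferably being the mean of R(i,j) of that test sample over all windows, and multiplying R(i,j) by ε(GC) to obtain the corrected R(i,j).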
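7. The method of claim 1, characterized in that the sequencing depth used when obtaining the sequencing results of the test samples is 0.1X to 0.3X, preferably 0.2X; and/or,
the size of the sequencing library constructed when sequencing the test samples is 50 to 500 bp, preferably 100 to 300 bp.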
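8. The method of any one of claims 1 to 7, characterized in that setting the deviation threshold comprises the following steps:
taking the control samples of the N normal individuals as the entire set of test samples and calculating Zp(c,j) for each control sample;
according to a chosen test rule and confidence level, calculating the boundary of the Zp(c,j) values that are judged normal, and using it as the deviation threshold for chromosome c;
the test rule is preferably a U test; and/or,
the confidence level is preferably 90% to 99.9%, more preferably 95%; and/or, N is preferably not less than 30.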
9. The method of any one of claims 1 to 7, characterized in that sd(i) is calculated as follows: [...], where J is the total number of test samples.
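10. A device for detecting chromosomal aneuploidy, characterized by comprising:
a data input unit for inputting data;
a data output unit for outputting data;
a storage unit for storing data, including an executable program;
a processor, in data connection with the data input unit, the data output unit, and the storage unit,
for executing the executable program, the execution of the program including completing the method of any one of claims 1 to 9.
11. A computer-readable storage medium, characterized in that it stores a program for execution by a computer, the execution of the program including completing the method of any one of claims 1 to 9.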
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13889435.7A EP3023504B1 (en) | 2013-07-17 | 2013-07-17 | Method and device for detecting chromosomal aneuploidy |
US14/905,617 US20160154931A1 (en) | 2013-07-17 | 2013-07-17 | Method and device for detecting chromosomal aneuploidy |
PCT/CN2013/079495 WO2015006932A1 (zh) | 2013-07-17 | 2013-07-17 | Method and device for detecting chromosomal aneuploidy |
CN201380004733.6A CN104520437B (zh) | 2013-07-17 | 2013-07-17 | Method and device for detecting chromosomal aneuploidy |
HK15109588.2A HK1208888A1 (zh) | 2013-07-17 | 2015-09-29 | Method and device for detecting chromosomal aneuploidy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2013/079495 WO2015006932A1 (zh) | 2013-07-17 | 2013-07-17 | Method and device for detecting chromosomal aneuploidy |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015006932A1 (zh) | 2015-01-22 |
Family
ID=52345697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/079495 WO2015006932A1 (zh) | Method and device for detecting chromosomal aneuploidy | 2013-07-17 | 2013-07-17 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20160154931A1 (zh) |
EP (1) | EP3023504B1 (zh) |
CN (1) | CN104520437B (zh) |
HK (1) | HK1208888A1 (zh) |
WO (1) | WO2015006932A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
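CN104789686A (zh) * | 2015-05-06 | 2015-07-22 | 安诺优达基因科技(北京)有限公司 | Kit and device for detecting chromosomal aneuploidy |
CN110428873A (zh) * | 2019-06-11 | 2019-11-08 | 西安电子科技大学 | Method and system for detecting chromosome ploidy abnormality |
WO2021134513A1 (zh) * | 2019-12-31 | 2021-07-08 | 深圳华大医学检验实验室 | Method and device for determining chromosomal aneuploidy and constructing a classification model |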
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
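CN108733979A (zh) * | 2017-10-30 | 2018-11-02 | 成都凡迪医疗器械有限公司 | GC content calibration method and device for NIPT, and computer-readable storage medium |
CN108363903B (zh) * | 2018-01-23 | 2022-03-04 | 和卓生物科技(上海)有限公司 | Chromosomal aneuploidy detection system suitable for single cells, and application thereof |
CN110993029B (zh) * | 2019-12-26 | 2023-09-05 | 北京优迅医学检验实验室有限公司 | Method and system for detecting chromosomal abnormality |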
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
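CN1376282A (zh) * | 1999-09-10 | 2002-10-23 | 威廉·L·克劳利 | Use of synthetic genes in genetic algorithms, information encoding, and non-replicative encryption |
CN101849236A (zh) * | 2007-07-23 | 2010-09-29 | 香港中文大学 | Diagnosing fetal chromosomal aneuploidy using genomic sequencing |
CN102753703A (zh) * | 2010-04-23 | 2012-10-24 | 深圳华大基因科技有限公司 | Method for detecting fetal chromosomal aneuploidy |
CN103003447A (zh) * | 2011-07-26 | 2013-03-27 | 维里纳塔健康公司 | Method for determining the presence or absence of different aneuploidies in a sample |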
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005039389A2 (en) * | 2003-10-22 | 2005-05-06 | 454 Corporation | Sequence-based karyotyping |
LT2562268T (lt) * | 2008-09-20 | 2017-04-25 | The Board Of Trustees Of The Leland Stanford Junior University | Neinvazinis fetalinės aneuploidijos diagnozavimas sekvenavimu |
ES2741966T3 (es) | 2011-12-31 | 2020-02-12 | Bgi Genomics Co Ltd | Método para detectar una variación genética |
Non-Patent Citations (4)
Title |
---|
CHIU, R.W. ET AL.: "Noninvasive Prenatal Diagnosis of Fetal Chromosomal Aneuploidy by Massively Parallel Genomic Sequencing of DNA in Maternal Plasma", PNAS, vol. 105, no. 51, 23 December 2008 (2008-12-23), pages 20458 - 20463, XP055284693 * |
METZKER ML: "Sequencing technologies-the next generation", NAT REV GENET, vol. 11, no. 1, January 2010 (2010-01-01), pages 31 - 46 |
MICAH HAMADY; JEFFREY J WALKER; J KIRK HARRIS ET AL.: "Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex", NATURE METHODS, vol. 5, no. 3, March 2008 (2008-03-01), XP055179057, DOI: doi:10.1038/nmeth.1184 |
See also references of EP3023504A4 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
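CN104789686A (zh) * | 2015-05-06 | 2015-07-22 | 安诺优达基因科技(北京)有限公司 | Kit and device for detecting chromosomal aneuploidy |
CN110428873A (zh) * | 2019-06-11 | 2019-11-08 | 西安电子科技大学 | Method and system for detecting chromosome ploidy abnormality |
CN110428873B (zh) * | 2019-06-11 | 2021-07-23 | 西安电子科技大学 | Method and system for detecting chromosome ploidy abnormality |
WO2021134513A1 (zh) * | 2019-12-31 | 2021-07-08 | 深圳华大医学检验实验室 | Method and device for determining chromosomal aneuploidy and constructing a classification model |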
Also Published As
Publication number | Publication date |
---|---|
US20160154931A1 (en) | 2016-06-02 |
EP3023504A1 (en) | 2016-05-25 |
HK1208888A1 (zh) | 2016-03-18 |
EP3023504B1 (en) | 2019-10-02 |
CN104520437B (zh) | 2016-09-14 |
EP3023504A4 (en) | 2017-04-05 |
CN104520437A (zh) | 2015-04-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13889435 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14905617 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2013889435 Country of ref document: EP |