WO2013053182A1 - Method and system for detecting a predetermined event in a nucleic acid sample, and capture chip - Google Patents

Method and system for detecting a predetermined event in a nucleic acid sample, and capture chip

Info

Publication number
WO2013053182A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
nucleic acid
chromosome
acid sample
predetermined
Prior art date
Application number
PCT/CN2011/084380
Other languages
English (en)
French (fr)
Inventor
蒋慧
陈芳
葛会娟
李培培
李旭超
汪建
王俊
杨焕明
张秀清
Original Assignee
深圳华大基因研究院
深圳华大基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因研究院, 深圳华大基因科技有限公司
Priority to CN201180074169.6A (CN105392893A)
Priority to US14/351,468 (US20140249038A1)
Publication of WO2013053182A1
Priority to HK16103726.7A (HK1215812A1)
Priority to US16/023,868 (US20180371539A1)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the invention relates to the field of biomedicine.
  • the invention relates to methods and systems for detecting predetermined events in nucleic acid samples, and to capture chips.
  • Background art
  • Monogenic disorders are diseases or pathological traits controlled by a pair of alleles, also known as Mendelian diseases or monogenic genetic diseases. They can be genetically classified into autosomal recessive genetic diseases (AR), autosomal dominant genetic diseases (AD), X-linked recessive genetic diseases (XR), X-linked dominant genetic diseases (XD), and Y-linked genetic diseases. According to data published on the Human Genome Project Information website, there are 6,000 known monogenic genetic diseases with clinical symptoms and a clear genetic mechanism (http://www.ncbi.nlm.nih.gov/omim).
  • the present invention aims to solve at least one of the technical problems existing in the prior art. To this end, it is an object of the present invention to provide a method for efficiently detecting a predetermined event in a nucleic acid sample.
  • the invention proposes a method of detecting a predetermined event in a nucleic acid sample.
  • the method of detecting a predetermined event in a nucleic acid sample comprises the steps of: constructing a sequencing library for the nucleic acid sample; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining sequencing data from a predetermined region; and determining the occurrence of the predetermined event based on the composition of the sequencing data from the predetermined region.
  • the above method can effectively detect a predetermined event in a nucleic acid sample; for example, it can effectively detect the mutation type at a SNP site, or can effectively analyze prenatal chromosome aneuploidy.
  • the invention proposes a system for detecting a predetermined event in a nucleic acid sample.
  • the system for detecting a predetermined event in a nucleic acid sample comprises: a library construction device adapted to construct a sequencing library for the nucleic acid sample; a sequencing device coupled to the library construction device and adapted to sequence the sequencing library to obtain sequencing results consisting of multiple sequencing data; and an analysis device adapted to select sequencing data from the predetermined region from the sequencing results, and to determine the occurrence of the predetermined event based on the ratio of the sequencing data from the predetermined region to the total sequencing data.
  • with this system, the method for detecting a predetermined event in a nucleic acid sample described above can be effectively implemented, thereby effectively detecting a predetermined event in a nucleic acid sample; for example, the mutation type at a SNP site can be effectively detected, or prenatal chromosome aneuploidy can be effectively analyzed.
  • the invention proposes a capture chip.
  • the capture chip includes: a chip body; and a plurality of oligonucleotide probes disposed on a surface of the chip body, wherein the oligonucleotide probes are specific for a predetermined region in the human genome.
  • because the oligonucleotide probes of the capture chip are specific to a predetermined region in the human genome, the capture chip can be effectively applied to the aforementioned method for detecting a predetermined event in a nucleic acid sample, effectively determining the sequencing data from the predetermined region and enabling efficient detection of predetermined regions in the human genome.
  • FIG. 1 is a schematic structural diagram of a system for detecting a predetermined event in a nucleic acid sample according to an embodiment of the present invention
  • FIG. 2 is a schematic structural view of a system for detecting a predetermined event in a nucleic acid sample according to still another embodiment of the present invention
  • the SNP is detected, and according to the base probability distribution when the mother is heterozygous for the fetus, the simulated frequency of each base at different sequencing depths is randomly generated, and the Bayesian model shown in Formula I is used to calculate the difference.
  • FIG. 5 is a schematic structural view of a capture chip according to an embodiment of the present invention.
  • Detailed description
  • the invention proposes a method of detecting a predetermined event in a nucleic acid sample.
  • the term "predetermined event" refers to a mutation or abnormality that may be present in a nucleic acid sample, such as a genetic variation (http://en.wikipedia.org/wiki/Genetic_variation). The site or region where these mutations or abnormalities occur is already known or has been reported.
  • the predetermined event that can be detected may be a structural variation of the nucleic acid sequence, such as a deletion, insertion, mutation, duplication, translocation, or inversion; a variation in the number of chromosomes, such as aneuploidy; or a molecular genetic marker in the genome, including a single nucleotide polymorphism (SNP), a minisatellite, or a microsatellite sequence (STR).
  • SNP single nucleotide polymorphism
  • STR microsatellite sequence
  • the inventors have discovered that, by sequencing specific regions of a nucleic acid sample that contain sites at which a predetermined event may occur, and analyzing the composition of the sequencing results for these regions (for example, the frequency at which each of the bases A, T, G, and C appears at a specific site), it is possible to effectively determine whether the predetermined event has occurred in the nucleic acid sample, or which type of predetermined event has occurred; for example, the type of a SNP can be determined. It should be noted that, once it has been judged whether a "predetermined event" has occurred, the detected results may be analyzed further to reach additional conclusions; for example, the method according to embodiments of the present invention can be further applied to achieve effective paternity testing.
  • the term "predetermined event" shall be understood broadly to include not only items that can be directly derived from sequencing results, but also items that are obtained by further analysis of the test results, such as determining the kinship between different nucleic acid samples.
  • the method of detecting a predetermined event in a nucleic acid sample can include the following steps:
  • the type of the nucleic acid sample is not particularly limited and may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), preferably DNA.
  • DNA deoxyribonucleic acid
  • RNA ribonucleic acid
  • the source of the nucleic acid sample is also not particularly limited.
  • the nucleic acid sample that can be used is at least one selected from the group consisting of a human genomic DNA sample and a free nucleic acid.
  • the genomic DNA sample is genomic DNA derived from human leukocytes or maternal plasma.
  • the inventors have found that with the methods of the invention, specific events such as nucleic acid mutations in the human genome can be efficiently determined.
  • the genetic characteristics of the fetus can be effectively analyzed to achieve prenatal diagnosis or paternity testing for the fetus.
  • as for the method and procedure for constructing a sequencing library for a nucleic acid sample, those skilled in the art can select appropriately according to the sequencing technology used.
  • the method and apparatus for extracting a nucleic acid sample from a biological sample are also not particularly limited, and the extraction can be carried out using a commercially available nucleic acid extraction kit.
  • the sequencing library is applied to a sequencing instrument, the sequencing library is sequenced, and corresponding sequencing results are obtained, which are composed of a plurality of sequencing data.
  • the method and apparatus that can be used for sequencing according to embodiments of the present invention are not particularly limited, and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, because the high throughput and deep sequencing characteristics of these sequencing devices can be exploited to further improve the efficiency of determining the aneuploidy of nucleated red blood cells. This in turn improves the subsequent analysis of the sequencing data, especially the precision and accuracy of the statistical test analysis.
  • the high throughput sequencing methods include, but are not limited to, second generation sequencing techniques or single molecule sequencing techniques.
  • the second generation sequencing platforms (technologies) (see Metzker ML. Sequencing technologies-the next generation. Nat Rev Genet. 2010 Jan; 11(1): 31-46, which is incorporated herein by reference in its entirety) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq2000™, etc.), ABI-SOLiD, and Roche-454 (pyrosequencing) sequencing platforms; single molecule sequencing platforms (technologies) include, but are not limited to, Helicos's True Single Molecule DNA sequencing technology, the Pacific Biosciences single molecule real-time (SMRT™) technology, and nanopore sequencing technology from Oxford Nanopore Technologies (see Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6 (4): 244-245, which is incorporated herein by reference in its entirety).
  • the whole genome sequencing library can be sequenced using at least one selected from the group consisting of Illumina-Solexa, ABI-SOLiD, Roche-454, and single molecule sequencing devices.
  • the obtained sequencing results are processed to determine sequencing data from a predetermined region.
  • the term "predetermined region" shall be understood broadly and refers to any region of a nucleic acid molecule that contains a site at which a predetermined event may occur. For SNP analysis, it can refer to a region containing a SNP site.
  • for chromosome aneuploidy analysis, the predetermined region refers to the full length or a portion of the chromosome to be analyzed, i.e., all sequencing data from that chromosome are selected.
  • the method of selecting the sequencing data from the corresponding region from the sequencing results is not particularly limited.
  • sequencing data from a predetermined region can be obtained by aligning all of the obtained sequencing data with a known nucleic acid reference sequence.
  • the screening of the sequencing library can also be completed before the sequencing operation, so that the sequencing data from the predetermined region are obtained directly.
  • determining sequencing data from a predetermined region may include, after obtaining the sequencing result, screening the sequencing result by alignment to obtain sequencing data from the predetermined region. It is also possible to screen the sequencing library before sequencing, so that the sequencing results finally obtained consist of sequencing data from the predetermined region.
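The alignment-based selection described above can be illustrated with a minimal Python sketch. It assumes each read has already been aligned to a reference and is represented by a chromosome name plus a half-open coordinate interval; the `AlignedRead` class, its field names, and the example coordinates are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    """One aligned read (illustrative fields, not from the source)."""
    chrom: str
    start: int  # 0-based leftmost aligned position
    end: int    # exclusive end position


def reads_in_region(reads, chrom, region_start, region_end):
    """Return reads overlapping the half-open interval [region_start, region_end)."""
    return [
        r for r in reads
        if r.chrom == chrom and r.start < region_end and r.end > region_start
    ]


# Hypothetical example: keep only reads from the first 500 bases of chr21.
reads = [
    AlignedRead("chr21", 100, 150),
    AlignedRead("chr21", 900, 950),
    AlignedRead("chr18", 120, 170),
]
selected = reads_in_region(reads, "chr21", 0, 500)
```

For whole-chromosome analysis the same idea reduces to a comparison on `chrom` alone.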
  • the method of screening a sequencing library is not particularly limited, and the screening may be performed at any stage of constructing the sequencing library, for example using a probe specific for the predetermined region.
  • a genome can be fragmented to obtain DNA fragments, the DNA fragments can be screened using a specific probe, and the subsequent library construction operations can be performed on the selected DNA fragments, thereby obtaining a sequencing library from the predetermined region.
  • before the sequencing library is sequenced, the method further includes the step of screening the sequencing library with a probe, wherein the probe is specific to the predetermined region.
  • the sequencing library can thus be preliminarily screened before sequencing, which increases the proportion of directly analyzable data in the obtained sequencing data and further increases the sequencing depth, so that multiple predetermined regions of the nucleic acid sample can be sequenced and analyzed simultaneously.
  • the form of the probe is not particularly limited.
  • the probe is arranged on a chip. Thus, by placing the probe on the chip, it is possible to further improve the efficiency of detection and analysis of the nucleic acid sample by realizing high-throughput screening of a plurality of predetermined regions of the sequencing library.
  • probes for screening a plurality of SNP sites can be integrated on one chip, and a plurality of different diseases can be simultaneously detected by one hybridization reaction.
  • the detection method of the embodiment of the present invention can detect a large number of SNP sites simultaneously, thereby realizing effective paternity testing and improving the effectiveness and timeliness of paternity testing.
  • chromosomal abnormalities can also be detected by the detection method of the embodiment of the present invention using the above-described chip for detecting single-gene diseases; for example, embodiments of the present invention effectively achieve the detection of chromosome aneuploidy such as trisomy 21.
  • a plurality of samples can be simultaneously detected as long as a different and known sequence of tags is added during the process of constructing a library for each sample. It greatly improves the throughput of detection, reduces the operation process and reagent loss of multiple detections in clinical applications, saves time and reduces costs, and provides great support for large-scale clinical non-invasive prenatal screening work in the future.
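As a rough illustration of how known per-sample tags allow pooled samples to be separated again after sequencing, the following Python sketch groups reads by a tag. It assumes, purely for illustration, that the tag is a fixed-length prefix of each read; real platforms often read the index separately. The function name, tag sequences, and sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_length):
    """Group reads by sample using a known tag at the start of each read.

    Reads whose tag is not recognized are collected under the key None.
    The tag is stripped from the returned sequences.
    """
    by_sample = defaultdict(list)
    for seq in reads:
        tag, insert = seq[:tag_length], seq[tag_length:]
        by_sample[tag_to_sample.get(tag)].append(insert)
    return dict(by_sample)


# Hypothetical tags and pooled reads.
tags = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCGGG", "NNNNACGTAC"]
grouped = demultiplex(pooled, tags, tag_length=4)
```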
  • the method of determining sequencing data from a predetermined region by alignment may also be combined with the method of screening for a predetermined region with a probe, thereby improving the accuracy of selecting sequencing data from the predetermined region. For the detection of relatively short predetermined regions, for example the detection of the mutation type of a SNP, it is possible to rely solely on screening the library by probe hybridization to select the sequencing data.
  • the selection of the sequencing result further includes removing the result of poor sequencing quality from the sequencing result, and in this regard, those skilled in the art can perform filtering according to predetermined criteria.
  • the method further comprises: aligning the sequencing result with a known nucleic acid sequence to obtain uniquely aligned sequences; and selecting sequencing data from the predetermined region from the uniquely aligned sequences.
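The quality filtering and unique-alignment selection described above might look like the following Python sketch. The read representation (a dict with `qualities` and `hits`), the mean-quality criterion, and the cutoff of 20 are all assumptions chosen for illustration; the source only says that filtering follows predetermined criteria.

```python
def uniquely_aligned(reads, min_mean_quality=20.0):
    """Keep reads that pass a mean-quality cutoff and align to exactly one place.

    Each read is a dict with:
      "qualities": list of per-base quality scores
      "hits": list of (chromosome, position) alignment candidates
    Both the layout and the cutoff are illustrative assumptions.
    """
    kept = []
    for read in reads:
        quals = read["qualities"]
        if not quals or sum(quals) / len(quals) < min_mean_quality:
            continue  # drop low-quality (or empty) reads
        if len(read["hits"]) != 1:
            continue  # drop unaligned and multiply aligned reads
        kept.append(read)
    return kept


# Hypothetical reads.
reads = [
    {"name": "r1", "qualities": [30, 32, 31], "hits": [("chr21", 100)]},
    {"name": "r2", "qualities": [10, 12, 8], "hits": [("chr21", 200)]},
    {"name": "r3", "qualities": [35, 36, 34], "hits": [("chr13", 50), ("chr18", 70)]},
    {"name": "r4", "qualities": [30, 30, 30], "hits": []},
]
kept = uniquely_aligned(reads)
```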
  • the occurrence of the predetermined event can be judged based on the composition of the sequencing data from the predetermined region.
  • the term "composition of sequencing data" means, for the region under study, all of the sequencing data, including the sequencing results obtained for every site and the corresponding number of reads. The inventors propose that the composition of these sequencing data can be analyzed by statistical methods to eliminate accidental errors, thereby obtaining the sequencing results most likely to reflect the real situation.
  • the inventors have proposed an analysis method for SNPs.
  • the predetermined region is a nucleic acid fragment containing a known SNP
  • the predetermined event is a mutation type of the SNP site
  • determining that the predetermined event occurs in the nucleic acid sample further comprises: determining the ratios of the sequencing data in which the base at the SNP site is A, T, G, and C, respectively, to the total sequencing data; and, based on these ratios, using a Bayesian model to determine the base with the highest probability of occurrence at the SNP site.
  • the mutation type of the SNP in the predetermined region can be effectively determined, and the paternity test can be performed by detecting the mutation type of the plurality of SNP sites in the fetus and its parents. And this method can effectively detect multiple types of mutations and expand the scope of disease detection.
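The per-base ratios at a SNP site can be computed as in the short Python sketch below. It assumes the bases observed at the site have already been collected from the reads covering it, and it takes "total sequencing data" to mean the A/T/G/C calls at that site; both are assumptions for illustration, and the function name is hypothetical.

```python
from collections import Counter

BASES = "ATGC"


def base_ratios(observed_bases):
    """Return the fraction of A, T, G, and C calls among the valid calls at one site."""
    counts = Counter(b for b in observed_bases if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/T/G/C calls at this site")
    return {b: counts[b] / total for b in BASES}


# Hypothetical pile of base calls at one SNP site; 'N' is ignored.
ratios = base_ratios("AAAGGGAGN")
```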
  • the base error rate is the proportion of bases that are called incorrectly during sequencing.
  • Equation I is a Bayesian expansion that can be used to calculate the probability of current sequencing results when the predetermined region of the nucleic acid sample is a different genotype.
  • the genotype with the highest probability is the actual genotype determined according to the analytical method of the present invention.
  • Pr(sequence | genotype i) is the probability of the current sequencing data when the actual genotype is i, and can be determined by formula.
  • Pr(genotype i | sequence) represents the probability of each genotype given the current sequencing data.
  • the type of a specific nucleic acid site in the sample can thus be effectively determined; for example, the mutation types of multiple SNPs can be determined simultaneously, so that the blood relationship between samples can be effectively detected, effective paternity testing can be achieved, and multiple diseases can be effectively detected at the same time.
  • the above analysis method using the Bayesian model can also be applied to the analysis of other nucleic acid variations. Unlike the traditional single-site PCR method, this method not only covers more sites but also gives more reliable detection results; in addition, multiple samples can be detected at the same time, which greatly increases throughput and considerably simplifies the operating procedure.
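To make the Bayesian step concrete, here is a minimal Python sketch of genotype calling at a single site. It is not the patent's Formula I, which is not reproduced in this text. It assumes a diploid genotype, independent reads, a uniform prior over the ten unordered genotypes, and a simple error model in which a miscalled base is equally likely to be any of the other three bases; all of these are illustrative assumptions, as are the function and parameter names.

```python
from itertools import combinations_with_replacement
from math import exp, log

BASES = "ATGC"


def genotype_posteriors(base_counts, error_rate=0.01, priors=None):
    """Posterior Pr(genotype | counts) under an assumed simple error model.

    base_counts maps each of A/T/G/C to its read count at the site.
    error_rate must lie strictly between 0 and 1; priors must be positive.
    """
    genotypes = list(combinations_with_replacement(BASES, 2))
    if priors is None:
        priors = {g: 1.0 / len(genotypes) for g in genotypes}  # assumed uniform
    log_post = {}
    for a1, a2 in genotypes:
        total = log(priors[(a1, a2)])
        for base, n in base_counts.items():
            # A read comes from either allele with probability 1/2.
            p = sum(
                0.5 * ((1 - error_rate) if base == allele else error_rate / 3)
                for allele in (a1, a2)
            )
            total += n * log(p)
        log_post[(a1, a2)] = total
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    weights = {g: exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# Hypothetical counts at one SNP site.
posteriors = genotype_posteriors({"A": 12, "G": 11, "T": 0, "C": 1})
best_genotype = max(posteriors, key=posteriors.get)
```

The genotype with the highest posterior is taken as the call, mirroring the statement that the genotype with the highest probability is the one determined by the analysis. A sample containing a mixture of maternal and fetal DNA would need a different likelihood, which this sketch does not attempt.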
  • the present invention also proposes a method of analyzing chromosome aneuploidy.
  • the predetermined region is a first chromosome in the genome
  • the predetermined event is aneuploidy of the first chromosome.
  • determining the occurrence of the predetermined event based on the number of sequencing data from the predetermined region further includes the following steps:
  • the ratio of the sequencing data from the first chromosome to the total sequencing data is determined; that is, the sequencing data from the first chromosome can be identified by aligning the sequencing data with known genomic information, the amount of sequencing data from the first chromosome and the amount of total sequencing data are counted separately, and the two are compared to obtain the ratio of sequencing data from the first chromosome to the total sequencing data.
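The ratio described above is a simple count fraction, as in the following Python sketch. It assumes the chromosome assignment of each retained read is already known from alignment; the function name and the example read counts are hypothetical.

```python
from collections import Counter


def chromosome_ratio(read_chroms, chrom):
    """Fraction of reads assigned to `chrom` among all reads."""
    counts = Counter(read_chroms)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return counts[chrom] / total


# Hypothetical assignments: 3 reads on chr21 out of 20 reads in total.
read_chroms = ["chr21"] * 3 + ["chr1"] * 17
ratio = chromosome_ratio(read_chroms, "chr21")
```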
  • the term "first chromosome" as used herein should be understood broadly; it can refer to any chromosome of interest that is to be studied, its number is not limited to one chromosome, and all chromosomes can even be analyzed at the same time.
  • the first chromosome is at least one selected from the group consisting of human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome.
  • a common human chromosomal disease can be effectively determined.
  • the inventors of the present invention have surprisingly found that the method for determining chromosome aneuploidy according to an embodiment of the present invention can be applied very effectively to detecting aneuploidy of human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome.
  • the method for determining chromosome aneuploidy can be applied very effectively to prenatal testing of pregnant women, which can greatly shorten the detection time and reduce harm to pregnant women, and avoids the risk of miscarriage that conventional testing may cause.
  • the source of the nucleic acid sample for studying chromosome aneuploidy is not particularly limited; according to a specific example, the nucleic acid sample is genomic DNA extracted from maternal plasma. Thus, genetic diseases related to fetal chromosome aneuploidy can be detected without harming the fetus.
  • the non-invasive sampling used in the method avoids the risk of miscarriage caused by traditional amniocentesis and similar procedures, requires no auxiliary equipment such as ultrasound, and makes sampling simpler and more convenient.
  • when aneuploidy is present, the ratio of the sequencing data of the first chromosome to the total sequencing data differs significantly from that of a normal nucleic acid sample.
  • the aneuploidy of the chromosome can be effectively determined, thereby enabling effective detection of fetal hereditary diseases before delivery.
  • the term "predetermined parameter" refers to the relevant data about a specific chromosome obtained by repeating, on a nucleic acid sample known to have a normal genome, the operations and analysis performed on the biological sample. Those skilled in the art will appreciate that the same sequencing conditions and mathematical methods can be used to obtain the relevant parameters for a particular chromosome in the sample under test and in a normal sample. Here, the relevant parameters of the normal nucleic acid sample can be used as control parameters.
  • the term "predetermined" as used herein shall be understood broadly: the parameter may be determined experimentally in advance, or may be obtained by parallel experiments when the biological sample is analyzed.
  • the predetermined parameter is a ratio of sequencing data from the first chromosome obtained from a normal nucleic acid sample to total sequencing data.
  • the difference between the ratio of the sequencing data from the first chromosome to the total sequencing data and the predetermined parameter can be expressed by any known mathematical method, for example by comparing the ratio with the predetermined parameter and comparing the result with a threshold; if the result is larger than the threshold, it is determined that the nucleic acid sample is trisomic for the first chromosome.
  • the method further includes performing a Student's t-test on the ratio and the parameter.
  • the precision and accuracy of the sequencing analysis results can thereby be further improved.
  • the threshold may be set to at least 1.5, such as at least 2, more preferably at least 3, after performing the t-test.
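One way to compare a sample's chromosome ratio with control values and apply a threshold is sketched below in Python. The source does not spell out the exact test statistic, so this sketch assumes a standardized score: the sample ratio minus the mean of the control ratios, divided by their sample standard deviation. The function names and the control values are hypothetical; the default threshold of 3 follows the preferred value stated above.

```python
from statistics import mean, stdev


def aneuploidy_score(sample_ratio, control_ratios):
    """Standardized difference between a sample ratio and normal control ratios."""
    if len(control_ratios) < 2:
        raise ValueError("need at least two control ratios")
    spread = stdev(control_ratios)
    if spread == 0:
        raise ValueError("control ratios have zero variance")
    return (sample_ratio - mean(control_ratios)) / spread


def is_trisomy_suspected(sample_ratio, control_ratios, threshold=3.0):
    """Flag the sample when its score exceeds the threshold."""
    return aneuploidy_score(sample_ratio, control_ratios) > threshold


# Hypothetical chr21 read fractions from normal control samples.
controls = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
elevated = is_trisomy_suspected(0.0136, controls)
normal = is_trisomy_suspected(0.0130, controls)
```

Only an excess of reads is flagged here, which matches the trisomy case described in the text; detecting a loss of a chromosome would need a test on the lower tail as well.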
  • a system 1000 for detecting a predetermined event in a nucleic acid sample includes a library construction device 100, a sequencing device 200, and an analysis device 300, in accordance with an embodiment of the present invention.
  • the above-described method of detecting a predetermined event in a nucleic acid sample according to an embodiment of the present invention can be effectively implemented. The advantages of this method have been described in detail above and will not be described again.
  • library construction device 100 is adapted to construct a sequencing library for a nucleic acid sample.
  • as for the method and procedure for constructing a sequencing library for a nucleic acid sample, those skilled in the art can select appropriately according to the sequencing technology used, for example with reference to the procedures provided by a manufacturer of sequencing instruments such as Illumina.
  • the method and apparatus for extracting a nucleic acid sample from a biological sample are also not particularly limited, and can be carried out using a commercially available nucleic acid extraction kit.
  • the sequencing device 200 is coupled to the library construction device 100 and is adapted to sequence the sequencing library to obtain sequencing results consisting of multiple sequencing data.
  • the method and apparatus that can be used for sequencing according to embodiments of the present invention are not particularly limited.
  • second generation sequencing techniques can be employed, and third generation and fourth generation or more advanced sequencing techniques can also be employed.
  • at least one selected from the group consisting of Illumina-Solexa, ABI-SOLiD, Roche-454, and a single molecule sequencing device can be used to sequence the whole genome sequencing library.
  • the system may further include a library screening device 400, in accordance with an embodiment of the present invention.
  • a probe is disposed in the library screening device 400, the probe being specific to a predetermined region, so that the sequencing library can be screened using the probe.
  • the sequencing library can thus be preliminarily screened before sequencing, which increases the proportion of directly analyzable data in the obtained sequencing data and further increases the sequencing depth, so that multiple predetermined regions of the nucleic acid sample can be sequenced and analyzed simultaneously.
  • the probe is in the form of a chip.
  • the library screening device 400 described herein can be used at any step of library construction, either after fragmenting a nucleic acid sample such as genomic DNA to obtain DNA fragments, or after the sequencing library for the genomic DNA has been obtained and before sequencing.
  • the analysis device 300 is coupled to the sequencing device 200 and is adapted to receive sequencing results from the sequencing device 200, select sequencing data from the predetermined region from the sequencing results, and determine the occurrence of the predetermined event based on the number of sequencing data from the predetermined region.
  • the sequencing data from the predetermined region selected from the sequencing results has been described in detail above and will not be described herein.
  • the related sequence information may be pre-stored in the analysis device 300, or the analysis device 300 may be connected to a remote database (not shown) for networking operation.
  • the analysis device 300 is adapted to detect and analyze SNPs.
  • the predetermined region is a nucleic acid fragment containing a known SNP
  • the predetermined event is a mutation type of a SNP site
  • the analyzing device 300 is adapted to: determine the ratios of the sequencing data in which the base at the SNP site is A, T, G, and C, respectively, to the total sequencing data; and, based on these ratios, use a Bayesian model to determine the base with the highest probability of occurrence at the SNP site, in order to determine the type of mutation at the SNP site in the nucleic acid sample.
  • the analyzing device 300 can be used to analyze chromosome aneuploidy. In this case the predetermined region is the first chromosome in the genome and the predetermined event is aneuploidy of the first chromosome, and the analyzing device 300 is adapted to: determine a ratio of sequencing data from the first chromosome to total sequencing data; and determine, based on a difference between the ratio and a predetermined parameter, whether the nucleic acid sample has aneuploidy for the first chromosome.
  • the aneuploidy of the chromosome can thus be effectively determined, thereby enabling effective detection of fetal hereditary disease before delivery.
  • the first chromosome is at least one selected from the group consisting of human chromosome 21, chromosome 18, chromosome 13, X chromosome, and Y chromosome.
  • the analysis device 300 further comprises a T-test device (not shown) for performing a T-test on the ratio and the parameters. Thereby, the accuracy and accuracy of the sequencing analysis results can be further improved.
  • the method for detecting a predetermined event in a nucleic acid sample described above can be effectively implemented, thereby effectively detecting a predetermined event in a nucleic acid sample, for example, can effectively detect a mutation type in a SNP site, or can be effective
  • the analysis of aneuploidy of prenatal chromosomes was performed.
  • the term "connected” as used herein shall be understood broadly and may be either directly connected or indirectly connected as long as the above functional connections are achieved.
  • the invention also proposes a capture chip for the aforementioned method for detecting a predetermined event in a nucleic acid sample.
  • the chip 2000 includes a chip body 2001 and a plurality of oligonucleotide probes 2002.
  • the plurality of oligonucleotide probes 2002 are disposed on the surface of the chip body 2001, wherein the oligonucleotide probes are specific for a predetermined region in the human genome.
  • By using the capture chip, the nucleic acid in a sample that corresponds to the predetermined region can be captured efficiently, which improves the efficiency of the method for detecting a predetermined event in a nucleic acid sample.
  • The predetermined region of interest is determined first, and the sequences of the oligonucleotide probes are then determined from the sequence characteristics of that region.
  • The type of the predetermined region is not particularly limited.
  • the predetermined region is a gene region associated with a disease in a human genome.
  • the gene region is located on chromosomes 18, 13 or 21 of the human genome.
  • the predetermined region is a nucleic acid fragment containing a known SNP.
  • the chip can be utilized to simultaneously screen a large amount of SNP related information.
  • Example 1: Detection of SNP sites
  • The samples comprised peripheral blood from the father and from the mother during pregnancy, both from one family.
  • After the fetus was born, cord blood was taken and collected in EDTA anticoagulant tubes.
  • The mother's peripheral blood taken during pregnancy was centrifuged at 1600 g and 4 °C for 10 minutes to separate blood cells from plasma, and the plasma was centrifuged again at 16000 g and 4 °C for 10 minutes to remove residual white blood cells.
  • DNA was extracted from the maternal peripheral blood cells and plasma with the TIANamp Micro DNA Kit (TIANGEN), representing maternal genomic DNA and a mixture of maternal and fetal genomic DNA, respectively.
  • DNA was extracted directly from the father's peripheral blood and the fetal cord blood using the same kit. All DNA samples obtained, except the plasma DNA samples, were fragmented to 500 bp using a Covaris™ shearing instrument.
  • The DNA fragments were used to construct sequencing libraries according to the instructions provided by Illumina®, the manufacturer of the HiSeq2000™ sequencer. The specific steps are as follows. End repair:
  • Klenow fragment (with 5'→3' polymerase activity and 3'→5' exonuclease activity) 1 μl
  • The ligation product was recovered using a PCR purification kit (QIAGEN). The sample was finally dissolved in 32 μl of EB buffer.
  • The PCR program is as follows:
  • The PCR product was recovered using a PCR purification kit (QIAGEN). The sample was finally dissolved in 50 μl of EB buffer. The fragment size distribution of the constructed library was checked on an Agilent® Bioanalyzer 2100 and found to meet requirements, and the library was then quantified by Q-PCR. Once it passed, the library was hybridized to the custom solid-phase chip 110321_HG19_BGI_exon_chrM_cap_HX3 from NimbleGen (details of the chip are given below). The hybridized product was sequenced on an Illumina® HiSeq2000™ sequencer with PE101Index cycles (i.e., paired-end 101 bp index sequencing). Instrument parameter settings and operation followed the HiSeq2000™ sequencer operating manual provided by the manufacturer Illumina® (available from http://www.illumina.com/support/documentation.ilmn).
  • Using the known human genome sequence hg19 as the reference sequence, a total of 7644 probes with an average length of 150 bp were designed, covering a 1.8 Mb region of the reference genome.
  • The probes were synthesized onto a hybridization chip by Roche NimbleGen, giving chip 110321_HG19_BGI_exon_chrM_cap_HX3.
  • Alternatively, probe design may be entrusted to the chip company; as long as the probes effectively cover the regions, the same or a similar effect is achieved.
  • The amount of data obtained by sequencing is shown in Table 1.
  • The sequencing depth of the white blood cell samples of the parents and child was about 50x, and that of the mother's peripheral blood sample taken during pregnancy was about 300x.
  • During data analysis, sequencing reads were aligned to the reference sequence hg19 with SOAP v2.20, with the parameters set to ( -v 5 -s 40 -l 40 -r 1 ). Only reads uniquely aligned to the chip target regions were used for subsequent analysis.
  • For the SNP results of the parents and fetus, whole-genome sequencing and chip data were already available as reference standards. All SNP sites falling within the chip target regions were therefore selected as candidate sites for analysis.
  • The coverage depth and the distribution of A, T, C, and G at each SNP site were counted, and sites with low coverage were filtered out, giving the base distributions of the sites at which inference was possible.
  • The genotypes of the parents' genomes and of the fetus in maternal peripheral blood were inferred with the Bayesian model of Formula I. The data are shown in Table 2.
  • Example 2: Chromosomal aneuploidy detection. One plasma sample from a pregnant woman whose fetus had been confirmed by amniocentesis as T21 (trisomy 21 syndrome) and two plasma samples from pregnant women carrying normal fetuses were selected. Plasma DNA was extracted from each, libraries were constructed as described in Example 1, the sequencing libraries were captured with the same capture chip as in Example 1, and sequencing was performed on an Illumina® HiSeq2000™ sequencer. The valid data obtained for detecting abnormal chromosome number are shown in Table 3. The sequencing depth of each sample was about 50x.
  • Alignment and related steps were the same as for SNP genotype inference in Example 1.
  • For the alignment results, the proportion of uniquely aligned reads falling on each chromosome, relative to the whole-genome sequencing data, was counted chromosome by chromosome.
  • These proportions were then divided by those of the normal samples as controls, and a t-test was applied to the resulting relative read distribution; chromosomes whose outlying values exceeded the significance limit were those with an abnormal number.
  • The results are shown in Figure 4.
  • For the T21 plasma sample, the other chromosomes were within the threshold, whereas chromosome 21 exceeded the threshold (3), as indicated by the arrow in Figure 4. Threshold screening thus successfully detected the abnormal number of chromosome 21.
  • Table 3: Sequencing data yield

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

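The present invention discloses a method for detecting a predetermined event in a nucleic acid sample. The method comprises the following steps: constructing a sequencing library for the nucleic acid sample; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining the sequencing data from a predetermined region; and determining, based on the composition of the sequencing data from the predetermined region, that the predetermined event has occurred in the nucleic acid sample.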

Description

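Method and System for Detecting a Predetermined Event in a Nucleic Acid Sample, and Capture Chip

Priority Information
This application claims priority to and the benefit of Patent Application No. 201110311333.2, filed with the State Intellectual Property Office of China on October 14, 2011, the entire content of which is incorporated herein by reference.

Technical Field
The present invention relates to the field of biomedicine. Specifically, it relates to a method and a system for detecting a predetermined event in a nucleic acid sample, and to a capture chip.

Background
Monogenic disorders are diseases or pathological traits controlled by a single pair of alleles, also known as Mendelian diseases or single-gene genetic diseases. By mode of inheritance they can be divided into autosomal recessive (AR), autosomal dominant (AD), X-linked recessive (XR), X-linked dominant (XD), and Y-linked disorders. According to data published on the Human Genome Project information website, there are currently 6000 monogenic disorders with known clinical symptoms and a clear genetic mechanism (http://www.ncbi.nlm.nih.gov/omim).
However, current detection approaches still need improvement.

Summary of the Invention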
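The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, one object of the invention is to provide a method capable of effectively detecting a predetermined event in a nucleic acid sample.
According to a first aspect of the invention, a method for detecting a predetermined event in a nucleic acid sample is provided. According to embodiments of the invention, the method comprises the following steps: constructing a sequencing library for the nucleic acid sample; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining the sequencing data from a predetermined region; and determining the occurrence of the predetermined event based on the composition of the sequencing data from the predetermined region. With this method, a predetermined event in a nucleic acid sample can be detected effectively; for example, the mutation type at a SNP site can be detected, or prenatal analysis of chromosomal aneuploidy can be performed.
According to a second aspect of the invention, a system for detecting a predetermined event in a nucleic acid sample is provided. According to embodiments of the invention, the system comprises: a library construction device adapted to construct a sequencing library for the nucleic acid sample; a sequencing device connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; and an analysis device adapted to select the sequencing data from a predetermined region out of the sequencing result and to determine the occurrence of the predetermined event based on the proportion of the sequencing data from the predetermined region relative to the total sequencing data. With this system, the method described above can be implemented effectively, so that a predetermined event in a nucleic acid sample is detected effectively; for example, the mutation type at a SNP site can be detected, or prenatal analysis of chromosomal aneuploidy can be performed.
According to a third aspect of the invention, a capture chip is provided. According to embodiments of the invention, the capture chip comprises: a chip body; and a plurality of oligonucleotide probes disposed on the surface of the chip body, wherein the oligonucleotide probes are specific for a predetermined region of the human genome. Because its probes are specific for a predetermined region of the human genome, the capture chip can be applied effectively to the aforementioned method to determine the sequencing data from the predetermined region, so that the predetermined region of the human genome can be examined effectively.
Additional aspects and advantages of the invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief Description of the Drawings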
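The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments in conjunction with the drawings, in which:
Figure 1 is a schematic structural diagram of a system for detecting a predetermined event in a nucleic acid sample according to one embodiment of the invention;
Figure 2 is a schematic structural diagram of a system for detecting a predetermined event in a nucleic acid sample according to another embodiment of the invention;
Figure 3 shows SNP detection according to one embodiment of the invention: based on the base probability distribution when the mother is heterozygous and the fetus is homozygous, simulated counts of each base were generated at random for different sequencing depths and evaluated with the Bayesian model of Formula I, giving the accuracy at different sequencing depths. Here, fetal concentration denotes the percentage of fetal cell-free DNA in the plasma DNA of maternal peripheral blood, and detection efficiency denotes the detection efficiency of the model, i.e., 1 - FN (false negative rate);
Figure 4 shows the result of detecting chromosomal aneuploidy according to one embodiment of the invention; and
Figure 5 is a schematic structural diagram of a capture chip according to one embodiment of the invention.

Detailed Description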
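Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it. The terms "first", "second", and the like are used for convenience of description only and are not to be understood as indicating or implying relative importance. In the description of the invention, unless otherwise stated, "a plurality of" means two or more.
Method for detecting a predetermined event in a nucleic acid sample
According to embodiments of the invention, a method for detecting a predetermined event in a nucleic acid sample is provided. As used herein, the term "predetermined event" refers to a mutation or abnormality that may be present in a nucleic acid sample, for example genetic variation (http://en.wikipedia.org/wiki/Genetic_variation). The sites or regions where these mutations or abnormalities occur are known in advance or have been reported. With the method according to embodiments of the invention, the detectable predetermined events may be structural variations of a nucleic acid sequence such as deletion, insertion, mutation, duplication, translocation, and inversion; variations in chromosome number such as aneuploidy; or molecular genetic markers, including single nucleotide polymorphisms (SNPs) and minisatellite and microsatellite sequences (STRs) within the genome. The inventors found that by examining specific regions of a nucleic acid sample that contain sites where a predetermined event may occur, and analyzing the composition of the sequencing results for those regions (for example, the frequency with which each of the bases A, T, G, and C occurs at a specific site), it is possible to determine effectively whether the predetermined event has occurred in the sample, or the type of the event, for example the type of a SNP. It should be noted that, once it has been determined whether a "predetermined event" has occurred, the results can be analyzed further to draw further conclusions; for example, according to embodiments of the invention, once SNP information has been obtained, the method can be applied to effective paternity testing. The term "predetermined event" as used herein should therefore be understood broadly: it includes not only items that can be derived directly from the sequencing results but also items obtained by further analysis of the detection results, such as determining the kinship between different nucleic acid samples.
According to embodiments of the invention, the method for detecting a predetermined event in a nucleic acid sample may comprise the following steps: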
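First, a sequencing library is constructed for the nucleic acid sample. According to embodiments of the invention, the type of nucleic acid sample is not particularly limited; it may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), preferably DNA. Those skilled in the art will understand that an RNA sample can be converted by conventional means into a DNA sample with the corresponding sequence for detection. The source of the nucleic acid sample is likewise not particularly limited. According to some embodiments, the nucleic acid sample is at least one selected from human genomic DNA samples and cell-free nucleic acids; preferably, the genomic DNA sample is genomic DNA from human leukocytes or from the plasma of a pregnant woman. The inventors found that the method of the invention can effectively determine specific events in the human genome, such as nucleic acid mutations. In addition, by analyzing cell-free nucleic acid or genomic DNA extracted from human peripheral blood, especially the peripheral blood of pregnant women, the genetic traits of the fetus can be analyzed effectively, enabling prenatal diagnosis or paternity testing without harm to the fetus. As for the method and workflow for constructing a sequencing library, those skilled in the art can choose appropriately according to the sequencing technology; for details, see the protocols provided by sequencer manufacturers such as Illumina, for example the Illumina Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the invention, the method and equipment for extracting the nucleic acid sample from a biological sample are also not particularly limited, and a commercial nucleic acid extraction kit may be used.
After the sequencing library is obtained, it is loaded onto a sequencing instrument and sequenced to obtain the corresponding sequencing result, which is composed of a plurality of sequencing data. According to embodiments of the invention, the methods and equipment that can be used for sequencing are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, so that the high throughput and deep sequencing of these instruments further improve the efficiency of determining chromosomal aneuploidy in nucleated red blood cells. This in turn improves the precision and accuracy of the subsequent analysis of the sequencing data, particularly the statistical tests.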
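The high-throughput sequencing methods include, but are not limited to, second-generation sequencing technologies and single-molecule sequencing technologies.
Second-generation sequencing platforms (see Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46, incorporated herein by reference in its entirety) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq2000™, etc.), ABI-SOLiD, and Roche-454 (pyrosequencing) platforms. Single-molecule sequencing platforms include, but are not limited to, True Single Molecule DNA sequencing from Helicos, single molecule real-time (SMRT™) sequencing from Pacific Biosciences, and nanopore sequencing from Oxford Nanopore Technologies (see Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6 (4): 244-245, incorporated herein by reference in its entirety).
As sequencing technology continues to evolve, those skilled in the art will understand that other sequencing methods and devices can also be used for whole-genome sequencing.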
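According to a specific example of the invention, the whole-genome sequencing library may be sequenced with at least one selected from Illumina-Solexa, ABI-SOLiD, Roche-454, and single-molecule sequencing devices. Next, the sequencing result is processed to determine the sequencing data from the predetermined region. The term "predetermined region" as used herein should be understood broadly and refers to any region of a nucleic acid molecule that contains a site where the predetermined event may occur. For SNP analysis, it may be a region containing the SNP site. For the analysis of chromosomal aneuploidy, the predetermined region is all or part of the chromosome to be analyzed, that is, all sequencing data from that chromosome are selected. The way in which sequencing data from the corresponding region are selected from the sequencing result is not particularly limited. According to embodiments of the invention, all sequencing data obtained may be aligned to a known nucleic acid reference sequence to obtain the sequencing data from the predetermined region. Alternatively, the sequencing library may be screened before sequencing, so that sequencing data from the predetermined region are obtained directly. Thus, determining the sequencing data from the predetermined region may involve screening the sequencing result after it is obtained, for example by alignment, or selecting the sequencing library before sequencing so that the final sequencing result consists of data from the predetermined region. According to embodiments of the invention, the method of selecting the sequencing library is not particularly limited and may be applied at any stage of library construction, for example using probes specific for the predetermined region. According to embodiments of the invention, the genome may be fragmented to obtain DNA fragments, the fragments screened with specific probes, and the selected fragments carried through the subsequent library construction steps to give a sequencing library from the predetermined region. It is of course also possible to screen the sequencing library with region-specific probes after the DNA sequencing library has been obtained. Thus, according to embodiments of the invention, a step of screening the sequencing library with probes specific for the predetermined region may further be included before sequencing. This preliminary screening increases the proportion of sequencing data that can be analyzed directly, further increases sequencing depth, and allows multiple predetermined regions of the nucleic acid sample to be sequenced and analyzed at the same time. According to embodiments of the invention, the form of the probes is not particularly limited. According to embodiments of the invention, the probes are disposed on a chip. Placing the probes on a chip allows high-throughput screening of sequencing libraries for many predetermined regions, further improving the efficiency of detection and analysis. Those skilled in the art can design probes as needed, and manufacturers currently offer probe synthesis and chip fabrication services; for example, a hybridization chip may be designed for the MHC region or for multiple SNPs (up to the order of tens of thousands). According to embodiments of the invention, probes for screening multiple SNP sites may be integrated on a single chip, so that multiple different diseases can be detected with a single hybridization reaction. Further, the inventors found that, using a chip for detecting monogenic diseases with the detection method of the embodiments, a large number of SNP sites can be detected simultaneously, so that effective paternity testing can be achieved with improved validity and timeliness. Moreover, according to embodiments of the invention, the chip for detecting monogenic diseases can also be used with the detection method to detect chromosomal abnormalities; for example, the embodiments effectively achieve detection of chromosomal aneuploidy such as trisomy 21. In addition, according to embodiments of the invention, multiple samples can be tested simultaneously, provided that a different tag of known sequence is added to each sample during library construction. This greatly increases throughput, reduces the handling and reagent consumption of repeated testing in clinical use, saves time, lowers cost, and offers strong support for future large-scale clinical noninvasive prenatal screening.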
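The multiplexing idea in the paragraph above (a distinct tag of known sequence per sample) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tag occupies the first `tag_len` bases of each read and that only exact matches are accepted; on real instruments the index is usually read separately, and the function name `demultiplex` is hypothetical.

```python
def demultiplex(reads, tag_to_sample, tag_len):
    """Assign reads to samples by an exact-match tag at the start of each read.

    Assumption (not from the source): the tag is the first `tag_len` bases.
    Returns (reads grouped by sample with the tag removed, unassigned reads).
    """
    by_sample = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for seq in reads:
        tag, insert = seq[:tag_len], seq[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(seq)  # unknown tag: keep the read for inspection
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


tags = {"ACGT": "mother", "TTAG": "father"}  # hypothetical tags
groups, leftover = demultiplex(["ACGTTTGA", "TTAGCCCA", "GGGGAAAA"], tags, 4)
```

Exact matching is the simplest choice; a production pipeline would typically tolerate one mismatch in the tag.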
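In addition, according to embodiments of the invention, determining the sequencing data from the predetermined region by alignment can be combined with screening the sequencing library with probes, which improves the precision with which sequencing data from the predetermined region are selected. For detection in which the predetermined region is relatively short, for example detection of SNP mutation types, the sequencing data may be selected solely by probe hybridization screening of the library. According to embodiments of the invention, selecting from the sequencing result further includes removing low-quality results, which those skilled in the art can filter according to predetermined criteria. According to embodiments of the invention, after the sequencing result is obtained, the method further comprises: aligning the sequencing result to a known nucleic acid sequence to obtain uniquely aligned sequences; and selecting the sequencing data from the predetermined region from the uniquely aligned sequences. This further improves the accuracy or efficiency of detection and analysis.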
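The selection step described above (keep uniquely aligned reads, then keep those from the predetermined region) might be sketched as follows. The `Alignment` record and its fields are hypothetical stand-ins for whatever the aligner reports; coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    read_id: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    n_hits: int  # number of equally good placements reported by the aligner


def select_target_reads(alignments, regions):
    """Keep uniquely aligned reads that overlap any (chrom, start, end) region."""
    selected = []
    for aln in alignments:
        if aln.n_hits != 1:
            continue  # discard reads that are not uniquely aligned
        for chrom, r_start, r_end in regions:
            if aln.chrom == chrom and aln.start < r_end and aln.end > r_start:
                selected.append(aln)
                break
    return selected
```

For a few thousand target regions a linear scan like this is adequate; an interval tree or sorted sweep would be the usual choice for larger designs.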
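After the sequencing data from the predetermined region have been selected from the sequencing result, the occurrence of the predetermined event can be determined based on the composition of those data. For sequencing data from the predetermined region, especially results from high-throughput deep sequencing such as second-generation sequencing, the same site is read many times, some reads will contain errors, and other mutations may have occurred. As used herein, the term "composition of the sequencing data" refers, for the region under study, to all sequencing data, including the results obtained at every site and the number of reads corresponding to each result. The inventors propose analyzing the composition of these sequencing data statistically to exclude chance errors and obtain the sequencing result most likely to reflect the true situation.
To this end, the inventors propose an analysis method for SNPs. In this method, the predetermined region is a nucleic acid fragment containing a known SNP and the predetermined event is the mutation type of the SNP site, wherein determining that the predetermined event has occurred in the nucleic acid sample further comprises: determining the proportions of the sequencing data reading A, T, G, and C at the SNP site, each relative to the total sequencing data; and, based on those proportions, using a Bayesian model to determine the base with the highest probability of occurrence at the SNP site, so as to determine the mutation type of the SNP site in the nucleic acid sample. The mutation type of a SNP in the predetermined region can thus be determined effectively, and paternity testing can be performed by detecting the mutation types of multiple SNP sites in the fetus and its parents. The method can also detect many types of variation, which broadens the range of disease detection.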
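The inventors found that at a given site the occurrences of the four bases (A, T, C, and G) are mutually exclusive and are the only four possibilities, so the probability of a specific base appearing at a given site follows a four-category multinomial distribution. Thus, when the genotype is homozygous, for example AA, the probabilities of the four bases are:
[Figure imgf000007_0001: base probabilities for a homozygous genotype; equation image not reproduced]
Note: Pr(Base) denotes the probability of the base occurring; δ is the base error rate, i.e., the proportion of bases called incorrectly during sequencing.
When the genotype is heterozygous, for example AT, the probabilities of the four bases are:
[Figure imgf000007_0002: base probabilities for a heterozygous genotype; equation image not reproduced]
Note: Pr(Base) denotes the probability of the base occurring; δ is the base error rate, i.e., the proportion of bases called incorrectly during sequencing.
According to the multinomial distribution, the probability that among n sequencing results A occurs $a_A$ times, T occurs $a_T$ times, C occurs $a_C$ times, and G occurs $a_G$ times is

$$\Pr(\text{sequence} \mid \text{genotype} = i) = \frac{n!}{a_A!\,a_T!\,a_C!\,a_G!}\; p_A^{a_A}\, p_T^{a_T}\, p_C^{a_C}\, p_G^{a_G}$$

where $a_A + a_T + a_C + a_G = n$, and $p_A$, $p_T$, $p_C$, and $p_G$ denote the probabilities of the bases A, T, C, and G under genotype $i$, with $i \in \{AA, TT, CC, GG, AT, AC, AG, CT, CG, GT\}$. Because the sequencing depth of current sequencing technology is relatively high, there is no need to introduce an informative prior; it can therefore be assumed that, before observation, every genotype is equally likely, i.e., $\Pr(\text{genotype} = i) = 0.1$, since the sample space $i \in \{AA, TT, CC, GG, AT, AC, AG, CT, CG, GT\}$ contains 10 possible cases.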
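As a small worked illustration of the multinomial probability described above, the sketch below evaluates it for given base counts and per-base probabilities. The function name and the dictionary representation are illustrative choices, not taken from the source.

```python
from math import factorial


def multinomial_probability(counts, probs):
    """Probability of observing `counts` (base -> count) given `probs` (base -> probability).

    Implements n! / (a_A! a_T! a_C! a_G!) * p_A^a_A * p_T^a_T * p_C^a_C * p_G^a_G.
    """
    n = sum(counts.values())
    coeff = factorial(n)
    for c in counts.values():
        coeff //= factorial(c)  # exact integer division at every step
    p = 1.0
    for base, c in counts.items():
        p *= probs[base] ** c
    return coeff * p
```

For example, with three reads, two A and one T, and probabilities of 0.5 for A and T and 0 for C and G, the result is 3 × 0.5² × 0.5 = 0.375. At high depth the product underflows, so real implementations usually work with logarithms.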
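On this basis, the sequencing result can be analyzed with a Bayesian model, that is, with the following equation:

$$\Pr(\text{genotype} = i \mid \text{sequence}) = \frac{\Pr(\text{genotype} = i)\,\Pr(\text{sequence} \mid \text{genotype} = i)}{\sum_{j} \Pr(\text{genotype} = j)\,\Pr(\text{sequence} \mid \text{genotype} = j)} \qquad \text{(Formula I)}$$

Formula I is the Bayesian expansion, with which the probability of obtaining the current sequencing result can be calculated for each possible genotype of the predetermined region in the nucleic acid sample. The genotype with the highest probability is the actual genotype as determined by the analysis method of the invention. Here, Pr(genotype = i) is the probability of a given genotype, which by the analysis above defaults to 0.1 throughout; Pr(sequence | genotype = i) is the probability of obtaining the current sequencing data when the actual genotype is i, which can be calculated from the multinomial formula given above; and Pr(genotype = i | sequence) is the probability of each genotype given the current sequencing data.
With this Bayesian analysis, the probability of a specific base appearing at a specific site in the sequencing result can be calculated to obtain the most probable result and thereby determine the genotype at that site: the genotype with the highest probability is taken as the genotype of the site. In addition, the value Pr(genotype = i | sequence) of the most probable genotype can be converted into a quality value by the formula -10 × log10(1 - Pr) to measure the reliability of the genotype call, where Pr denotes the probability of that genotype.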
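A minimal sketch of the genotype call under Formula I follows. Two points are assumptions rather than the patent's stated model, because the per-genotype base probabilities appear only as equation images: a homozygous genotype is taken to emit its allele with probability 1 - δ and each other base with δ/3, and a heterozygous genotype is taken as the equal mixture of its two alleles. The quality conversion -10·log10(1 - Pr), the cap of 99, and all function names are likewise illustrative.

```python
from itertools import combinations_with_replacement
from math import exp, log, log10

BASES = "ATCG"
# The 10 unordered genotypes: AA, AT, AC, AG, TT, TC, TG, CC, CG, GG
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def base_probs(genotype, delta):
    """Assumed error model: each allele is read correctly with prob 1 - delta,
    otherwise as one of the three other bases with prob delta / 3 each."""
    return {
        b: sum(0.5 * ((1.0 - delta) if b == allele else delta / 3.0)
               for allele in genotype)
        for b in BASES
    }


def call_genotype(counts, delta=0.01):
    """Return (best genotype, posterior, quality) under a uniform prior of 0.1.

    The multinomial coefficient and the uniform prior are identical for every
    genotype and cancel in Formula I, so only the p^a terms are evaluated,
    in log space to avoid underflow at high depth. Requires delta > 0.
    """
    log_lik = {}
    for g in GENOTYPES:
        probs = base_probs(g, delta)
        log_lik[g] = sum(counts.get(b, 0) * log(probs[b]) for b in BASES)
    top = max(log_lik.values())
    weights = {g: exp(v - top) for g, v in log_lik.items()}
    total = sum(weights.values())
    posterior = {g: w / total for g, w in weights.items()}
    best = max(posterior, key=posterior.get)
    err = 1.0 - posterior[best]
    quality = 99.0 if err <= 0.0 else min(99.0, -10.0 * log10(err))
    return best, posterior[best], quality
```

With 50 reads all showing A the call is AA; with 25 A and 25 T it is AT. Inferring a fetal genotype from maternal plasma would additionally need a mixture model for the fetal fraction, which this sketch does not attempt.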
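In this way, the type at specific nucleic acid sites of a sample can be determined effectively; for example, the mutation types of multiple SNPs can be determined simultaneously, so that the kinship between samples can be tested and effective paternity testing achieved, and multiple diseases can be detected at once. Those skilled in the art will of course understand that the Bayesian analysis described above is also applicable to other kinds of nucleic acid variation. Unlike conventional single-site PCR methods, this method covers more sites, gives more reliable results, and can test multiple samples at once, which greatly increases throughput and considerably simplifies the workflow.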
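In addition, the invention also provides a method for analyzing chromosomal aneuploidy. According to one embodiment of the invention, the predetermined region is a first chromosome in the genome and the predetermined event is aneuploidy of the first chromosome. According to embodiments of the invention, determining the occurrence of the predetermined event based on the number of sequencing data from the predetermined region further comprises the following steps:
First, the proportion of sequencing data from the first chromosome relative to the total sequencing data is determined: the sequencing data are aligned to known genomic information to identify those from the first chromosome, and the amount of data from the first chromosome is compared with the total amount of sequencing data to give the proportion. The term "first chromosome" as used here should be understood broadly: it may be any chromosome of interest, it is not limited to a single chromosome, and all chromosomes may even be analyzed at once. According to embodiments of the invention, the first chromosome is at least one selected from human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome, so that common human chromosomal diseases can be determined effectively. The inventors surprisingly found that the method of determining chromosomal aneuploidy according to embodiments of the invention can be applied very effectively to detecting aneuploidy of human chromosomes 21, 18, 13, X, and Y. The method can therefore be applied very effectively to prenatal testing of pregnant women, greatly reducing the time required and the harm to the pregnant woman, and avoiding the risk of miscarriage that conventional testing may bring. According to embodiments of the invention, the source of the nucleic acid sample used to study chromosomal aneuploidy is not particularly limited; in a specific example, the nucleic acid sample is genomic DNA extracted from the plasma of a pregnant woman. Genetic diseases associated with fetal chromosomal aneuploidy can thus be detected without harming the fetus. The noninvasive sampling used in this method avoids the risk of miscarriage associated with traditional methods such as amniocentesis, dispenses with auxiliary equipment such as ultrasound, and makes sampling simpler and more convenient.
Next, once the proportion of sequencing data from the first chromosome relative to the total sequencing data has been obtained, note that if aneuploidy is present, this proportion differs significantly from that of a normal nucleic acid sample. Therefore, based on the difference between this proportion and a predetermined parameter, it can be determined whether the nucleic acid sample has aneuploidy with respect to the first chromosome. Chromosomal aneuploidy can thus be determined effectively, enabling effective prenatal detection of fetal genetic diseases. The term "predetermined parameter" as used herein refers to the corresponding data for a specific chromosome obtained by repeating, on a nucleic acid sample with a known normal genome, the operations and analysis performed on single cells of the biological sample. Those skilled in the art will understand that the same sequencing conditions and mathematical operations can be used to obtain the relevant parameters for the specific chromosome and for normal cells. Here, the parameters of a normal nucleic acid sample can serve as control parameters. The term "predetermined" as used herein should also be understood broadly: the parameter may be determined in advance by experiment or obtained by a parallel experiment when the biological sample is analyzed. Thus, according to one embodiment of the invention, the predetermined parameter is the proportion of sequencing data from the first chromosome relative to the total sequencing data, obtained from a normal nucleic acid sample. According to embodiments of the invention, the difference between the proportion and the predetermined parameter can be expressed by any known mathematical method; for example, the proportion may be compared with the predetermined parameter and the result compared with a threshold, and if it exceeds the threshold, the nucleic acid sample is judged to be trisomic for the first chromosome. According to one embodiment of the invention, the method further includes performing a t-test (Student's t-test) on the proportion and the parameter, which further improves the accuracy and precision of the sequencing analysis. Those skilled in the art will understand that, after the relevant statistical test, different thresholds can be set accordingly for a similar analysis. According to embodiments of the invention, after the t-test the threshold may be set to at least 1.5, for example at least 2, more preferably at least 3.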
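The comparison described above can be sketched as a standardized score of the sample's chromosome proportion against proportions from normal reference samples. The text does not spell out the exact test statistic, so the score below (difference from the reference mean, divided by the reference standard deviation) is an assumed stand-in; the function names and the default threshold of 3 (taken from the preferred value in the text) are illustrative.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Proportion of reads from `chrom` among all counted reads."""
    return read_counts[chrom] / sum(read_counts.values())


def aneuploidy_score(sample_counts, reference_counts_list, chrom):
    """Standardized difference between the sample proportion and normal references.

    Needs at least two reference samples so that a standard deviation exists.
    """
    ref = [chromosome_fraction(rc, chrom) for rc in reference_counts_list]
    return (chromosome_fraction(sample_counts, chrom) - mean(ref)) / stdev(ref)


def is_aneuploid(sample_counts, reference_counts_list, chrom, threshold=3.0):
    """Flag the chromosome when the absolute score exceeds the threshold."""
    return abs(aneuploidy_score(sample_counts, reference_counts_list, chrom)) > threshold
```

A positive score suggests extra copies (for example trisomy) and a negative score a missing copy; with very few reference samples the standard deviation is poorly estimated, so the threshold should be treated with caution.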
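System for detecting a predetermined event in a nucleic acid sample
According to a second aspect of the invention, a system 1000 for detecting a predetermined event in a nucleic acid sample is provided. Referring to Figure 1, according to embodiments of the invention the system 1000 comprises a library construction device 100, a sequencing device 200, and an analysis device 300. With this system, the method for detecting a predetermined event in a nucleic acid sample according to the embodiments described above can be implemented effectively. The advantages of the method have been described in detail above and are not repeated here.
According to embodiments of the invention, the library construction device 100 is adapted to construct a sequencing library for the nucleic acid sample. As for the method and workflow for constructing a sequencing library, those skilled in the art can choose appropriately according to the sequencing technology; for details, see the protocols provided by sequencer manufacturers such as Illumina, for example the Illumina Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the invention, the method and equipment for extracting the nucleic acid sample from a biological sample are also not particularly limited, and a commercial nucleic acid extraction kit may be used.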
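According to embodiments of the invention, the sequencing device 200 is connected to the library construction device 100 and is adapted to sequence the sequencing library to obtain a sequencing result composed of a plurality of sequencing data. The methods and equipment that can be used for sequencing are not particularly limited: second-generation sequencing technology may be used, as may third-generation, fourth-generation, or more advanced sequencing technologies. According to a specific example of the invention, the whole-genome sequencing library may be sequenced with at least one selected from Illumina-Solexa, ABI-SOLiD, Roche-454, and single-molecule sequencing devices. Combined with the latest sequencing technology, a high sequencing depth can be reached at individual sites, which greatly improves detection sensitivity and accuracy; the high throughput and deep sequencing of these devices thus further improve the efficiency of detection and analysis of nucleic acid samples, and in turn the precision and accuracy of the subsequent analysis of the sequencing data, particularly the statistical tests. Referring to Figure 2, according to one embodiment of the invention the system may further comprise a library screening device 400. According to embodiments of the invention, the library screening device 400 is provided with probes specific for the predetermined region, so that the sequencing library can be screened with the probes. This preliminary screening before sequencing increases the proportion of sequencing data that can be analyzed directly, further increases sequencing depth, and allows multiple predetermined regions of the nucleic acid sample to be sequenced and analyzed at the same time. According to one embodiment of the invention, the probes take the form of a chip. Placing the probes on a chip allows sequencing libraries for many predetermined regions to be screened, further improving the efficiency of detection and analysis. As noted above, the library screening device 400 may be placed at any stage of library construction: either after the nucleic acid sample, for example genomic DNA, has been fragmented into DNA fragments, or after the sequencing library of genomic DNA has been obtained and before sequencing.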
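According to embodiments of the invention, the analysis device 300 is connected to the sequencing device 200 and is adapted to receive the sequencing result from the sequencing device 200, select the sequencing data from the predetermined region out of the sequencing result, and determine the occurrence of the predetermined event based on the number of sequencing data from the predetermined region. Selecting the sequencing data from the predetermined region has been described in detail above and is not repeated here. According to embodiments of the invention, the relevant sequence information may be pre-stored in the analysis device 300, or the analysis device 300 may be connected to a remote database (not shown) and operate over a network.
Determining the occurrence of the predetermined event has also been described in detail above and is not repeated here. Briefly, the analysis device 300 is adapted to detect and analyze SNPs. For SNP analysis, the predetermined region is a nucleic acid fragment containing a known SNP and the predetermined event is the mutation type of the SNP site, wherein the analysis device 300 is adapted to: determine the proportions of the sequencing data reading A, T, G, and C at the SNP site, each relative to the total sequencing data; and, based on those proportions, use a Bayesian model to determine the base with the highest probability of occurrence at the SNP site, so as to determine the mutation type of the SNP site in the nucleic acid sample. The mutation type of a SNP in the predetermined region can thus be determined effectively, and paternity testing can be performed by detecting the mutation types of multiple SNP sites in the fetus and its parents.
According to one embodiment of the invention, the analysis device 300 can be used to analyze chromosomal aneuploidy. In that case the predetermined region is a first chromosome in the genome and the predetermined event is aneuploidy of the first chromosome, wherein the analysis device 300 is adapted to: determine the proportion of sequencing data from the first chromosome relative to the total sequencing data; and determine, based on the difference between that proportion and a predetermined parameter, whether the nucleic acid sample has aneuploidy with respect to the first chromosome. Chromosomal aneuploidy can thus be determined effectively, enabling effective prenatal detection of fetal genetic diseases. According to one embodiment of the invention, the first chromosome is at least one selected from human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome, so that common human chromosomal diseases can be determined effectively. According to one embodiment of the invention, the analysis device 300 further comprises a t-test device (not shown) for performing a t-test on the proportion and the parameter, which further improves the accuracy and precision of the sequencing analysis.
With this system, the method for detecting a predetermined event in a nucleic acid sample described above can be implemented effectively, so that a predetermined event in a nucleic acid sample is detected effectively; for example, the mutation type at a SNP site can be detected, or prenatal analysis of chromosomal aneuploidy can be performed. The term "connected" as used herein should be understood broadly: the connection may be direct or indirect, as long as the functional connection described above is achieved.
It should be noted that, as those skilled in the art will understand, the features and advantages described above for the method for detecting a predetermined event in a nucleic acid sample also apply to the system, and for brevity are not repeated.

Capture chip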
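According to embodiments of the invention, a capture chip for use in the aforementioned method for detecting a predetermined event in a nucleic acid sample is also provided. Referring to Figure 5, the chip 2000 comprises a chip body 2001 and a plurality of oligonucleotide probes 2002. According to embodiments of the invention, the plurality of oligonucleotide probes 2002 are disposed on the surface of the chip body 2001, and the oligonucleotide probes are specific for a predetermined region of the human genome. By using the capture chip, the nucleic acid in a sample that corresponds to the predetermined region can be captured efficiently, which improves the efficiency of the method for detecting a predetermined event in a nucleic acid sample. According to embodiments of the invention, the predetermined region of interest is determined first, and the sequences of the oligonucleotide probes are then determined from the sequence characteristics of that region. According to embodiments of the invention, the type of the predetermined region is not particularly limited. According to embodiments of the invention, the predetermined region is a disease-associated gene region of the human genome, so that the chip can be used to screen effectively for disease-associated genetic information from the human genome. According to a specific example, the gene region is located on chromosome 18, 13, or 21 of the human genome. In addition, according to embodiments of the invention, the predetermined region is a nucleic acid fragment containing a known SNP, so that the chip can be used to screen a large amount of SNP-related information simultaneously.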
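It should be noted that, as those skilled in the art will understand, the features and advantages described above for the method for detecting a predetermined event in a nucleic acid sample also apply to the capture chip, and for brevity are not repeated.
The invention is described below with reference to specific examples. These examples are illustrative only and are not to be construed as limiting the invention.
Unless otherwise specified, the techniques used in the examples are conventional techniques well known to those skilled in the art and may be carried out with reference to Molecular Cloning: A Laboratory Manual, third edition, or the relevant product documentation; the reagents and products used are all commercially available. Procedures and methods not described in detail are conventional methods well known in the art. The source and trade name of each reagent, and its composition where necessary, are indicated at first mention; unless otherwise stated, the same reagent used subsequently is as indicated at first mention.

Example 1: Detection of SNP sites
The samples comprised peripheral blood from the father and from the mother during pregnancy, both from one family; cord blood was taken after the fetus was born and collected in EDTA anticoagulant tubes. The mother's peripheral blood taken during pregnancy was centrifuged at 1600 g and 4 °C for 10 minutes to separate blood cells from plasma, and the plasma was centrifuged again at 16000 g and 4 °C for 10 minutes to remove residual white blood cells. DNA was extracted from the maternal peripheral blood cells and plasma with the TIANamp Micro DNA Kit (TIANGEN), representing maternal genomic DNA and a mixture of maternal and fetal genomic DNA, respectively. DNA was extracted directly from the father's peripheral blood and the fetal cord blood with the same kit. All DNA samples obtained, except the plasma DNA samples, were fragmented to 500 bp with a Covaris™ shearing instrument. The DNA fragments were used to construct sequencing libraries according to the instructions provided by Illumina®, the manufacturer of the HiSeq2000™ sequencer. The specific steps are as follows:

End repair: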
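10× T4 polynucleotide kinase buffer: 10 μl
dNTPs (10 mM): 4 μl
T4 DNA polymerase: 5 μl
Klenow fragment (with 5'→3' polymerase activity and 3'→5' exonuclease activity): 1 μl
T4 polynucleotide kinase: 5 μl
DNA: 30 μl
ddH2O: to 100 μl
After reaction at 20 °C for 30 minutes, the end-repaired product was recovered with a PCR purification kit (QIAGEN). The sample was finally dissolved in 34 μl of EB buffer.

Addition of base A to the ends:
10× Klenow buffer: 5 μl
dATP (1 mM): 10 μl
Klenow fragment (3'-5' exo-): 3 μl
DNA (end-repaired product): 32 μl
After incubation at 37 °C for 30 minutes, the product was purified with a MinElute® PCR purification kit (QIAGEN) and dissolved in 12 μl of EB.

Adapter ligation:
2× rapid DNA ligation buffer: 25 μl
PEI Adapter oligomix (20 μM): 10 μl
T4 DNA ligase: 5 μl
A-tailed product: 10 μl
After reaction at 20 °C for 15 minutes, the ligation product was recovered with a PCR purification kit (QIAGEN). The sample was finally dissolved in 32 μl of EB buffer.

PCR amplification:
Adapter-ligated product: 10 μl
Phusion DNA polymerase Mix: 25 μl
PCR primer (10 pmol/μl): 1 μl
Index tag N* (10 pmol/μl): 1 μl
Ultrapure water: 13 μl
Note: * provided by the manufacturer Illumina®.
The PCR program is as follows:
10 cycles
Figure imgf000014_0001
72 °C
4 °C Hold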
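The PCR product was recovered with a PCR purification kit (QIAGEN). The sample was finally dissolved in 50 μl of EB buffer. The fragment size distribution of the constructed library was checked on an Agilent® Bioanalyzer 2100 and found to meet requirements, and the library was then quantified by Q-PCR. Once it passed, the library was hybridized to the custom solid-phase chip 110321_HG19_BGI_exon_chrM_cap_HX3 from NimbleGen (details of the chip are given below). The hybridized product was sequenced on an Illumina® HiSeq2000™ sequencer with PE101Index cycles (i.e., paired-end 101 bp index sequencing). Instrument parameter settings and operation followed the HiSeq2000™ sequencer operating manual provided by the manufacturer Illumina® (available from http://www.illumina.com/support/documentation.ilmn).
Design and preparation of the solid-phase chip 110321_HG19_BGI_exon_chrM_cap_HX3:
Following the probe design guidelines provided by the manufacturer Roche NimbleGen, the applicant selected regions associated with monogenic diseases (http://omim.org/statistics/geneMap) for the regions listed in the table below and, using the known human genome sequence hg19 as the reference sequence, designed 7644 probes with an average length of 150 bp, covering a 1.8 Mb region of the reference genome. The probes were synthesized onto a hybridization chip by Roche NimbleGen, giving chip 110321_HG19_BGI_exon_chrM_cap_HX3. Alternatively, probe design may be entrusted to the chip company; as long as the probes effectively cover the regions, the same or a similar effect is achieved.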
Figure imgf000015_0001
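The amount of data obtained by sequencing is shown in Table 1. The sequencing depth of the white blood cell samples of the parents and child was about 50x, and that of the mother's peripheral blood sample taken during pregnancy was about 300x. During data analysis, sequencing reads were aligned to the reference sequence hg19 with SOAP v2.20, with the parameters set to ( -v 5 -s 40 -l 40 -r 1 ). Only reads uniquely aligned to the chip target regions were used for subsequent analysis. For the SNP results of the parents and fetus, whole-genome sequencing and chip data were already available as reference standards. All SNP sites falling within the chip target regions were therefore selected as candidate sites for analysis.
Table 1: Sequencing data yield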
Figure imgf000016_0001
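The coverage depth and the distribution of A, T, C, and G at each SNP site were counted, and sites with low coverage were filtered out, giving the base distributions of the sites at which inference was possible. The genotypes of the parents' genomes and of the fetus in maternal peripheral blood were inferred with the Bayesian model of Formula I; the data are shown in Table 2.
Table 2: SNP accuracy statistics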
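The counting and filtering step above can be sketched as a simple pile-up over candidate SNP positions. The sketch assumes ungapped reads given as (0-based start position, sequence) pairs on a single chromosome; the minimum-depth cutoff of 10 is an arbitrary illustrative value, since the source does not state the cutoff used.

```python
from collections import Counter


def pileup_counts(reads, snp_positions, min_depth=10):
    """Count A/T/C/G at each candidate SNP position and drop low-coverage sites.

    reads: iterable of (start, sequence), ungapped, 0-based start (assumption).
    Returns {position: {base: count}} for positions with depth >= min_depth.
    """
    counts = {pos: Counter() for pos in snp_positions}
    for start, seq in reads:
        for pos in snp_positions:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                if base in "ATCG":  # ignore N and other ambiguity codes
                    counts[pos][base] += 1
    return {pos: dict(c) for pos, c in counts.items()
            if sum(c.values()) >= min_depth}
```

The resulting per-site base counts are the kind of input a genotype-calling model such as Formula I would consume.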
Figure imgf000016_0002
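Table 2 shows that the accuracy of genotype detection for the parents is essentially 100%, and the accuracy of fetal genotype detection is also above 70%. At sites where the mother is homozygous the accuracy reaches 92.54%; the lower overall accuracy is mainly due to sites where the mother is heterozygous. The current result is limited by the sequencing depth of this experiment. As Figure 3 shows, analysis of simulated data indicates that accuracy can improve considerably as depth increases. For Figure 3, based on the base probability distribution when the mother is heterozygous and the fetus is homozygous, simulated counts of each base were generated at random for different sequencing depths and evaluated with the Bayesian model of Formula I, giving the accuracy at different sequencing depths.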
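Example 2: Chromosomal aneuploidy detection
One plasma sample from a pregnant woman whose fetus had been confirmed by amniocentesis as T21 (trisomy 21 syndrome) and two plasma samples from pregnant women carrying normal fetuses were selected. Plasma DNA was extracted from each, libraries were constructed as described in Example 1, the sequencing libraries were captured with the same capture chip as in Example 1, and sequencing was performed on an Illumina® HiSeq2000™ sequencer. The valid data obtained for detecting abnormal chromosome number are shown in Table 3. The sequencing depth of each sample was about 50x.
Alignment and related steps were the same as for SNP genotype inference in Example 1. For the alignment results, the proportion of uniquely aligned reads falling on each chromosome, relative to the whole-genome sequencing data, was counted chromosome by chromosome. These proportions were then divided by those of the normal samples as controls, and a t-test was applied to the resulting relative read distribution; chromosomes whose outlying values exceeded the significance limit were those with an abnormal number. The results are shown in Figure 4: for the T21 plasma sample, the other chromosomes were within the threshold, whereas chromosome 21 exceeded the threshold (3), as indicated by the arrow in Figure 4. Threshold screening thus successfully detected the abnormal number of chromosome 21.
Table 3: Sequencing data yield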
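The analysis in Example 2 (divide each chromosome's read proportion by the control proportion, then look for outliers beyond a threshold of 3) might be sketched as below. The source does not give the exact form of its t-test, so the leave-one-out standardized score used here is an assumption, as are the function names and the toy read counts.

```python
from statistics import mean, stdev


def relative_ratios(test_counts, control_counts):
    """Per-chromosome read proportion of the test sample divided by the control's."""
    t_total = sum(test_counts.values())
    c_total = sum(control_counts.values())
    return {
        chrom: (test_counts[chrom] / t_total) / (control_counts[chrom] / c_total)
        for chrom in test_counts
    }


def flag_outliers(rel, threshold=3.0):
    """Flag chromosomes whose relative ratio is an outlier among the others.

    Each chromosome is scored against the mean and standard deviation of the
    remaining chromosomes (leave-one-out), an assumed stand-in for the t-test.
    """
    flagged = []
    for chrom, value in rel.items():
        others = [v for c, v in rel.items() if c != chrom]
        score = (value - mean(others)) / stdev(others)
        if abs(score) > threshold:
            flagged.append(chrom)
    return flagged


# Toy read counts (hypothetical, for illustration only)
control = {"chr1": 1000, "chr2": 900, "chr3": 800, "chr21": 100, "chr22": 120}
test = {"chr1": 1010, "chr2": 895, "chr3": 805, "chr21": 115, "chr22": 119}
abnormal = flag_outliers(relative_ratios(test, control))
```

Leaving the scored chromosome out of the mean and standard deviation keeps a strong outlier from inflating its own reference spread.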
Figure imgf000017_0001
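In this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.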

Claims

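What is claimed is:
1. A method for detecting a predetermined event in a nucleic acid sample, characterized by comprising the following steps:
constructing a sequencing library for the nucleic acid sample;
sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data;
determining the sequencing data from a predetermined region; and
determining, based on the composition of the sequencing data from the predetermined region, that the predetermined event has occurred in the nucleic acid sample,
wherein,
optionally, the nucleic acid sample is at least one selected from human genomic DNA samples and cell-free nucleic acids,
optionally, the genomic DNA sample is genomic DNA from human leukocytes or from the plasma of a pregnant woman,
optionally, the sequencing of the whole-genome sequencing library is performed with at least one selected from Illumina-Solexa, ABI-SOLiD, Roche-454, and single-molecule sequencing devices,
optionally, before the sequencing library is sequenced, the method further comprises a step of screening the sequencing library with probes, wherein the probes are specific for the predetermined region,
optionally, the probes are disposed on a chip,
optionally, after the sequencing result is obtained, the method further comprises:
aligning the sequencing result to a known nucleic acid sequence to obtain uniquely aligned sequences; and
selecting the sequencing data from the predetermined region from the uniquely aligned sequences.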
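2. The method according to claim 1, characterized in that the predetermined region is a nucleic acid fragment containing a known SNP and the predetermined event is the mutation type of the SNP site,
wherein,
determining that the predetermined event has occurred in the nucleic acid sample further comprises:
determining the proportions of the sequencing data reading A, T, G, and C at the SNP site, each relative to the total sequencing data; and
based on the proportions, using a Bayesian model to determine the base with the highest probability of occurrence at the SNP site, so as to determine the mutation type of the SNP site in the nucleic acid sample.
3. The method according to claim 1, characterized in that the predetermined region is a first chromosome in the genome and the predetermined event is aneuploidy of the first chromosome, wherein determining that the predetermined event has occurred in the nucleic acid sample further comprises:
determining the proportion of sequencing data from the first chromosome relative to the total sequencing data; and
determining, based on the difference between the proportion and a predetermined parameter, whether the nucleic acid sample has aneuploidy with respect to the first chromosome,
optionally, the first chromosome is at least one selected from human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome,
optionally, the nucleic acid sample is genomic DNA extracted from the plasma of a pregnant woman,
optionally, the predetermined parameter is the proportion of sequencing data from the first chromosome relative to the total sequencing data, obtained from a normal nucleic acid sample,
optionally, the method further comprises performing a t-test on the proportion and the parameter.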
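4. A system for detecting a predetermined event in a nucleic acid sample, characterized by comprising:
a library construction device adapted to construct a sequencing library for the nucleic acid sample;
a sequencing device connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; and
an analysis device adapted to determine the sequencing data from a predetermined region and to determine the occurrence of the predetermined event based on the composition of the sequencing data from the predetermined region,
optionally, the sequencing device is at least one selected from Illumina-Solexa, ABI-SOLiD, Roche-454, and single-molecule sequencing devices,
optionally, the system further comprises a library screening device provided with probes specific for the predetermined region, so that the sequencing library is screened with the probes.
5. The system according to claim 4, characterized in that the predetermined region is a nucleic acid fragment containing a known SNP and the predetermined event is the mutation type of the SNP site, wherein the analysis device is adapted to:
determine the proportions of the sequencing data reading A, T, G, and C at the SNP site, each relative to the total sequencing data; and
based on the proportions, use a Bayesian model to determine the base with the highest probability of occurrence at the SNP site, so as to determine the mutation type of the SNP site in the nucleic acid sample.
6. The system according to claim 4, characterized in that the predetermined region is a first chromosome in the genome and the predetermined event is aneuploidy of the first chromosome,
wherein,
the analysis device is adapted to:
determine the proportion of sequencing data from the first chromosome relative to the total sequencing data; and
determine, based on the difference between the proportion and a predetermined parameter, whether the nucleic acid sample has aneuploidy with respect to the first chromosome,
optionally, the first chromosome is at least one selected from human chromosome 21, chromosome 18, chromosome 13, the X chromosome, and the Y chromosome,
optionally, the analysis device further comprises a t-test device for performing a t-test on the proportion and the parameter.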
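7. A capture chip for use in the method for detecting a predetermined event in a nucleic acid sample according to claim 1, characterized by comprising:
a chip body; and
a plurality of oligonucleotide probes disposed on the surface of the chip body,
wherein,
the oligonucleotide probes are specific for a predetermined region of the human genome.
8. The capture chip according to claim 7, characterized in that the predetermined region is a disease-associated gene region of the human genome,
optionally, the gene region is located on chromosome 18, 13, or 21 of the human genome.
9. The capture chip according to claim 7, characterized in that the predetermined region is a nucleic acid fragment containing a known SNP.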
PCT/CN2011/084380 2011-10-14 2011-12-21 检测核酸样本中预定事件的方法和系统以及捕获芯片 WO2013053182A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201180074169.6A CN105392893A (zh) 2011-10-14 2011-12-21 检测核酸样本中预定事件的方法和系统以及捕获芯片
US14/351,468 US20140249038A1 (en) 2011-10-14 2011-12-21 Method of detecting a pre-determined event in a nucleic acid sample and system thereof
HK16103726.7A HK1215812A1 (zh) 2011-10-14 2016-03-31 檢測核酸樣本中預定事件的方法和系統以及捕獲芯片
US16/023,868 US20180371539A1 (en) 2011-10-14 2018-06-29 Method of detecting a pre-determined event in a nucleic acid sample and system thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110311333.2A CN102329876B (zh) 2011-10-14 2011-10-14 一种测定待检测样本中疾病相关核酸分子的核苷酸序列的方法
CN201110311333.2 2011-10-14

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US14/351,468 A-371-Of-International US20140249038A1 (en) 2011-10-14 2011-12-21 Method of detecting a pre-determined event in a nucleic acid sample and system thereof
US16/023,868 Continuation US20180371539A1 (en) 2011-10-14 2018-06-29 Method of detecting a pre-determined event in a nucleic acid sample and system thereof

Publications (1)

Publication Number Publication Date
WO2013053182A1 true WO2013053182A1 (zh) 2013-04-18

Family

ID=45481837

Family Applications (4)

Application Number Title Priority Date Filing Date
PCT/CN2011/084329 WO2013053180A1 (zh) 2011-10-14 2011-12-21 一种超级芯片及其制备方法和应用
PCT/CN2011/084380 WO2013053182A1 (zh) 2011-10-14 2011-12-21 检测核酸样本中预定事件的方法和系统以及捕获芯片
PCT/CN2011/084395 WO2013053183A1 (zh) 2011-10-14 2011-12-21 对核酸样本中预定区域进行基因分型的方法和系统
PCT/CN2012/001381 WO2013053207A1 (zh) 2011-10-14 2012-10-12 测定待检测样本中疾病相关核酸分子的核苷酸序列的方法

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/084329 WO2013053180A1 (zh) 2011-10-14 2011-12-21 一种超级芯片及其制备方法和应用

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/CN2011/084395 WO2013053183A1 (zh) 2011-10-14 2011-12-21 对核酸样本中预定区域进行基因分型的方法和系统
PCT/CN2012/001381 WO2013053207A1 (zh) 2011-10-14 2012-10-12 测定待检测样本中疾病相关核酸分子的核苷酸序列的方法

Country Status (5)

Country Link
US (2) US20140249038A1 (zh)
CN (4) CN102329876B (zh)
HK (2) HK1193845A1 (zh)
TW (1) TW201315813A (zh)
WO (4) WO2013053180A1 (zh)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102329876B (zh) * 2011-10-14 2014-04-02 深圳华大基因科技有限公司 一种测定待检测样本中疾病相关核酸分子的核苷酸序列的方法
KR102018934B1 (ko) * 2012-02-27 2019-09-06 도레이 카부시키가이샤 핵산의 검출 방법
KR102001554B1 (ko) * 2014-01-16 2019-07-18 일루미나, 인코포레이티드 고형 지지체 상에서의 앰플리콘 제조 방법 및 시퀀싱
WO2016058121A1 (zh) * 2014-10-13 2016-04-21 深圳华大基因科技有限公司 一种核酸片段化方法和序列组合
CN105648043A (zh) * 2014-11-13 2016-06-08 天津华大基因科技有限公司 试剂盒及其在检测矮小相关基因中的用途
WO2016095736A1 (zh) * 2014-12-18 2016-06-23 深圳华大基因研究院 一种基于多重pcr的目标区域富集方法和试剂
MX2017012367A (es) * 2015-03-26 2017-12-14 Quest Diagnostics Invest Inc Producto en desarrollo de analisis de secuenciacion de alineacion y variante.
CN104805183A (zh) * 2015-03-31 2015-07-29 江汉大学 一种测试纯系植物新品种的特异性、一致性与稳定性的方法
CN104805192A (zh) * 2015-03-31 2015-07-29 江汉大学 一种测试油菜品种实质性派生关系的方法
CN104805187B (zh) * 2015-03-31 2018-02-13 农业部科技发展中心 一种测试纯系大豆新品种的特异性、一致性与稳定性的方法
CN104878085A (zh) * 2015-04-08 2015-09-02 江汉大学 一种油菜亲本来源真实性及其比例测试新方法
CN104805195A (zh) * 2015-04-08 2015-07-29 江汉大学 一种水稻亲本来源真实性及其比例测试新方法
CN104805196A (zh) * 2015-04-08 2015-07-29 江汉大学 一种植物亲本来源真实性及其比例测试新方法
WO2017139945A1 (zh) * 2016-02-18 2017-08-24 深圳华大基因研究院 分型方法和装置
CN105925666A (zh) * 2016-03-30 2016-09-07 广州精科生物技术有限公司 试剂盒、试剂盒的用途及检测目标区域变异的方法及系统
CN105986032A (zh) * 2016-03-30 2016-10-05 广州精科生物技术有限公司 试剂盒、建库方法以及检测目标区域变异的方法及系统
CN105861700B (zh) * 2016-05-17 2019-07-30 上海昂朴生物科技有限公司 一种针对神经肌肉病的高通量检测方法
CN106372459B (zh) * 2016-08-30 2019-03-15 天津诺禾致源生物信息科技有限公司 一种基于扩增子二代测序拷贝数变异检测的方法及装置
CN106355045B (zh) * 2016-08-30 2019-03-15 天津诺禾致源生物信息科技有限公司 一种基于扩增子二代测序小片段插入缺失检测的方法及装置
CN106282356B (zh) * 2016-08-30 2019-11-26 天津诺禾医学检验所有限公司 一种基于扩增子二代测序点突变检测的方法及装置
CN106399535A (zh) * 2016-10-19 2017-02-15 江苏苏博生物医学股份有限公司 一种高通量测序检测无创亲子鉴定的方法
CN106480222B (zh) * 2016-12-20 2019-09-24 广东辉锦创兴生物医学科技有限公司 基于悬浮微珠阵列系统检测遗传性耳聋的探针、引物、检测试剂盒及检测方法
CN106591461A (zh) * 2016-12-29 2017-04-26 天津协和华美医学诊断技术有限公司 一种检测遗传性易栓症相关基因群的检测试剂盒
CN108277267B (zh) * 2016-12-29 2019-08-13 安诺优达基因科技(北京)有限公司 检测基因突变的装置和用于对孕妇和胎儿的基因型进行分型的试剂盒
WO2018137496A1 (zh) * 2017-01-24 2018-08-02 深圳华大基因股份有限公司 确定生物样本中预定来源的游离核酸比例的方法及装置
CN109097457A (zh) * 2017-06-20 2018-12-28 深圳华大智造科技有限公司 确定核酸样本中预定位点突变类型的方法
CN109280701A (zh) * 2017-07-21 2019-01-29 深圳华大基因股份有限公司 用于地中海贫血检测的探针、基因芯片及制备方法和应用
CN107937513B (zh) * 2017-11-30 2018-12-25 东莞市第八人民医院 新生儿50种遗传病基因检测探针组及筛查方法
CN109913539A (zh) * 2017-12-13 2019-06-21 浙江大学 一种靶向捕获hla基因序列并测序的方法
CN108004301B (zh) * 2017-12-15 2022-02-22 格诺思博生物科技南通有限公司 基因目标区域富集方法及建库试剂盒
JP6891150B2 (ja) * 2018-08-31 2021-06-18 シスメックス株式会社 解析方法、情報処理装置、遺伝子解析システム、プログラム、記録媒体
JP2022505050A (ja) * 2018-10-16 2022-01-14 ツインストランド・バイオサイエンシズ・インコーポレイテッド プーリングを介した多数の試料の効率的な遺伝子型決定のための方法および試薬
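CN109517819A (zh) * 2018-10-24 2019-03-26 深圳市易基因科技有限公司 Detection probe, method and kit for detecting multi-target gene mutations, methylation modifications and/or hydroxymethylation modifications
CN109576799B (zh) * 2018-11-30 2022-04-26 深圳安吉康尔医学检验实验室 Construction method, primer set and kit for an FH sequencing library
WO2020113577A1 (zh) * 2018-12-07 2020-06-11 深圳华大生命科学研究院 Method for constructing a target gene library, detection device, and application thereof
WO2020118543A1 (zh) * 2018-12-12 2020-06-18 深圳华大生命科学研究院 Method and reagent for separating and/or enriching host-derived nucleic acids and pathogen nucleic acids, and preparation method thereof
CN109554485B (zh) * 2018-12-26 2022-04-19 北京迈基诺基因科技股份有限公司 Kit for non-invasive detection of whether the chromosomes of a fetus under test are aneuploid, and its dedicated probe set
CN110029158B (zh) * 2019-02-01 2021-03-30 北京大学第三医院 A Marfan syndrome detection panel and its application
CN111961763A (zh) * 2020-09-17 2020-11-20 生捷科技(杭州)有限公司 A gene chip for novel coronavirus detection
CN112164423B (zh) * 2020-10-14 2021-03-23 深圳吉因加医学检验实验室 Fusion gene detection method, device and storage medium based on RNAseq data
CN114395620B (zh) * 2021-12-20 2022-09-20 温州谱希医学检验实验室有限公司 A biomarker combination for detecting populations susceptible to high myopia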
WO2023172877A2 (en) * 2022-03-07 2023-09-14 Arima Genomics, Inc. Oncogenic structural variants
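CN114540474B (zh) * 2022-03-11 2024-04-26 上海交通大学 An NGS targeted capture method based on dark probe technology and its application in differential-depth sequencing
CN114774515A (zh) * 2022-03-24 2022-07-22 北京安智因生物技术有限公司 Capture probes, kit and detection method for detecting gene mutations in polycystic kidney disease
CN115948574B (zh) * 2022-12-28 2023-11-10 中国人民解放军空军特色医学中心 An individual identification system based on third-generation sequencing, kit, and application thereof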

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101849236A (zh) * 2007-07-23 2010-09-29 香港中文大学 Diagnosing fetal chromosomal aneuploidy using genomic sequencing
CN102127819A (zh) * 2010-11-22 2011-07-20 深圳华大基因科技有限公司 Construction method and use of an MHC region nucleic acid library

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7108976B2 (en) * 2002-06-17 2006-09-19 Affymetrix, Inc. Complexity management of genomic DNA by locus specific amplification
US20040110153A1 (en) * 2002-12-10 2004-06-10 Affymetrix, Inc. Complexity management of genomic DNA by semi-specific amplification
DE602004024034D1 (de) * 2003-01-29 2009-12-24 454 Corp Nucleic acid amplification based on bead emulsion
CN101012482A (zh) * 2007-02-12 2007-08-08 中国农业大学 A method for screening differential sites and their flanking sequences in genomic DNA
EP2053132A1 (en) * 2007-10-23 2009-04-29 Roche Diagnostics GmbH Enrichment and sequence analysis of genomic regions
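CN101921841B (zh) * 2010-06-30 2014-03-12 深圳华大基因科技有限公司 High-resolution HLA genotyping method based on Illumina GA sequencing technology
CN101921874B (zh) * 2010-06-30 2013-09-11 深圳华大基因科技有限公司 Method for detecting human papillomavirus based on Solexa sequencing
CN102329876B (zh) * 2011-10-14 2014-04-02 深圳华大基因科技有限公司 A method for determining the nucleotide sequence of disease-associated nucleic acid molecules in a sample to be tested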

Also Published As

Publication number Publication date
CN103890189B (zh) 2017-07-07
US20140249038A1 (en) 2014-09-04
CN103890189A (zh) 2014-06-25
HK1193845A1 (zh) 2014-10-03
US20180371539A1 (en) 2018-12-27
WO2013053180A1 (zh) 2013-04-18
TW201315813A (zh) 2013-04-16
WO2013053183A1 (zh) 2013-04-18
CN103874767B (zh) 2016-08-17
CN102329876B (zh) 2014-04-02
CN102329876A (zh) 2012-01-25
CN105392893A (zh) 2016-03-09
WO2013053207A1 (zh) 2013-04-18
CN103874767A (zh) 2014-06-18
HK1215812A1 (zh) 2016-09-15

Similar Documents

Publication Publication Date Title
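WO2013053182A1 (zh) Method and system for detecting a predetermined event in a nucleic acid sample, and capture chip
JP6585117B2 (ja) Diagnosis of fetal chromosomal aneuploidy
JP6328934B2 (ja) Non-invasive prenatal paternity testing method
JP6045686B2 (ja) Method, system and computer-readable recording medium for determining base information of a predetermined region in a fetal genome
JP6426162B2 (ja) Method for non-invasively calculating the risk of fetal sex chromosome aneuploidy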
DK2562268T3 (en) Non-invasive diagnosis of fetal aneuploidy by sequencing
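RU2013141237A (ru) Methods for non-invasive prenatal ploidy determination
JP6073461B2 (ja) Non-invasive prenatal diagnosis of fetal trisomy by allelic ratio analysis using targeted massively parallel sequencing
TWI467020B (zh) Method for detecting exon deletions and/or duplications in the DMD gene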
JP2014507141A5 (zh)
WO2013086744A1 (zh) Method and system for determining whether a genome is abnormal
JP2015522293A (ja) Detection of genetic variants based on multiplexed sequential ligation
US20190338362A1 (en) Methods for non-invasive prenatal determination of aneuploidy using targeted next generation sequencing of biallelic snps
US20190338350A1 (en) Method, device and kit for detecting fetal genetic mutation
JP2022524208A (ja) Methods and compositions for identification of tumor models
EP3018213A1 (en) Method for determining the presence of a biological condition by determining total and relative amounts of two different nucleic acids
WO2015042980A1 (zh) Method, system and computer-readable medium for determining SNP information in a predetermined chromosomal region
WO2024076469A1 (en) Non-invasive methods of assessing transplant rejection in pregnant transplant recipients
WO2013173993A1 (zh) Method and system for identifying twin type
AU2015202167B2 (en) Noninvasive diagnosis of fetal aneuploidy by sequencing

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 201180074169.6; Country of ref document: CN
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 11873996; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 14351468; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 11873996; Country of ref document: EP; Kind code of ref document: A1