WO2014029093A1 - Method and system for determining whether individual is in abnormal state

Publication number
WO2014029093A1
WO2014029093A1 PCT/CN2012/080500 CN2012080500W WO2014029093A1 WO 2014029093 A1 WO2014029093 A1 WO 2014029093A1 CN 2012080500 W CN2012080500 W CN 2012080500W WO 2014029093 A1 WO2014029093 A1 WO 2014029093A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
snp
nucleic acid
individual
acid sample
Application number
PCT/CN2012/080500
Other languages
French (fr)
Chinese (zh)
Inventor
王威
殷旭阳
张春雷
陈盛培
潘小瑜
张春生
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Priority to CN201280074982.8A (CN104508141A)
Priority to PCT/CN2012/080500 (WO2014029093A1)
Publication of WO2014029093A1
Priority to HK15109589.1A (HK1208889A1)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the present invention relates to the field of biomedicine and, in particular, to methods and systems for determining whether an individual has an abnormal state.
  • Mendelian genetic diseases, also known as single-gene diseases (the two terms are used interchangeably herein), can be divided by mode of inheritance into autosomal dominant, autosomal recessive, sex-linked dominant, and sex-linked recessive genetic diseases.
  • OMIM: the Online Mendelian Inheritance in Man database.
  • the present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the present invention proposes a method and system for determining whether an individual has an abnormal state.
  • the invention proposes a method of determining whether an individual has an abnormal state.
  • the method comprises: constructing a sequencing library for the nucleic acid sample of the individual; sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data; determining, based on the sequencing data, a known SNP contained in the sequencing result; and determining, based on the known SNP, whether the individual has an abnormal state associated with the known SNP.
  • the SNPs contained in the nucleic acid sample can be efficiently determined by sequencing, and since these SNPs are related to the abnormal state, it is possible to effectively determine whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
  • the method of determining whether an individual has an abnormal state may also have the following additional technical features:
  • the individual is a human.
  • a human sample can be tested by the method of determining whether the individual has an abnormal state according to an embodiment of the present invention, so that it is possible to effectively predict whether the person has some abnormal state.
  • the abnormal state is a disease.
  • the disease is a monogenic disease.
  • the monogenic disease is at least one selected from the group consisting of thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency.
  • the thalassemia is beta-thalassemia.
  • the nucleic acid sample is at least a portion of an individual's whole genome DNA.
  • the nucleic acid sample is extracted from a single cell or a microsample of the individual.
  • the single cell or microsample is isolated from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
  • the single cells are isolated by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
  • constructing the sequencing library for the nucleic acid sample of the individual further comprises: amplifying the nucleic acid sample to obtain a nucleic acid sample amplification product; and constructing the sequencing library for the nucleic acid sample amplification product.
  • the efficiency of constructing the sequencing library can be improved, thereby further improving the efficiency of subsequent determination of whether or not the individual has an abnormal state.
  • the nucleic acid sample is whole genome DNA extracted from a single cell of an individual, for example, whole genome DNA released by lysing a single cell of the individual, wherein the whole genome DNA is amplified by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
  • the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
  • before constructing the sequencing library for the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product by using a nucleic acid probe to obtain a nucleic acid sample amplification product from a predetermined region; and constructing the sequencing library for the nucleic acid sample amplification product from the predetermined region.
  • the predetermined region is at least one exon region. Thereby, the known SNPs included in the predetermined region can be effectively determined, which improves the efficiency and accuracy of determining the abnormal state related to the known SNPs included in the predetermined region, such as the exon region.
  • the nucleic acid probe is provided in the form of a chip. Thereby, the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
  • the sequencing is performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology. Thereby, the high-throughput, deep-sequencing characteristics of these sequencing devices can be utilized to further improve the efficiency of determining whether an individual has an abnormal state.
  • the sequencing is performed using Illumina Hiseq 2000, and the sequencing data are 90 bp in length. Thereby, the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
  • determining the known SNPs included in the sequencing result based on the sequencing data is performed by comparing the sequencing data with a reference gene.
  • the reference gene is a known human genomic sequence.
  • the alignment is performed by SOAP/SOAP2 software.
  • the method further comprises filtering the known SNPs included in the sequencing result, the filtering being based on the following filtering conditions: the SNP calling quality value is greater than 20; the SNP site sequencing depth is greater than 8; the depth of the SNP site is less than 5 times the average depth of the genome; the copy number of the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
  • the known SNP is located in the human chromosome HBB gene region. In one embodiment of the invention, the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285,
  • the invention proposes a system for determining whether an individual has an abnormal state.
  • the system comprises: a sequencing library construction device for constructing a sequencing library for a nucleic acid sample of the individual; a sequencing device, connected to the sequencing library construction device, for sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data; an SNP determining device, connected to the sequencing device, for determining, based on the sequencing data, a known SNP included in the sequencing result; and an abnormal state determining device, connected to the SNP determining device, for determining, based on the known SNP, whether the individual has an abnormal state associated with the known SNP.
  • as described above, the method of determining whether an individual has an abnormal state can be effectively implemented using the system according to an embodiment of the present invention, so that the SNPs contained in a nucleic acid sample are efficiently determined by sequencing; since these SNPs are related to abnormal states, it is thereby possible to effectively determine whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
  • the system for determining whether an individual has an abnormal state may also have the following additional technical features:
  • the system further comprises: a nucleic acid sample extraction device adapted to extract at least a portion of the individual's whole genomic DNA from a single cell or a micro sample of the individual.
  • the system further comprises: a biological sample separation device adapted to separate a single cell or a microsample from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
  • the biological sample separation device is adapted to separate the single cell or microsample by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
  • the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention.
  • single cells can thus be separated from the individual's biological sample efficiently and conveniently for the subsequent operations, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
  • the sequencing library construction device further comprises: a nucleic acid sample amplification unit, wherein the nucleic acid sample amplification unit is adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product.
  • the amplification unit is adapted to perform at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
  • the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. Thereby, the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
  • the sequencing library construction device further includes: a screening unit, wherein the screening unit is provided with a nucleic acid probe so as to screen the nucleic acid sample amplification product by using the nucleic acid probe, obtain a nucleic acid sample amplification product from a predetermined region, and construct the sequencing library for the nucleic acid sample amplification product from the predetermined region.
  • the nucleic acid probe is provided in the form of a chip.
  • the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
  • the sequencing device is at least one selected from the group consisting of Illumina Hiseq2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS sequencing device, and nanopore sequencing device.
  • the sequencing is performed using Illumina Hiseq 2000, and the sequencing data are 90 bp in length. Thereby, the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
  • the SNP determining device further includes: a comparing unit, configured to determine the known SNP included in the sequencing result by comparing the sequencing data with a reference gene.
  • the comparison unit is adapted to perform alignment using SOAP/SOAP2 software.
  • the SNP determining device further includes: an SNP filtering unit, wherein the SNP filtering unit is adapted to filter the known SNPs included in the sequencing result based on the following filtering conditions: the SNP calling quality value is greater than 20; the SNP site sequencing depth is greater than 8; the SNP site depth is less than 5 times the genome average depth; the SNP site copy number is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
  • the known SNP is located in the human chromosome HBB gene region.
  • the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs
  • the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. It is thus possible to effectively determine whether the subject has a risk of thalassemia, especially beta-thalassemia.
  • FIG. 1 is a flow chart showing a method of determining whether an individual has an abnormal state according to an embodiment of the present invention
  • FIG. 2 is a flow chart showing a method of determining whether an individual has an abnormal state according to another embodiment of the present invention
  • FIG. 4 is a schematic structural view of a system for determining whether an individual has an abnormal state according to an embodiment of the present invention
  • FIG. 6 is a flow chart showing a method of determining whether an individual has an abnormal state, according to another embodiment of the present invention.
  • Detailed description of the invention
  • abnormal state as used herein shall be understood broadly, and may be any state different from the normal state of an individual such as a person, and may include, for example, a disease, an immune abnormality, or the like.
  • the abnormal state is a disease.
  • the type of the disease is not particularly limited, and according to a preferred embodiment of the present invention, the disease is a single gene disease.
  • a single-gene disease is usually a disease or pathological trait controlled by a pair of alleles. Therefore, by detecting SNPs associated with a single-gene disease, it is possible to effectively determine whether the subject under study has the disease.
  • the monogenic disease is at least one selected from the group consisting of thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency.
  • the thalassemia is beta-thalassemia.
  • the invention proposes a method of determining whether an individual has an abnormal state.
  • the method includes:
  • a sequencing library is first constructed for individual nucleic acid samples, which can be used for subsequent sequencing and results analysis.
  • the term "individual" is used without limitation, and may be any organism containing genetic information, for example, a human. Thus, the method of determining whether an individual has an abnormal state according to an embodiment of the present invention can be used to test a human sample, so that it can effectively predict whether a person has some abnormal state.
  • the method further includes the step of extracting a nucleic acid sample from an individual.
  • the term "nucleic acid sample" as used herein shall be understood broadly: it may be a DNA sample or an RNA sample, or a modified or processed DNA or RNA sample, as long as the genetic sequence can be determined by sequencing.
  • the nucleic acid sample can be at least a portion of an individual's whole genome DNA.
  • the whole genome DNA contains all the genetic information of the individual, and thus, by sequencing and SNP analysis of the whole genome DNA, the SNP information of the individual can be obtained more effectively and completely, thereby further improving the efficiency and accuracy of the method of determining whether the individual has an abnormal state.
  • the nucleic acid sample is extracted from a single cell or a microsample of the individual.
  • the method and means for obtaining a nucleic acid sample from a single cell or a microsample are not particularly limited; for example, a single cell may be lysed with a lysis solution to release and collect the single cell whole genome DNA.
  • the single cell or microsample is isolated from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
  • whether an individual has an abnormal state can thus be effectively predicted and evaluated from a small sample taken from the individual, which improves the efficiency and reduces the cost of the determination.
  • these samples can be easily obtained from organisms, and can be specifically sampled for certain diseases to take specific analytical measures for certain specific diseases.
  • the single cells are isolated by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
  • the obtained nucleic acid sample can be amplified, especially when the nucleic acid sample is extracted from a single cell or a microsample.
  • constructing a sequencing library for the nucleic acid sample of the individual further comprises: amplifying the obtained nucleic acid sample to obtain a nucleic acid sample amplification product (S102).
  • a sequencing library can then be constructed for the obtained nucleic acid sample amplification product.
  • the method of amplifying a nucleic acid sample according to an embodiment of the present invention is not particularly limited.
  • the nucleic acid sample employed is whole genome DNA extracted from a single cell of an individual, and amplification of the whole genome DNA can be performed by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
  • the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
  • the step of lysing the single cells to release the whole genome of the single cells may be further included.
  • the method used for lysing a single cell and releasing the whole genome is not particularly limited, as long as the single cell can be lysed, preferably completely.
  • the single cell can be lysed with an alkaline lysis solution to release its whole genome. The inventors have found that this can effectively lyse single cells and release the whole genome, and that the released whole genome improves accuracy during sequencing, thereby further improving the efficiency of determining single cell chromosome aneuploidy.
  • the method of single-cell whole genome amplification is not particularly limited: PCR-based methods such as PEP-PCR, DOP-PCR, and OmniPlex WGA may be employed, as may non-PCR-based methods such as MDA (multiple displacement amplification).
  • a PCR based method such as the OmniPlex WGA method, is preferably employed.
  • Commercial kits of choice include, but are not limited to, GenomePlex from Sigma Aldrich, PicoPlex from Rubicon Genomics, REPLI-g from Qiagen, illustra GenomiPhi from GE Healthcare, and the like.
  • the single cell whole genome can be amplified using OmniPlex WGA prior to construction of the sequencing library.
  • the whole genome can be efficiently amplified, thereby further improving the efficiency of determining whether the individual has an abnormal state.
  • before constructing the sequencing library for the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product by using a nucleic acid probe to obtain a nucleic acid sample amplification product from a predetermined region; and constructing the sequencing library for the nucleic acid sample amplification product from the predetermined region.
  • the predetermined region is at least one exon region.
  • the method of screening the amplification product of the nucleic acid sample by means of the nucleic acid probe is not particularly limited, and may be a solid phase screening or a liquid phase hybridization.
  • the nucleic acid probe can be provided in the form of a chip. Thereby, the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
  • the nucleic acid sample of the predetermined region can also be obtained by other known methods; for example, the nucleic acid sample is subjected to PCR using specific primers, thereby obtaining an amplification product of the predetermined region. Thereby, a sequencing library of the predetermined region is constructed, and information about the predetermined region is obtained.
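  • As a rough computational analogue of restricting analysis to a predetermined region, the sketch below keeps only fragments that overlap a set of exon intervals. It is an illustration only, not the probe-based screening itself: the half-open coordinate convention, the function names, and the example coordinates are all assumptions introduced here.

```python
def overlaps(start, end, region_start, region_end):
    """True if half-open intervals [start, end) and [region_start, region_end) share a base."""
    return start < region_end and region_start < end


def select_fragments(fragments, regions):
    """Keep fragments (start, end) that overlap at least one predetermined region."""
    return [
        (start, end)
        for start, end in fragments
        if any(overlaps(start, end, r_start, r_end) for r_start, r_end in regions)
    ]


# Hypothetical exon coordinates and fragment coordinates, for illustration only.
exons = [(100, 200), (400, 450)]
fragments = [(50, 120), (200, 260), (430, 520), (600, 700)]
selected = select_fragments(fragments, exons)
print(selected)
```

    With half-open intervals, a fragment that starts exactly where an exon ends, such as (200, 260) here, is treated as non-overlapping.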
  • a method of constructing a sequencing library for a nucleic acid sample of an individual is not particularly limited.
  • Those skilled in the art can select different methods for constructing a whole genome sequencing library according to the specific scheme of the genome sequencing technology adopted.
  • For details on constructing the whole genome sequencing library, refer to the protocol provided by the manufacturer of the sequencing instrument, such as Illumina; see, for example, the Illumina Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference.
  • sequencing can be performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology.
  • the length of the sequencing data obtained by whole genome sequencing is not particularly limited.
  • the sequencing data is 90 bp in length, obtained using Illumina Hiseq2000. Applicants have surprisingly found that when the length of the sequencing data is about 90 bp, analysis of the sequencing data is greatly facilitated, the analysis efficiency is improved, and the cost of the analysis can be significantly reduced. The efficiency of determining whether an individual has an abnormal state is thereby further improved, and the cost of the determination is reduced.
  • the length of the sequencing data refers to the average of the lengths of the individual sequencing data.
  • the genetic information contained in the sequencing result can be obtained by analyzing the sequencing data included in the sequencing result, for example, SNP information can be obtained.
  • the method of analyzing the sequencing data contained in the sequencing result to obtain the SNP information is not particularly limited.
  • SNP information in the obtained sequencing result can be determined by comparing the obtained sequencing data with a reference gene.
  • the reference gene used is a known human genome sequence, which may be, for example, Hg19, NCBI Build 37.
  • those skilled in the art may also use other known sequences as reference sequences; for example, sequences containing the known SNPs may be used as reference sequences for alignment.
  • the method and software employed for the comparison are not particularly limited.
  • the alignment between the sequencing data and the reference sequence can be performed using SOAP/SOAP2 software.
  • sequencing data may also be assembled first, and the assembly result is compared with a reference sequence. Thereby, the sequencing data contained in the sequencing result can be effectively analyzed, so that the SNP contained in the sequencing result can be effectively determined, and the efficiency of determining whether the individual has an abnormal state can be improved.
  • SOAP/SOAP2 software provides accurate alignment of short-sequence data from Illumina sequencing systems.
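  • To illustrate the idea of determining known SNPs by comparing sequencing data with a reference, the sketch below counts the bases observed at known SNP positions in already aligned reads. It is a toy example and does not reproduce SOAP/SOAP2: the gap-free alignment, the 0-based coordinates, the function names, and the data are all assumptions.

```python
from collections import Counter


def pile_up_known_sites(aligned_reads, known_sites):
    """Count the bases observed at each known SNP position.

    aligned_reads -- iterable of (start, sequence) pairs; gap-free alignments,
                     0-based start on the reference (an assumed simplification)
    known_sites   -- {site_name: 0-based reference position}
    """
    pileup = {name: Counter() for name in known_sites}
    for start, seq in aligned_reads:
        for name, pos in known_sites.items():
            offset = pos - start
            if 0 <= offset < len(seq):
                pileup[name][seq[offset]] += 1
    return pileup


def non_reference_alleles(reference, pileup, known_sites):
    """For each site, list observed bases that differ from the reference base."""
    return {
        name: sorted(base for base in counts if base != reference[known_sites[name]])
        for name, counts in pileup.items()
    }


# Hypothetical reference, reads, and site names, for illustration only.
reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGT"), (5, "CGTAC")]
sites = {"site_A": 4, "site_B": 8}

pileup = pile_up_known_sites(reads, sites)
alleles = non_reference_alleles(reference, pileup, sites)
print(alleles)
```

    In this example two of the three reads covering site_A carry T where the reference has A, while the single read covering site_B matches the reference.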
  • based on the alignment result, statistics such as the data quality value, alignment rate, GC content, duplication rate, genome coverage, and sequencing depth may be computed, and quality control is performed on the sequencing data according to this information.
  • useless data that cannot be aligned or that is duplicated is removed, yielding a valid data set for subsequent analysis.
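  • The quality-control statistics named above can be sketched as follows. This is a minimal illustration under stated assumptions: each record is a hypothetical (sequence, start) pair in which start is None for a read that could not be aligned, duplicates are defined as identical records, and all function names are invented for this example.

```python
def gc_content(sequences):
    """Fraction of G and C bases over all bases."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.count("G") + s.count("C") for s in sequences)
    return gc / total if total else 0.0


def duplication_rate(sequences):
    """Fraction of reads that repeat an earlier identical read."""
    return 1 - len(set(sequences)) / len(sequences) if sequences else 0.0


def valid_records(records):
    """Drop unaligned records (start is None) and exact duplicates, keeping order."""
    seen, kept = set(), []
    for record in records:
        if record[1] is not None and record not in seen:
            seen.add(record)
            kept.append(record)
    return kept


def coverage_and_depth(records, genome_length):
    """Return (fraction of positions covered, mean depth) for aligned records."""
    covered, total_bases = set(), 0
    for seq, start in records:
        covered.update(range(start, start + len(seq)))
        total_bases += len(seq)
    return len(covered) / genome_length, total_bases / genome_length


# Hypothetical records: (sequence, 0-based start or None if unaligned).
records = [("ACGT", 0), ("ACGT", 0), ("GGCC", 4), ("TTTT", None)]
sequences = [seq for seq, _ in records]

alignment_rate = sum(start is not None for _, start in records) / len(records)
clean = valid_records(records)
coverage, depth = coverage_and_depth(clean, genome_length=10)
print(gc_content(sequences), duplication_rate(sequences), alignment_rate, coverage, depth)
```

    Here the statistics are computed on the raw reads, and coverage and depth on the cleaned set; which stage each statistic is measured at is a design choice the source does not fix.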
  • the obtained SNP information may also be filtered, for example, by filtering the known SNPs included in the sequencing result based on preset filtering conditions.
  • the filtering conditions that can be employed are at least one of the following:
  • the SNP calling quality value is greater than 20.
  • SNP calling quality value refers to the scoring result given to the confidence of each SNP during the operation of the SOAP analysis software.
  • SNP site sequencing depth is greater than 8;
  • the depth of the SNP locus is less than 5 times the average depth of the genome
  • SNP site copy number is not greater than 2;
  • the distance between the SNP site and the nearest other SNP site is greater than 5. The expression "distance between two SNP sites" as used herein refers to the number of bases between the two SNP sites; for example, "the distance is greater than 5" means that the number of bases separating the two sites is greater than five.
  • the obtained SNP results can be effectively filtered, so that the accuracy and efficiency of determining whether an individual has an abnormal state can be effectively improved.
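  • The five filtering conditions can be sketched as a single predicate. The sketch assumes all calls lie on one chromosome, applies all five conditions together (the text allows using at least one of them), and measures distance as the number of bases between two sites, as defined above. The SnpCall fields and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    name: str
    position: int       # coordinate on a single chromosome (assumed)
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number at the site


def passes_filters(call, all_positions, mean_genome_depth):
    """Apply the five conditions from the text to one SNP call."""
    others = [p for p in all_positions if p != call.position]
    # Number of bases lying between this site and its nearest neighbour.
    gap = min((abs(call.position - p) - 1 for p in others), default=None)
    return (
        call.quality > 20
        and call.depth > 8
        and call.depth < 5 * mean_genome_depth
        and call.copy_number <= 2
        and (gap is None or gap > 5)
    )


# Hypothetical calls, for illustration only.
calls = [
    SnpCall("snp_a", 100, 35, 25, 2),    # too close to snp_b
    SnpCall("snp_b", 103, 40, 30, 2),    # too close to snp_a
    SnpCall("snp_c", 500, 50, 28, 2),    # passes every condition
    SnpCall("snp_d", 900, 15, 40, 2),    # quality not above 20
    SnpCall("snp_e", 1500, 60, 200, 2),  # depth not below 5 x mean depth
]
positions = [c.position for c in calls]
kept = [c.name for c in calls if passes_filters(c, positions, mean_genome_depth=30)]
print(kept)
```

    Neighbouring sites are taken from all candidate calls, including ones that fail another condition; measuring distance only among surviving calls would be an equally defensible choice.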
  • sequencing data is obtained by sequencing, and the sequencing data is analyzed, SNP information contained in the sequencing result can be obtained. Further, the SNP can be analyzed to determine whether the individual has an abnormal state associated with the SNP, such as whether or not there is a single-gene disease associated with the SNP.
  • the type of SNP that can be employed is not particularly limited.
  • the known SNP is located in the human chromosome HBB gene region.
  • the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs3434
  • the SNP information of the individual can be efficiently obtained by the above method, and the desired SNP information can be extracted for the relevant gene of the specific disease to be studied, and the typing result of the target gene can be obtained.
  • the disease annotation can be performed by comparing with the existing database, and whether the corresponding SNP variation information causes the disease can be determined, and the embryo or individual corresponding to the single cell or the micro sample can be judged.
  • the corresponding embryo or individual is judged to be a carrier of the Mendelian genetic disease gene.
  • the present invention provides a system 1000 for determining whether an individual has an abnormal condition.
  • the system 1000 includes: a sequencing library construction device 100, a sequencing device 200, an SNP determining device 300, and an abnormal state determining device 400.
  • the sequencing library construction device 100 is configured to construct a sequencing library for a nucleic acid sample of an individual.
  • the sequencing device 200 is coupled to the sequencing library construction device 100, and thus, can be used to sequence the constructed sequencing library to obtain sequencing results composed of a plurality of sequencing data.
  • the SNP determining device 300 is coupled to the sequencing device 200 and determines the known SNPs contained in the sequencing result based on the obtained sequencing data.
  • the abnormal state determining device 400 is connected to the SNP determining device 300 and determines, based on the known SNPs previously determined to be contained in the sequencing result, whether the individual has an abnormal state related to those known SNPs.
  • the method of determining whether an individual has an abnormal state described above can be effectively performed using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. The SNPs contained in the nucleic acid sample are thereby efficiently determined by sequencing, and since these SNPs are related to abnormal states, it can be effectively determined whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
  • the system may further comprise a nucleic acid sample extraction device 101.
  • the nucleic acid sample extraction device 101 is adapted to extract at least a portion of the individual's whole genome DNA from a single cell or a micro sample of the individual.
  • the system 1000 can further comprise a biological sample separation device suitable for isolating a single cell or micro-sample from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
  • the efficiency of determining whether an individual has an abnormal state can be improved by effectively predicting and evaluating whether an individual has an abnormal state by a small sample from an individual, and the cost of determining whether the individual has an abnormal state is reduced.
  • these samples can be easily obtained from organisms, and can be specifically sampled for certain diseases to take specific analytical measures for certain specific diseases.
  • the biological sample separation device is adapted to separate the single cell or micro-sample by at least one method selected from dilution, mouth-pipette separation, micromanipulation, microdissection, flow cytometry, and microfluidics.
  • the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention.
  • the single cells of the biological sample can be obtained efficiently and conveniently, so that the subsequent operations can be performed, whereby the single cells can be effectively separated from the individual, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
  • the sequencing library construction device may further comprise a nucleic acid sample amplification unit adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product.
  • the amplification unit is adapted to perform at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
  • the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
  • the operation mode of the "nucleic acid sample extraction device" is not particularly limited, as long as the relevant nucleic acid sample can be obtained and is suitable for subsequent operations. For example, whole genome DNA can be released from a single cell or micro-sample by lysing the cell with a lysis solution and then collected.
  • the sequencing library construction device may further include a screening unit.
  • a nucleic acid probe is disposed in the screening unit so that the nucleic acid sample amplification product can be screened with the probe to obtain amplification product from a predetermined region, and the sequencing library is constructed from that amplification product. Thereby, the known SNPs contained in the predetermined region can be effectively determined, which improves the efficiency and accuracy of determining abnormal states related to the known SNPs contained in a predetermined region such as an exon region.
  • the nucleic acid probe may be provided in the form of a chip.
  • the efficiency of screening by using a nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
  • the sequencing device is at least one selected from Illumina Hiseq2000, Genome Analyzer, a SOLiD sequencing system, Ion Torrent, Ion Proton, 454, a PacBio RS sequencing system, a Helicos tSMS sequencing device, and a nanopore sequencing device. Thereby, the high-throughput, deep-sequencing characteristics of these sequencing devices can be used to further improve the efficiency of determining whether an individual has an abnormal state.
  • the sequencing is performed using an Illumina Hiseq 2000, the sequencing data being 90 bp in length.
  • the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
  • the SNP determining device further includes a comparison unit configured to determine the known SNPs contained in the sequencing result by aligning the sequencing data with a reference gene.
  • the comparison unit is adapted to perform the alignment using SOAP/SOAP2 software. Thereby, the sequencing data contained in the sequencing result can be effectively analyzed, so that the SNPs contained in the sequencing result can be effectively determined and the efficiency of determining whether the individual has an abnormal state can be improved.
  • the SNP determining device may further include an SNP filtering unit adapted to filter the known SNPs contained in the sequencing result based on the following conditions: the SNP calling quality value is greater than 20; the sequencing depth of the SNP site is greater than 8; the depth of the SNP site is less than 5 times the average depth of the genome; the copy number of the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
  • the known SNP is located in the human chromosome HBB gene region.
  • the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs
  • the system for determining whether an individual has an abnormal state can effectively implement the foregoing method of determining whether an individual has an abnormal state. It can therefore be effectively determined whether the subject has thalassemia, especially β-thalassemia.
  • the term "connected" as used herein is to be understood broadly: the connection can be direct or indirect, and can even mean that the same container or device is used, as long as a functional linkage is achieved. For example, nucleic acid sample extraction and nucleic acid sample amplification can be carried out in the same apparatus. After the nucleic acid sample is extracted, amplification can be performed in the same apparatus or container, and the extracted nucleic acid sample does not need to be transported to other equipment or containers, as long as the conditions within the device (including the reaction conditions and the composition of the reaction system) are changed to suit the amplification reaction. This functional connection between nucleic acid sample extraction and nucleic acid sample amplification is also considered to be covered by the term "connected".
  • a single-gene disease is detected in a single cell or micro-sample using a method comprising the following steps:
  • S3 sequencing sample preparation (library preparation);
  • S7 SNP extraction and filtration to obtain the typing result of the target gene
  • IVF-PGD in vitro fertilization-embryo preimplantation genetic diagnosis
  • sperm and eggs are fertilized in vitro and cultured in vitro to the third day to form blastomeres at 5-8 cell stage.
  • a conventional biopsy was performed: under the micromanipulator, a single blastomere was removed, placed in a PCR tube containing lysis solution, and stored at -80 °C. The biopsied embryo was cultured further until day 5, when it reached the blastocyst stage, and could then be vitrified or used directly for implantation.
  • Single-cell whole-genome amplification was performed using Qiagen's REPLI-g Mini Kit according to the manufacturer's protocol.
  • the blastomere single cells were first subjected to alkaline lysis, and the amplification reaction solution was then added for isothermal amplification at 30 °C.
  • a DNA sequencing library was constructed using the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's protocol. Three libraries were prepared for the blastomere single-cell whole genome amplification products obtained in the previous section. The expected inserts of the library were 200 bp, 350 bp, and 500 bp, respectively. The actual insert size is shown in Table 1.
  • High throughput sequencing was performed using the Illumina Hiseq 2000 sequencing system.
  • a strategy for whole genome sequencing of blastomere single cells is employed.
  • the qualified library of blastomere single-cell amplification products underwent cluster generation on cBot and was then run on the Hiseq2000 sequencer.
  • the read length was 90 bp with paired-end (bidirectional) sequencing. One lane was sequenced for each library, for a total of three lanes.
  • Sequencing data were aligned to the human reference genome sequence (hg19; NCBI Build 37) using SOAP2 software, allowing up to two base mismatches per alignment.
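
The mismatch tolerance described above can be illustrated with a toy check. The sketch below is not SOAP2, which uses an index-based search and its own options; it only shows, for an ungapped placement at a known offset, what "allowing up to two base mismatches" means. The function names and the example sequences are made up for illustration.

```python
def count_mismatches(read: str, reference: str, offset: int) -> int:
    """Count differing bases when `read` is laid on `reference` at `offset` (0-based, no gaps)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    window = reference[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for r, w in zip(read, window) if r != w)


def accept_alignment(read: str, reference: str, offset: int,
                     max_mismatches: int = 2) -> bool:
    """Accept the placement only if it has at most `max_mismatches` mismatches."""
    return count_mismatches(read, reference, offset) <= max_mismatches


# Hypothetical sequences, for illustration only.
reference = "ACGTACGTAC"
print(accept_alignment("GTACGT", reference, 2))  # exact match at offset 2
print(accept_alignment("GTTCGA", reference, 2))  # two mismatches
print(accept_alignment("CTTCGA", reference, 2))  # three mismatches
```

A real aligner also has to find the offset and handle both strands; here the offset is simply given.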
  • the raw data quality, GC content, actual insert fragment size, alignment rate, repetition rate, and genomic coverage and sequencing depth were calculated based on the alignment results. See Table 1 for specific information.
  • the quality of the sequencing data was controlled using these statistics. In this example, a total of 38.25× whole-genome data was obtained, and all statistics met the standard. Reads that could not be aligned to the reference genome and duplicate reads were then removed, yielding a valid data set for SNP analysis.
  • Sample PE reads PE Unique Coverage Mean Depth Duplication Rate
  • BLS1 is the sample ID of the blastomere single cell
  • BLS1 (total) is the combined statistics of the three lanes of data for the BLS1 sample
  • Q20 (%) is the ratio of data with a quality value above 20 to the total amount of data
  • GC (%) indicates the actual GC content percentage of the sequencing data
  • Insert Size indicates the actual library insert size of the sequencing data
  • Clean Reads indicates the number of reads remaining after low-quality reads are removed
  • PE-alignment indicates the ratio of read pairs for which both ends align to the reference genome to the total amount of data
  • PE reads indicates the number of read pairs for which both ends align to the reference genome
  • PE Unique indicates the number of read pairs for which both ends align uniquely to the reference genome
  • Coverage represents the coverage of the whole genome
  • Mean Depth represents the average depth of the whole genome
  • Duplication Rate represents the ratio of duplicate reads to total reads
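
Several of the Table 1 statistics are simple ratios. The following is a minimal sketch of how they could be computed, under stated assumptions: Q20 is treated as a Phred score of 20 or more (a common convention; the legend says "above 20"), duplicates are identified by an arbitrary per-read key (real pipelines usually use alignment coordinates), and coverage counts positions with depth of at least 1. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def gc_percent(reads: Sequence[str]) -> float:
    """Percentage of G and C bases over all read bases."""
    total = sum(len(r) for r in reads)
    gc = sum(r.upper().count("G") + r.upper().count("C") for r in reads)
    return 100.0 * gc / total


def q20_percent(qualities: Sequence[Sequence[int]]) -> float:
    """Percentage of bases whose quality value is >= 20 (assumed threshold)."""
    flat = [q for read in qualities for q in read]
    return 100.0 * sum(1 for q in flat if q >= 20) / len(flat)


def duplication_rate(read_keys: Sequence[str]) -> float:
    """Fraction of reads that repeat a key already seen."""
    return 1.0 - len(set(read_keys)) / len(read_keys)


def coverage_and_mean_depth(per_base_depth: Sequence[int]) -> Tuple[float, float]:
    """Return (fraction of positions with depth >= 1, average depth per position)."""
    n = len(per_base_depth)
    covered = sum(1 for d in per_base_depth if d > 0)
    return covered / n, sum(per_base_depth) / n


# Toy inputs, not data from the example.
print(gc_percent(["GGCC", "ATAT"]))
print(q20_percent([[30, 10], [20, 40]]))
print(duplication_rate(["AA", "AA", "CC", "AA"]))
print(coverage_and_mean_depth([0, 2, 4, 0]))
```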
  • in this example, SNP calling was performed on the valid data set obtained above using the SOAPsnp software according to its instructions (see http://soap.genomics.org.cn/soapsnp.html, which is incorporated herein by reference), yielding the SNP data set.
  • the SNP data set obtained above is filtered, and the filtering conditions are as follows:
  • SNP calling quality value is greater than 20;
  • the sequencing depth of the site is greater than 8.
  • the depth of the site is less than 5 times the average depth of the genome
  • the copy number of the site is not more than 2;
  • the distance between the SNP and the nearest SNP is greater than 5.
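
The five filtering conditions listed above can be sketched as a single predicate. This is an illustration only, not the software used in the example: the `SnpCall` record and its field names are hypothetical, the "distance" is taken as the number of bases lying between two sites (so adjacent positions have distance 0), and the nearest neighbour is searched among all unfiltered calls on the same chromosome, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SnpCall:
    chrom: str
    pos: int            # 1-based coordinate on the chromosome
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number of the site


def bases_between(pos_a: int, pos_b: int) -> int:
    """Number of bases lying strictly between two positions."""
    return abs(pos_a - pos_b) - 1


def filter_snps(calls: List[SnpCall], mean_depth: float) -> List[SnpCall]:
    """Keep calls that satisfy all five conditions from the text."""
    by_chrom: Dict[str, List[SnpCall]] = {}
    for call in calls:
        by_chrom.setdefault(call.chrom, []).append(call)

    kept = []
    for chrom_calls in by_chrom.values():
        chrom_calls.sort(key=lambda c: c.pos)
        for i, call in enumerate(chrom_calls):
            # After sorting, the nearest other SNP is one of the two neighbours.
            neighbours = []
            if i > 0:
                neighbours.append(chrom_calls[i - 1].pos)
            if i + 1 < len(chrom_calls):
                neighbours.append(chrom_calls[i + 1].pos)
            isolated = all(bases_between(call.pos, p) > 5 for p in neighbours)
            if (call.quality > 20
                    and call.depth > 8
                    and call.depth < 5 * mean_depth
                    and call.copy_number <= 2
                    and isolated):
                kept.append(call)
    return kept
```

For a capture experiment, `mean_depth` would be the average depth of the target region rather than of the whole genome.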
  • from the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-related genes to be tested.
  • in this example, the β-thalassemia gene is examined to determine whether the corresponding blastomere embryo has β-thalassemia or is a carrier of the β-thalassemia disease gene. Therefore, the SNP sites located in the HBB gene region of chromosome 11 were extracted, and the specific information is shown in Table 2.
  • Chromosome indicates the chromosome number
  • Locus indicates the position on the chromosome of the base corresponding to the SNP
  • Ref indicates the base at the corresponding site in the human reference genome in the database
  • Blastomere indicates the typing information of the corresponding SNP site in the blastomere single-cell data
  • Mutation indicates the mutation type recorded in the database for the corresponding site
  • SNP ID indicates the ID number of the SNP site in the database
  • Gene indicates the gene region in which the SNP site is located
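
Extracting the SNPs of a target gene region from the filtered set amounts to an interval lookup on chromosome and locus. A minimal sketch follows; the `Site` record, the function name, and in particular the window bounds are assumptions. The window is a hypothetical interval on chromosome 11 chosen only for illustration and is not a statement of the actual HBB gene boundaries, and the loci in the sample list are toy values.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    chrom: str
    locus: int  # 1-based position on the chromosome


def sites_in_region(sites: List[Site], chrom: str, start: int, end: int) -> List[Site]:
    """Return the sites on `chrom` whose locus lies in the closed interval [start, end]."""
    return [s for s in sites if s.chrom == chrom and start <= s.locus <= end]


# Hypothetical window and toy loci, for illustration only.
WINDOW = ("chr11", 5246000, 5249000)
filtered = [
    Site("chr11", 5247141),
    Site("chr11", 5247791),
    Site("chr11", 6000000),
    Site("chr1", 5247500),
]
for site in sites_in_region(filtered, *WINDOW):
    print(site.chrom, site.locus)
```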
  • the disease annotation is further made based on the SNP information of the target gene filtered and extracted as described above.
  • heterozygous SNP variants were found in the 5247141 and 5247791 bases of chromosome 11, respectively, which are located in the HBB gene region. Homozygous mutations at these two sites can cause beta thalassemia.
  • the two sites are heterozygous mutations, indicating that the embryo corresponding to the blastomere single cell is a carrier of the β-thalassemia pathogenic gene.
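
The annotation logic used here (homozygous mutation means disease, heterozygous mutation means carrier) can be written as a small rule. The sketch below is a simplification under stated assumptions: every site passed in is taken to be a known pathogenic site of a recessive single-gene disease, and phase and compound heterozygosity are ignored. The site labels, alleles, and function names are made up and are not the Table 2 data.

```python
from typing import Dict, Tuple


def zygosity(ref: str, genotype: str) -> str:
    """Classify a two-allele genotype string such as 'AG' against the reference base."""
    first, second = genotype
    if first == ref and second == ref:
        return "reference"
    if first == second:
        return "homozygous"
    return "heterozygous"


def interpret(sites: Dict[str, Tuple[str, str]]) -> str:
    """Map {site label: (reference base, genotype)} to a coarse status."""
    states = [zygosity(ref, genotype) for ref, genotype in sites.values()]
    if "homozygous" in states:
        return "affected"
    if "heterozygous" in states:
        return "carrier"
    return "normal"


# Made-up genotypes, for illustration only.
print(interpret({"site_A": ("A", "AG"), "site_B": ("C", "CT")}))
print(interpret({"site_A": ("A", "GG"), "site_B": ("C", "CC")}))
print(interpret({"site_A": ("A", "AA"), "site_B": ("C", "CC")}))
```

A clinical interpretation would need more than this rule, for example to tell whether two heterozygous mutations lie on the same or on different chromosome copies.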
  • the blood sample of this example was taken from a human individual with a normal phenotype. A small amount of blood was centrifuged to separate the leukocyte layer. The leukocytes were washed with PBS and suspended in PBS droplets; individual leukocytes were then isolated with a mouth pipette, placed in 1-2 μl of alkaline cell lysis solution, and frozen at -80 °C for more than 30 min.
  • Qiagen's REPLI-g Mini Kit was used. According to the manufacturer's protocol, the blood single cells were treated at 65 °C for 10 minutes for alkaline lysis, and the amplification reaction solution was then added for isothermal amplification at 30 °C.
  • the Agilent chip used in this example targets a target area of 2.1 M in size, including all exon regions of one hundred single gene disease-associated genes.
  • the captured and constructed sequencing libraries were subjected to high throughput sequencing.
  • This example uses the Illumina Hiseq 2000 sequencing system for high throughput sequencing.
  • a strategy for chip capture sequencing of blood single cells is adopted, and the target region is all exon regions of one hundred single gene disease-related genes, about 2.1 M in size.
  • the qualified chip-capture library prepared from the blood single-cell amplification products underwent cluster generation on cBot and was then run on a Hiseq2000 sequencer.
  • the read length was 90 bp with paired-end (bidirectional) sequencing.
  • the amount of sequencing data was expected to be 1 to 2 G bases.
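
The relation between data amount and sequencing depth is simple arithmetic: average depth is the number of bases falling on a region divided by the size of the region. The sketch below reads "2.1 M" as 2.1 million bases and "1 to 2 G bases" as 1e9 to 2e9 bases, which is an interpretation, and the `on_target_fraction` parameter is a hypothetical knob for the share of bases that actually land on the target. With every base on target the result is an upper bound of roughly 476× to 952×.

```python
def mean_depth(total_bases: float, region_size: float,
               on_target_fraction: float = 1.0) -> float:
    """Average depth over a region given the sequenced bases that fall on it."""
    return total_bases * on_target_fraction / region_size


TARGET_SIZE = 2.1e6  # "2.1 M" read as 2.1 million bases (assumption)

for total in (1e9, 2e9):
    print(f"{total:.0e} bases -> about {mean_depth(total, TARGET_SIZE):.0f}x if all on target")
```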
  • Sequencing data were aligned to the human reference genome sequence (hg19; NCBI Build 37) using SOAP2 software, allowing up to two base mismatches per alignment.
  • the raw data quality, GC content, alignment rate, repetition rate, and genomic coverage and sequencing depth were calculated based on the alignment results.
  • the specific information is shown in Table 3.
  • the quality of the sequencing data was controlled using these statistics. In this example, a total of 457× data was obtained in the target region, and all statistics met the standard. Reads that could not be aligned to the reference genome and duplicate reads were then removed, yielding a valid data set for SNP analysis.
  • GC (%) indicates the actual GC content percentage of the sequencing data
  • Reads indicates the number of reads remaining after low-quality reads are removed
  • Production indicates the amount of base data calculated from the Reads value
  • PE-alignment indicates the ratio of read pairs for which both ends align to the reference genome to the total amount of data
  • PE Unique indicates the number of read pairs for which both ends align uniquely to the reference genome
  • Coverage indicates the coverage of the whole genome
  • Mean Depth indicates the average depth of the whole genome
  • Duplication Rate indicates the ratio of duplicate reads to total reads
  • Specificity (Reads) indicates the ratio of the number of reads aligned to the target region to the total number of reads
  • Specificity (Bases) indicates the ratio of the amount of base data specifically aligned to the target region to the total amount of base data
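
The two capture-specificity figures in the legend above can be computed from aligned read intervals and target intervals. This is a minimal sketch under stated assumptions: intervals are 0-based and half-open, target intervals do not overlap each other, and a read counts as on target if it overlaps a target by at least one base. The names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def capture_specificity(reads: List[Interval],
                        targets: List[Interval]) -> Tuple[float, float]:
    """Return (specificity by reads, specificity by bases)."""
    on_target_reads = 0
    on_target_bases = 0
    total_bases = 0
    for read in reads:
        total_bases += read[2] - read[1]
        hit = sum(overlap_length(read, target) for target in targets)
        on_target_bases += hit
        if hit > 0:
            on_target_reads += 1
    return on_target_reads / len(reads), on_target_bases / total_bases


# Toy 90 bp reads and one toy target, for illustration only.
toy_targets = [("chr11", 100, 200)]
toy_reads = [("chr11", 90, 180), ("chr11", 300, 390), ("chr1", 100, 190)]
print(capture_specificity(toy_reads, toy_targets))
```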
  • in this example, SNP calling was performed on the valid data set obtained above using the SOAPsnp software according to its instructions (see http://soap.genomics.org.cn/soapsnp.html, which is incorporated herein by reference), yielding the SNP data set.
  • the SNP data set obtained above is filtered, and the filtering conditions are as follows:
  • SNP calling quality value is greater than 20;
  • the sequencing depth of the site is greater than 8.
  • the depth of the site is less than 5 times the average depth of the target region
  • the copy number of the site is not more than 2;
  • the distance between the SNP and the nearest SNP is greater than 5.
  • from the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-related genes to be tested.
  • in this example, the β-thalassemia gene is examined to determine whether the corresponding blastomere embryo has β-thalassemia or is a carrier of the β-thalassemia disease gene. Therefore, the SNP sites located in the HBB gene region of chromosome 11 were extracted, and the specific information is shown in Table 4.
  • Chromosome indicates the chromosome number
  • Locus indicates the position on the chromosome of the base corresponding to the SNP
  • Ref indicates the base at the corresponding site in the human reference genome in the database
  • Blood Cell indicates the typing information of the corresponding SNP site in the blood single-cell data
  • Mutation indicates the mutation type recorded in the database for the corresponding site
  • SNP ID indicates the ID number of the SNP site in the database
  • Gene indicates the gene region in which the SNP site is located
  • the SNP information of the target gene filtered and extracted above is further subjected to disease annotation.
  • no heterozygous or homozygous mutation site was detected in the HBB gene region, indicating that the individual corresponding to the blood single cell in the present example is not a beta thalassemia disease patient, nor is it a beta thalassemia disease gene carrier.
  • the beta thalassemia disease gene region is a normal genotype.
  • embodiments of the present invention enable a method for detecting Mendelian genetic diseases (single-gene diseases) on single-cell or micro-samples based on high-throughput sequencing.
  • the invention can be effectively applied to the construction and sequencing of a DNA sequencing library of sample DNA, and the obtained library has good quality and accurate sequencing results.

Abstract

Provided are a method and a system for determining whether an individual is in an abnormal state. The method for determining whether the individual is in the abnormal state comprises: establishing a sequencing library for a nucleic acid sample of the individual; sequencing the sequencing library to acquire a sequencing result, wherein the sequencing result is formed by multiple pieces of sequencing data; determining, based on the sequencing data, a known SNP comprised in the sequencing result; and determining, based on the known SNP, whether the individual is in an abnormal state related to the known SNP.

Description

Method and system for determining whether an individual has an abnormal state
Priority information
None
Technical field
The present invention relates to the field of biomedicine and, in particular, to a method and a system for determining whether an individual has an abnormal state.
Background
Mendelian genetic diseases are also called single-gene diseases (the two terms are used interchangeably herein). According to their mode of inheritance, they can be divided into autosomal dominant, autosomal recessive, sex-linked dominant, and sex-linked recessive genetic diseases, among others. As of May 2012, the Online Mendelian Inheritance in Man (OMIM) database listed a total of 21,250 single-gene disease-related genes, of which about 3,500 have an accurately described disease phenotype and a relatively well-understood genetic mechanism.
However, current research on single-gene diseases still needs improvement.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the present invention proposes a method and a system for determining whether an individual has an abnormal state.
In one aspect, the present invention provides a method of determining whether an individual has an abnormal state. According to an embodiment of the invention, the method comprises: constructing a sequencing library for a nucleic acid sample of the individual; sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; determining, based on the sequencing data, the known SNPs contained in the sequencing result; and determining, based on the known SNPs, whether the individual has an abnormal state associated with the known SNPs. With this method, the SNPs contained in the nucleic acid sample can be efficiently determined by sequencing, and because these SNPs are associated with abnormal states, it can be effectively determined whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
According to embodiments of the invention, the method of determining whether an individual has an abnormal state may also have the following additional technical features:
In one embodiment of the invention, the individual is a human. Thus, the method of determining whether an individual has an abnormal state according to an embodiment of the present invention can be used to test human samples and thereby effectively predict whether a person has certain abnormal states.
In one embodiment of the invention, the abnormal state is a disease. Preferably, the disease is a single-gene disease. According to a specific example of the invention, the single-gene disease is at least one selected from thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency. In one embodiment of the invention, the thalassemia is β-thalassemia. Thereby, the risk of human diseases, especially single-gene diseases, can be effectively predicted and evaluated.
In one embodiment of the invention, the nucleic acid sample is at least a portion of the individual's whole genome DNA. Thereby, the efficiency and accuracy of the method of determining whether an individual has an abnormal state can be further improved.
In one embodiment of the invention, the nucleic acid sample is extracted from a single cell or a micro-sample of the individual. In one embodiment of the invention, the single cell or micro-sample is isolated from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos. Thus, whether an individual has an abnormal state can be effectively predicted and evaluated from a small sample, which improves the efficiency and reduces the cost of the determination. In addition, these samples can be conveniently obtained from the organism, and different samples can be taken for particular diseases so that specific analytical approaches can be applied to them.
In one embodiment of the invention, the single cell is isolated by at least one method selected from dilution, mouth-pipette separation, micromanipulation, microdissection, flow cytometry, and microfluidics. Thereby, single cells can be obtained from the biological sample effectively and conveniently for subsequent operations, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, constructing the sequencing library for the nucleic acid sample of the individual further comprises: amplifying the nucleic acid sample to obtain a nucleic acid sample amplification product; and constructing the sequencing library from the nucleic acid sample amplification product. Thereby, the efficiency of constructing the sequencing library can be improved, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, the nucleic acid sample is whole genome DNA extracted from a single cell of the individual, for example whole genome DNA released by lysing the single cell, and the whole genome DNA is amplified by at least one method selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA. Thereby, the efficiency of amplifying whole genome DNA can be further improved, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, before the sequencing library is constructed from the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product with a nucleic acid probe to obtain amplification product from a predetermined region; and constructing the sequencing library from the amplification product from the predetermined region. In one embodiment of the invention, the predetermined region is at least one exon region. Thereby, the known SNPs contained in the predetermined region can be effectively determined, which improves the efficiency and accuracy of determining abnormal states related to the known SNPs contained in a predetermined region such as an exon region. In one embodiment of the invention, the nucleic acid probe is provided in the form of a chip. Thereby, the efficiency of screening with the nucleic acid probe can be further improved, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
在本发明的一个实施例中,所述测序是利用选自 Illumina Hiseq2000、 Genome Analyzer、 SOLiD测序系统、 Ion Torrents Ion Proton、 454、 PacBio RS测序系统、 Helicos tSMS技术以 及纳米孔测序技术的至少一种进行的。 由此, 能够利用这些测序装置的高通量、 深度测序 的特点, 进一步提高了确定个体是否患有异常状态的效率。 在本发明的一个实施例中, 所 述测序是利用 Illumina Hiseq2000进行的,所述测序数据的长度为 90bp。 由此, 可以进一步 提高后续进行 SNP分析的效率, 从而提高了确定个体是否患有异常状态的效率。  In one embodiment of the invention, the sequencing is performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrents Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology. ongoing. Thereby, the high-throughput, deep-sequencing characteristics of these sequencing devices can be utilized to further improve the efficiency of determining whether an individual has an abnormal state. In one embodiment of the invention, the sequencing is performed using Illumina Hiseq 2000, which is 90 bp in length. Thereby, the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
In one embodiment of the invention, determining, based on the sequencing data, the known SNPs contained in the sequencing result is performed by aligning the sequencing data against a reference gene. In one embodiment of the invention, the reference gene is a known human genome sequence. In one embodiment of the invention, the alignment is performed with SOAP/SOAP2 software. The sequencing data contained in the sequencing result can thus be analyzed effectively, so that the SNPs contained in the sequencing result can be determined effectively, which in turn improves the efficiency of determining whether the individual has an abnormal state.
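The idea of reading known SNP sites off reads aligned to a reference can be illustrated with a toy sketch. It does not reproduce SOAP/SOAP2; it assumes the reads have already been aligned without gaps and are given as hypothetical `(start, sequence)` pairs with 0-based coordinates, and the function name `alleles_at_known_sites` is introduced here purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def alleles_at_known_sites(
    reference: str,
    aligned_reads: Iterable[Tuple[int, str]],
    known_sites: List[int],
) -> Dict[int, Counter]:
    """Count the bases observed at each known site and keep the sites
    where at least one read shows a non-reference base.

    aligned_reads holds (start, sequence) pairs: 0-based, ungapped
    alignments (an assumption of this sketch, not of the source).
    """
    counts = {site: Counter() for site in known_sites}
    for start, seq in aligned_reads:
        for site in known_sites:
            offset = site - start
            # Only reads that actually cover the site contribute a base.
            if 0 <= offset < len(seq):
                counts[site][seq[offset]] += 1
    return {
        site: observed
        for site, observed in counts.items()
        if any(base != reference[site] for base in observed)
    }
```

In practice the per-site base counts would come from the aligner's output rather than from in-memory tuples; the sketch only shows why alignment to a reference is what makes a lookup at known positions possible.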
In one embodiment of the invention, the method further comprises filtering the known SNPs contained in the sequencing result, the filtering being based on the following conditions: the SNP calling quality value is greater than 20; the sequencing depth at the SNP site is greater than 8; the depth at the SNP site is less than 5 times the average depth of the genome; the copy number at the SNP site is no greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5. The SNP results obtained can thus be filtered effectively, which improves the accuracy and efficiency of determining whether the individual has an abnormal state.
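The five filtering conditions above can be sketched as a single predicate over SNP calls. This is a minimal illustration under stated assumptions: the `SnpCall` record and its field names are hypothetical, the neighbour distance is measured against all candidate calls on the same chromosome before filtering, and the distance threshold is used as the bare number 5 because the text gives no unit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SnpCall:
    # Hypothetical record; field names are illustrative only.
    chrom: str
    pos: int
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number at the site


def filter_snps(calls: List[SnpCall], genome_mean_depth: float) -> List[SnpCall]:
    """Keep only the calls that satisfy all five conditions in the text."""
    positions: Dict[str, List[int]] = {}
    for call in calls:
        positions.setdefault(call.chrom, []).append(call.pos)

    kept = []
    for call in calls:
        # Distance to the nearest *other* candidate SNP on the same chromosome.
        others = [p for p in positions[call.chrom] if p != call.pos]
        nearest = min((abs(p - call.pos) for p in others), default=None)
        if (
            call.quality > 20
            and call.depth > 8
            and call.depth < 5 * genome_mean_depth
            and call.copy_number <= 2
            and (nearest is None or nearest > 5)
        ):
            kept.append(call)
    return kept
```

Whether neighbour distance should be evaluated before or after the other conditions is not specified in the text; evaluating it on the unfiltered candidates is the more conservative choice taken here.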
In one embodiment of the invention, the known SNP is located in the HBB gene region of the human chromosome. In one embodiment of the invention, the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rs1135071, rs3394300 rs3394300 rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs3471601 rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs at positions 5246716, 5246879, 5247026 and 5248161 on human chromosome 11. Whether a person carries SNPs in the HBB gene region can thus be determined effectively, so that the subject's risk of thalassemia, in particular β-thalassemia, can be assessed effectively.
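Checking called SNPs against such a panel amounts to a set lookup by rsID or by chromosome position. The sketch below is illustrative only: it uses just the first three rsIDs of the list and the four chromosome 11 positions, the input tuple layout and the function name `panel_hits` are assumptions, and a hit means only that a panel SNP was observed, not that a diagnosis follows.

```python
from typing import Iterable, List, Optional, Tuple

# Illustrative subset of the panel; the full list appears in the text above.
HBB_PANEL_RSIDS = frozenset({"rs33985472", "rs63750954", "rs63751128"})
HBB_PANEL_POSITIONS = frozenset(
    {("11", 5246716), ("11", 5246879), ("11", 5247026), ("11", 5248161)}
)

# Hypothetical input layout: (rsID or None, chromosome, position or None).
Called = Tuple[Optional[str], str, Optional[int]]


def panel_hits(called: Iterable[Called]) -> List[str]:
    """Return a label for every called SNP that belongs to the panel."""
    hits = []
    for rsid, chrom, pos in called:
        if rsid in HBB_PANEL_RSIDS or (chrom, pos) in HBB_PANEL_POSITIONS:
            hits.append(rsid if rsid else f"chr{chrom}:{pos}")
    return hits
```

For example, `panel_hits([("rs33985472", "11", None), (None, "11", 5246879)])` labels the first call by its rsID and the second by its position, since the second has no rsID in this toy input.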
In another aspect, the invention provides a system for determining whether an individual has an abnormal state. According to embodiments of the invention, the system comprises: a sequencing library construction device for constructing a sequencing library from a nucleic acid sample of the individual; a sequencing device, connected to the sequencing library construction device, for sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; an SNP determining device, connected to the sequencing device, for determining, based on the sequencing data, the known SNPs contained in the sequencing result; and an abnormal state determining device, connected to the SNP determining device, for determining, based on the known SNPs, whether the individual has an abnormal state associated with the known SNPs. The system according to embodiments of the invention can effectively carry out the method described above for determining whether an individual has an abnormal state: the SNPs contained in the nucleic acid sample are determined by sequencing, and because these SNPs are associated with abnormal states, it can be determined effectively whether the individual from whom the nucleic acid sample originates has an abnormal state associated with these SNPs.
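The chain of four connected devices can be pictured as a simple pipeline in which each stage consumes the output of the previous one. The following is only a structural sketch: the class name, the field names and the data types passed between stages are assumptions made for illustration, and each stage is a placeholder callable rather than a real instrument or analysis program.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class AbnormalStateSystem:
    # Each field stands in for one device of the system; all names are hypothetical.
    build_library: Callable[[str], str]            # sequencing library construction device
    sequence: Callable[[str], List[str]]           # sequencing device
    determine_snps: Callable[[List[str]], Set[str]]  # SNP determining device
    determine_state: Callable[[Set[str]], bool]    # abnormal state determining device

    def run(self, nucleic_acid_sample: str) -> bool:
        """Pass the sample through the four connected stages in order."""
        library = self.build_library(nucleic_acid_sample)
        reads = self.sequence(library)
        known_snps = self.determine_snps(reads)
        return self.determine_state(known_snps)
```

The optional units described below (extraction, amplification, screening, alignment, filtering) would slot in as further callables ahead of or inside the corresponding stage.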
According to embodiments of the invention, the system for determining whether an individual has an abnormal state may also have the following additional technical features:
In one embodiment of the invention, the system further comprises a nucleic acid sample extraction device adapted to extract at least a portion of the individual's whole-genome DNA from a single cell or a trace sample of the individual. The system according to embodiments of the invention can thus effectively carry out the method described above for determining whether an individual has an abnormal state.
In one embodiment of the invention, the system further comprises a biological sample separation device adapted to isolate the single cell or trace sample from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres and embryos. The system according to embodiments of the invention can thus effectively carry out the method described above. Whether the individual has an abnormal state can therefore be predicted and evaluated effectively from a small amount of sample, which improves the efficiency and lowers the cost of the determination. In addition, these samples are easy to obtain from an organism, and different samples can be taken for particular diseases so that specific analytical approaches can be applied to them.
In one embodiment of the invention, the biological sample separation device is adapted to isolate the single cell or trace sample by at least one selected from dilution, mouth pipetting, micromanipulation, microdissection, flow cytometric sorting and microfluidics. The system according to embodiments of the invention can thus effectively carry out the method described above. Single cells of a biological sample can be obtained effectively and conveniently for the subsequent operations, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, the sequencing library construction device further comprises a nucleic acid sample amplification unit adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product. In one embodiment of the invention, the amplification unit is adapted to perform at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. The system according to embodiments of the invention can thus effectively carry out the method described above. This further improves the efficiency of amplifying whole-genome DNA, and thereby the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, the sequencing library construction device further comprises a screening unit provided with a nucleic acid probe, so that the nucleic acid sample amplification product is screened with the nucleic acid probe to obtain nucleic acid sample amplification product from a predetermined region, and the sequencing library is constructed from the nucleic acid sample amplification product from the predetermined region. The known SNPs contained in the predetermined region can thus be determined effectively, which improves the efficiency and accuracy of determining, from the known SNPs contained in a predetermined region such as an exon region, the abnormal states associated with those SNPs.
In one embodiment of the invention, the nucleic acid probe is provided in the form of a chip. This further improves the efficiency of screening with the nucleic acid probe, and thereby the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the invention, the sequencing device is at least one selected from Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, a Helicos tSMS sequencing device and a nanopore sequencing device. The high-throughput, deep-sequencing characteristics of these sequencing devices can thus be exploited to further improve the efficiency of determining whether the individual has an abnormal state. In one embodiment of the invention, the sequencing is performed using Illumina HiSeq 2000, and the sequencing data (reads) are 90 bp in length. This further improves the efficiency of the subsequent SNP analysis, and thereby the efficiency of determining whether the individual has an abnormal state.
In one embodiment of the invention, the SNP determining device further comprises an alignment unit for determining the known SNPs contained in the sequencing result by aligning the sequencing data against a reference gene. In one embodiment of the invention, the alignment unit is adapted to perform the alignment with SOAP/SOAP2 software. The sequencing data contained in the sequencing result can thus be analyzed effectively, so that the SNPs contained in the sequencing result can be determined effectively, which in turn improves the efficiency of determining whether the individual has an abnormal state.
In one embodiment of the invention, the SNP determining device further comprises an SNP filtering unit adapted to filter the known SNPs contained in the sequencing result based on the following conditions: the SNP calling quality value is greater than 20; the sequencing depth at the SNP site is greater than 8; the depth at the SNP site is less than 5 times the average depth of the genome; the copy number at the SNP site is no greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5. The SNP results obtained can thus be filtered effectively, which improves the accuracy and efficiency of determining whether the individual has an abnormal state.
In one embodiment of the invention, the known SNP is located in the HBB gene region of the human chromosome. In one embodiment of the invention, the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rs1135071, rs3394300 rs3394300 rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs3471601 rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs at positions 5246716, 5246879, 5247026 and 5248161 on human chromosome 11. The system according to embodiments of the invention can thus effectively carry out the method described above for determining whether an individual has an abnormal state, so that the subject's risk of thalassemia, in particular β-thalassemia, can be assessed effectively.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Description of the Drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a method of determining whether an individual has an abnormal state according to one embodiment of the invention;
Fig. 2 is a flow chart of a method of determining whether an individual has an abnormal state according to another embodiment of the invention;
Fig. 3 is a flow chart of a method of determining whether an individual has an abnormal state according to yet another embodiment of the invention;
Fig. 4 is a schematic structural diagram of a system for determining whether an individual has an abnormal state according to one embodiment of the invention;
Fig. 5 is a schematic structural diagram of a system for determining whether an individual has an abnormal state according to another embodiment of the invention; and
Fig. 6 is a flow chart of a method of determining whether an individual has an abnormal state according to another embodiment of the invention.
Detailed Description of the Invention
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
The term "abnormal state" as used herein is to be understood broadly: it may be any state that differs from the normal state of an individual such as a human, and may include, for example, diseases and immune abnormalities. In one embodiment of the invention, the abnormal state is a disease. According to embodiments of the invention, the type of disease is not particularly limited; according to a preferred embodiment of the invention, the disease is a monogenic disease. A monogenic disease is generally a disease or pathological trait controlled by a single pair of alleles. Detecting the SNPs associated with a monogenic disease can therefore effectively determine whether the subject has that disease. According to embodiments of the invention, the individual's risk of being susceptible to an associated abnormal state, such as a disease, can also be predicted and evaluated. According to a specific example of the invention, the monogenic disease is at least one selected from thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency. In one embodiment of the invention, the thalassemia is β-thalassemia. Whether a human has an associated disease, in particular a monogenic disease, can thus be determined effectively.
Method of Determining Whether an Individual Has an Abnormal State
In one aspect, the invention provides a method of determining whether an individual has an abnormal state. Referring to Fig. 1, according to embodiments of the invention, the method comprises:
S100: Constructing a sequencing library from a nucleic acid sample of the individual
In this step, a sequencing library is first constructed from the individual's nucleic acid sample so that it can be used for the subsequent sequencing and analysis of the results. In the present invention, the meaning of the term "individual" is not particularly limited; it may be any organism that contains genetic information, for example a human. The method according to embodiments of the invention can thus be used to test human samples, so that whether a person has certain abnormal states can be predicted effectively.
Referring to Fig. 2, according to embodiments of the invention, the method may further comprise a step of extracting the nucleic acid sample from the individual (S101). The term "nucleic acid sample" as used herein is to be understood broadly: it may be a DNA sample, an RNA sample, or a DNA or RNA sample that has been modified or processed, provided that its genetic sequence can be determined by sequencing. According to one embodiment of the invention, the nucleic acid sample may be at least a portion of the individual's whole-genome DNA. Because whole-genome DNA contains all of the individual's genetic information, sequencing it and analyzing its SNPs yields the individual's SNP information more effectively and completely, which further improves the efficiency and accuracy of the method. In one embodiment of the invention, the nucleic acid sample is extracted from a single cell or a trace sample of the individual. According to embodiments of the invention, the methods and means of obtaining the nucleic acid sample from a single cell or trace sample are not particularly limited; for example, a single cell may be lysed with a lysis solution to release and collect its whole-genome DNA. In one embodiment of the invention, the single cell or trace sample is isolated from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres and embryos. Whether the individual has an abnormal state can therefore be predicted and evaluated effectively from a small amount of sample, which improves the efficiency and lowers the cost of the determination. In addition, these samples are easy to obtain from an organism, and different samples can be taken for particular diseases so that specific analytical approaches can be applied to them. In one embodiment of the invention, the single cell is isolated by at least one selected from dilution, mouth pipetting, micromanipulation, microdissection, flow cytometric sorting and microfluidics. Single cells of a biological sample can thus be obtained effectively and conveniently for the subsequent operations, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In addition, according to embodiments of the invention, once the nucleic acid sample has been obtained it may be amplified, particularly when it was extracted from a single cell or a trace sample. For example, referring to Fig. 3, in one embodiment of the invention, constructing the sequencing library from the individual's nucleic acid sample further comprises amplifying the nucleic acid sample to obtain a nucleic acid sample amplification product (S102). The sequencing library can then be constructed from the amplification product. This improves the efficiency of constructing the sequencing library, and thereby the efficiency of subsequently determining whether the individual has an abnormal state.
According to embodiments of the invention, the method of amplifying the nucleic acid sample is not particularly limited. For example, in one embodiment of the invention, the nucleic acid sample is whole-genome DNA extracted from a single cell of the individual, and this whole-genome DNA may be amplified by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA. This further improves the efficiency of amplifying whole-genome DNA, and thereby the efficiency of subsequently determining whether the individual has an abnormal state. Optionally, according to embodiments of the invention, the method may further comprise a step of lysing the single cell to release its whole genome. According to some examples of the invention, the method used to lyse the single cell and release the whole genome is not particularly limited, provided that the single cell is lysed, preferably completely. According to a specific example of the invention, the single cell may be lysed and its whole genome released with an alkaline lysis solution. The inventors found that this effectively lyses the single cell and releases the whole genome, and that the released whole genome gives higher accuracy when sequenced, which further improves the efficiency of determining single-cell chromosomal aneuploidy. According to embodiments of the invention, the method of single-cell whole-genome amplification is not particularly limited; PCR-based methods such as PEP-PCR, DOP-PCR and OmniPlex WGA may be used, as may non-PCR-based methods such as MDA (multiple displacement amplification). According to a specific example of the invention, a PCR-based method, for example the OmniPlex WGA method, is preferred. Commercial kits that may be used include, but are not limited to, GenomePlex from Sigma Aldrich, PicoPlex from Rubicon Genomics, REPLI-g from Qiagen and illustra GenomiPhi from GE Healthcare. Thus, according to a specific example of the invention, the single-cell whole genome may be amplified with OmniPlex WGA before the sequencing library is constructed. The whole genome can thereby be amplified effectively, which further improves the efficiency of determining whether the individual has an abnormal state.
In one embodiment of the invention, before the sequencing library is constructed from the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product with a nucleic acid probe to obtain nucleic acid sample amplification product from a predetermined region; and constructing the sequencing library from the nucleic acid sample amplification product from the predetermined region. In one embodiment of the invention, the predetermined region is at least one exon region. The known SNPs contained in the predetermined region can thus be determined effectively, which improves the efficiency and accuracy of determining, from the known SNPs contained in a predetermined region such as an exon region, the abnormal states associated with those SNPs. In addition, according to embodiments of the invention, the method of screening the nucleic acid sample amplification product with the nucleic acid probe is not particularly limited; it may be solid-phase screening or liquid-phase hybridization. According to embodiments of the invention, the nucleic acid probe may be provided in the form of a chip. This further improves the efficiency of screening with the nucleic acid probe, and thereby the efficiency of subsequently determining whether the individual has an abnormal state.
本领域技术人员能够理解的是, 还可以通过其他已知的方法对预定区域的核酸样本进 行分析, 例如采用特异性引物对核酸样本进行 PC , 由此, 可以得到预定区域的相关扩增 产物, 从而构建该预定区域的测序文库, 进而获得该预定区域的相关信息。  It will be understood by those skilled in the art that the nucleic acid sample of the predetermined region can also be analyzed by other known methods, for example, the nucleic acid sample is subjected to PC using a specific primer, thereby obtaining a related amplification product of a predetermined region. Thereby, a sequencing library of the predetermined region is constructed, and information about the predetermined region is obtained.
根据本发明的实施例, 对个体的核酸样本构建测序文库的方法并不受特别限制。 本领 域技术人员可以根据采用的基因组测序技术的具体方案选择不同的构建全基因组测序文库 的方法, 关于构建全基因组测序文库的细节,可以参见测序仪器的厂商例如 Illumina公司所 提供的规程, 例如参见 Illumina公司 Multiplexing Sample Preparation Guide ( Part#1005361; Feb 2010 )或 Paired-End SamplePrep Guide ( Part#1005063; Feb 2010 ), 通过参照将其并入 本文。  According to an embodiment of the present invention, a method of constructing a sequencing library for a nucleic acid sample of an individual is not particularly limited. Those skilled in the art can select different methods for constructing a whole genome sequencing library according to the specific scheme of the genome sequencing technology adopted. For details on constructing the whole genome sequencing library, refer to the protocol provided by the manufacturer of the sequencing instrument, such as Illumina, for example, see Illumina Corporation Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which is incorporated herein by reference.
S200: Sequencing the sequencing library to obtain a sequencing result
In this step, the previously constructed sequencing library is sequenced to obtain a sequencing result consisting of a plurality of sequencing data (reads).
According to an embodiment of the present invention, the technology and platform used for sequencing are not particularly limited. In one embodiment of the invention, sequencing can be performed using at least one selected from Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology. The high-throughput, deep-sequencing characteristics of these sequencing devices can thereby be exploited to further improve the efficiency of determining whether an individual has an abnormal state. Of course, those skilled in the art will understand that other sequencing methods and devices can also be used for whole genome sequencing, such as third-generation sequencing technologies and more advanced sequencing technologies that may be developed in the future. According to an embodiment of the present invention, the length of the sequencing data obtained by whole genome sequencing is not particularly limited. According to a specific example of the present invention, sequencing is performed with Illumina Hiseq2000 and the length of the sequencing data is 90 bp. Applicants have surprisingly found that when the length of the sequencing data is about 90 bp, analysis of the sequencing data is greatly facilitated, analysis efficiency is improved, and the cost of analysis is significantly reduced. This further improves the efficiency, and reduces the cost, of determining whether an individual has an abnormal state. The length of the sequencing data, as the term is used herein, refers to the average of the lengths of the individual sequencing reads.
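Because the length of the sequencing data is defined above as an average over individual reads, a minimal sketch of that calculation may be useful. The function name and the toy reads below are illustrative assumptions and are not part of the described method.

```python
from statistics import mean


def mean_read_length(reads):
    """Return the average length, in bases, of a collection of read sequences."""
    lengths = [len(read) for read in reads]
    if not lengths:
        raise ValueError("no reads supplied")
    return mean(lengths)


# Hypothetical reads of 90, 91 and 89 bases.
example_reads = ["ACGT" * 22 + "AC", "A" * 91, "C" * 89]
print(mean_read_length(example_reads))
```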
S300: Determining the known SNPs contained in the sequencing result
After the sequencing result is obtained, the sequencing data it contains can be analyzed to obtain the genetic information in the sequencing result, for example SNP information.
According to an embodiment of the present invention, the method used in this step to analyze the sequencing data and obtain SNP information is not particularly limited. According to an embodiment of the present invention, the SNP information in the sequencing result can be determined by aligning the obtained sequencing data to a reference genome. Those skilled in the art can choose the reference used for alignment according to the purpose of the analysis. According to an embodiment of the present invention, the reference is a known human genome sequence, for example hg19 (NCBI Build 37). Of course, those skilled in the art may also use other known sequences as the reference sequence; for example, known SNPs may be used as the reference sequence for alignment.
According to an embodiment of the present invention, the method and software used for alignment are not particularly limited. In one embodiment of the invention, SOAP/SOAP2 software can be used to align the sequencing data to the reference sequence. According to an embodiment of the present invention, the sequencing data may also first be assembled and the assembly then aligned to the reference sequence. The sequencing data contained in the sequencing result can thereby be analyzed effectively, so that the SNPs contained in the sequencing result are determined effectively and the efficiency of determining whether the individual has an abnormal state is improved.
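The idea of reading off known SNP sites from aligned reads can be illustrated with a toy pileup. This is only a sketch under stated assumptions: reads are taken to be already aligned without gaps, coordinates are 0-based, and the names `pileup_at`, `call_known_sites`, and the `min_fraction` threshold are hypothetical. It does not reproduce how SOAP/SOAP2 or any other real aligner or SNP caller works.

```python
from collections import Counter


def pileup_at(reads, site):
    """Count the bases that ungapped aligned reads show at one reference position.

    reads: iterable of (start, sequence), with start the 0-based reference
    coordinate of the first base of the read.
    """
    counts = Counter()
    for start, sequence in reads:
        offset = site - start
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


def call_known_sites(reference, reads, known_sites, min_fraction=0.2):
    """Report non-reference bases observed at a predefined list of sites."""
    calls = {}
    for site in known_sites:
        counts = pileup_at(reads, site)
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read covers this site
        ref_base = reference[site]
        alt_counts = {
            base: n
            for base, n in counts.items()
            if base != ref_base and n / depth >= min_fraction
        }
        if alt_counts:
            calls[site] = {"ref": ref_base, "depth": depth, "alt_counts": alt_counts}
    return calls
```

Restricting the scan to `known_sites` mirrors the text's focus on known SNPs rather than on discovering new variants.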
Those skilled in the art will understand that different alignment software can be used for different sequencing platforms. For example, SOAP/SOAP2 can accurately align the short reads produced by Illumina sequencing systems. Specifically, information such as data quality values, alignment rate, GC content, duplication rate, genome coverage, and sequencing depth can be compiled from the alignment results, and the sequencing data can be quality-controlled on the basis of this information. At the same time, unusable data that cannot be aligned or that are duplicated are removed, yielding an effective data set for subsequent analysis.
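As a rough illustration of the quality-control step just described, the sketch below computes a few of the named statistics and drops unaligned or duplicated reads. The `AlignedRead` record and its fields are assumptions made for this example; a real pipeline would take these flags from the aligner's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    name: str
    sequence: str
    mapped: bool      # True if the aligner placed the read on the reference
    duplicate: bool   # True if the read was flagged as a duplicate


def summarize(reads):
    """Return alignment rate, duplication rate and GC content for a read set."""
    total = len(reads)
    if total == 0:
        raise ValueError("no reads supplied")
    bases = sum(len(r.sequence) for r in reads)
    gc = sum(r.sequence.upper().count("G") + r.sequence.upper().count("C") for r in reads)
    return {
        "alignment_rate": sum(r.mapped for r in reads) / total,
        "duplication_rate": sum(r.duplicate for r in reads) / total,
        "gc_content": gc / bases if bases else 0.0,
    }


def effective_reads(reads):
    """Keep only reads that aligned and are not duplicates."""
    return [r for r in reads if r.mapped and not r.duplicate]
```

Coverage and depth are omitted here because they additionally require the aligned coordinates of each read.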
In addition, in one embodiment of the present invention, after the SNP information has been obtained by alignment, it can be further screened, for example by filtering the known SNPs contained in the sequencing result against preset filtering conditions. According to an embodiment of the invention, the filtering conditions that can be used are at least one of the following:
the SNP calling quality value is greater than 20, where the term "SNP calling quality value" as used herein refers to the score that the SOAP analysis software assigns, during its run, to the confidence of each SNP;
the sequencing depth at the SNP site is greater than 8;
the depth at the SNP site is less than 5 times the average depth of the genome;
the copy number at the SNP site is not greater than 2; and
the distance between the SNP site and the nearest other SNP site is greater than 5, where the expression "distance between two SNP sites" as used herein refers to the number of bases separating the two SNP sites, so that "a distance greater than 5" means the two sites are separated by more than 5 bases.
The resulting SNP calls can thereby be filtered effectively, which improves the accuracy and efficiency of determining whether an individual has an abnormal state.
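The filtering conditions listed above can be sketched as a small predicate. This is an illustration under assumptions, not the patented implementation: the `SnpCall` record is hypothetical, all five conditions are applied together although the text allows any subset, the distance is taken as the count of bases strictly between two sites, and neighbours are looked up among all calls before filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    chrom: str
    pos: int            # 1-based coordinate on the chromosome
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number at the site


def passes_site_filters(snp, mean_depth):
    """Quality > 20, depth > 8, depth < 5x genome mean depth, copy number <= 2."""
    return (
        snp.quality > 20
        and snp.depth > 8
        and snp.depth < 5 * mean_depth
        and snp.copy_number <= 2
    )


def bases_between(a, b):
    """Number of bases separating two sites (assumed: bases strictly between them)."""
    return abs(a.pos - b.pos) - 1


def filter_snps(snps, mean_depth, min_gap=5):
    """Return the calls that satisfy every filtering condition."""
    by_chrom = {}
    for snp in snps:
        by_chrom.setdefault(snp.chrom, []).append(snp)
    kept = []
    for chrom_snps in by_chrom.values():
        ordered = sorted(chrom_snps, key=lambda s: s.pos)
        for i, snp in enumerate(ordered):
            # The nearest other SNP is one of the immediate neighbours in sorted order.
            neighbours = ordered[max(i - 1, 0):i] + ordered[i + 1:i + 2]
            isolated = all(bases_between(snp, n) > min_gap for n in neighbours)
            if isolated and passes_site_filters(snp, mean_depth):
                kept.append(snp)
    return kept
```

To apply only some of the conditions, the corresponding terms can be removed from `passes_site_filters` or the spacing check skipped.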
S400: Determining the abnormal state associated with the known SNPs
As described above, once sequencing data have been obtained by sequencing and analyzed, the SNP information contained in the sequencing result is available. The SNPs can then be analyzed to determine whether the individual has an abnormal state associated with them, for example a SNP-associated single-gene (monogenic) disease.
According to an embodiment of the present invention, the type of SNP that can be used is not particularly limited. In one embodiment of the invention, the known SNP is located in the HBB gene region of the human chromosome. In one embodiment of the present invention, the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539, rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rs1135071, rs3394300, rs3394300, rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs34716011, rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs located at positions 5246716, 5246879, 5247026, and 5248161 on human chromosome 11. It can thereby be determined effectively whether the individual carries SNPs in the HBB gene region, and hence whether the subject has thalassemia, in particular β-thalassemia.
The method above thus makes it possible to obtain an individual's SNP information effectively, to extract the SNP information required for the genes relevant to the particular disease under study, and to obtain the typing (genotyping) result for the target gene. Once the typing result for the target gene has been obtained, it can be compared with existing databases for disease annotation, to judge whether the corresponding SNP variation causes disease. If it does, the embryo or individual from which the single cell or micro sample was taken is judged to be a patient with the Mendelian genetic disease (monogenic disease); if the SNP variation is not sufficient to cause disease, the corresponding embryo or individual is judged to be a carrier of the Mendelian disease gene.
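One way to picture the patient-versus-carrier judgment is the sketch below. It rests on assumptions the text does not state: an autosomal recessive disease, a lookup table mapping each SNP identifier to its disease allele, and two disease alleles at different sites being treated as lying on opposite chromosomes. The function names and the placeholder identifiers used in the example call are hypothetical.

```python
def count_disease_alleles(genotypes, disease_alleles):
    """Count disease alleles per annotated SNP.

    genotypes: dict mapping SNP id -> tuple of the two called alleles.
    disease_alleles: dict mapping SNP id -> allele annotated as disease-causing.
    """
    hits = {}
    for snp_id, alleles in genotypes.items():
        risk = disease_alleles.get(snp_id)
        if risk is None:
            continue  # site is not annotated in the lookup table
        n = sum(1 for allele in alleles if allele == risk)
        if n:
            hits[snp_id] = n
    return hits


def classify_recessive(hits):
    """Assumed recessive model: two disease alleles -> patient, one -> carrier."""
    total = sum(hits.values())
    if total >= 2:
        return "patient"
    if total == 1:
        return "carrier"
    return "no annotated disease allele detected"


# Placeholder identifiers, not real dbSNP entries.
table = {"snp_example_1": "T", "snp_example_2": "A"}
sample = {"snp_example_1": ("C", "T"), "snp_example_2": ("G", "G")}
print(classify_recessive(count_disease_alleles(sample, table)))
```

Other inheritance models, or phasing information for compound heterozygotes, would require a different decision rule.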
System for determining whether an individual has an abnormal state
In another aspect of the invention, with reference to Figure 4, the present invention provides a system 1000 for determining whether an individual has an abnormal state. According to an embodiment of the present invention, the system 1000 comprises: a sequencing library construction device 100, a sequencing device 200, a SNP determining device 300, and an abnormal state determining device 400.
According to an embodiment of the present invention, the sequencing library construction device 100 is used to construct a sequencing library from the nucleic acid sample of an individual. According to an embodiment of the present invention, the sequencing device 200 is connected to the sequencing library construction device 100 and can thus be used to sequence the constructed sequencing library to obtain a sequencing result consisting of a plurality of sequencing data. According to an embodiment of the present invention, the SNP determining device 300 is connected to the sequencing device 200 and is used to determine, on the basis of the obtained sequencing data, the known SNPs contained in the sequencing result. According to an embodiment of the present invention, the abnormal state determining device 400 is connected to the SNP determining device 300 and is used to determine, on the basis of the known SNPs previously identified in the sequencing result, whether the individual has an abnormal state associated with those known SNPs.
The system for determining whether an individual has an abnormal state according to an embodiment of the present invention can effectively implement the method described above, so that the SNPs contained in the nucleic acid sample are determined effectively by sequencing; and because these SNPs are associated with abnormal states, it can be determined effectively whether the individual from whom the nucleic acid sample originated has an abnormal state associated with these SNPs.
In one embodiment of the invention, the system may further comprise a nucleic acid sample extraction device 101. According to an embodiment of the invention, the nucleic acid sample extraction device 101 is adapted to extract at least a portion of the individual's whole genome DNA from a single cell or a micro sample of the individual. In one embodiment of the invention, the system 1000 may further comprise a biological sample separation device adapted to isolate a single cell or a micro sample from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos. The system according to an embodiment of the present invention can thereby effectively implement the method described above for determining whether an individual has an abnormal state. Whether the individual has an abnormal state can thus be predicted and evaluated effectively from a small sample, which improves the efficiency and reduces the cost of the determination. In addition, such samples are easy to obtain from an organism, and different samples can be taken for particular diseases, so that specific analytical approaches can be applied to certain special diseases.
In one embodiment of the invention, the biological sample separation device is adapted to isolate the single cell or micro sample by at least one method selected from dilution, mouth-pipette separation, micromanipulation, microdissection, flow cytometric sorting, and microfluidics. Single cells can thereby be obtained from the biological sample efficiently and conveniently for the subsequent operations, and can be isolated effectively from the individual, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the present invention, the sequencing library construction device may further comprise a nucleic acid sample amplification unit adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product. In one embodiment of the invention, the amplification unit is adapted to perform at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA. The efficiency of amplifying whole genome DNA can thereby be further improved, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
Herein, the mode of operation of the "nucleic acid sample extraction device" is not particularly limited, provided that the relevant nucleic acid sample can be obtained and is suitable for the subsequent operations. For example, whole genome DNA from a single cell or micro sample can be released and collected by lysing the single cell with a lysis solution.
In addition, in one embodiment of the present invention, the sequencing library construction device may further comprise a screening unit. The screening unit is provided with a nucleic acid probe, so that the nucleic acid sample amplification product can be screened with the probe to obtain amplification product from a predetermined region, and the sequencing library can be constructed from the amplification product of the predetermined region. The known SNPs contained in the predetermined region can thereby be determined effectively, which improves the efficiency and accuracy of determining the abnormal states associated with those SNPs on the basis of the known SNPs contained in the predetermined region, for example an exon region. In one embodiment of the invention, the nucleic acid probe may be provided in the form of a chip. The efficiency of screening with the nucleic acid probe can thereby be further improved, which further improves the efficiency of subsequently determining whether the individual has an abnormal state.
In one embodiment of the present invention, the sequencing device is at least one selected from Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, a Helicos tSMS sequencing device, and a nanopore sequencing device. The high-throughput, deep-sequencing characteristics of these sequencing devices can thereby be exploited to further improve the efficiency of determining whether an individual has an abnormal state. In one embodiment of the invention, sequencing is performed with Illumina Hiseq2000 and the length of the sequencing data is 90 bp. The efficiency of the subsequent SNP analysis can thereby be further improved, which improves the efficiency of determining whether an individual has an abnormal state.
In one embodiment of the present invention, the SNP determining device further comprises an alignment unit, which is used to determine the known SNPs contained in the sequencing result by aligning the sequencing data to a reference genome. In one embodiment of the invention, the alignment unit is adapted to perform the alignment using SOAP/SOAP2 software. The sequencing data contained in the sequencing result can thereby be analyzed effectively, so that the SNPs contained in the sequencing result are determined effectively and the efficiency of determining whether the individual has an abnormal state is improved.
In one embodiment of the present invention, the SNP determining device may further comprise a SNP filtering unit adapted to filter the known SNPs contained in the sequencing result on the basis of the following filtering conditions: the SNP calling quality value is greater than 20; the sequencing depth at the SNP site is greater than 8; the depth at the SNP site is less than 5 times the average depth of the genome; the copy number at the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5. The resulting SNP calls can thereby be filtered effectively, which improves the accuracy and efficiency of determining whether an individual has an abnormal state.
In one embodiment of the invention, the known SNP is located in the HBB gene region of the human chromosome. In one embodiment of the present invention, the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539, rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rs1135071, rs3394300, rs3394300, rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs34716011, rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs located at positions 5246716, 5246879, 5247026, and 5248161 on human chromosome 11. The system according to an embodiment of the present invention can thereby effectively implement the method described above for determining whether an individual has an abnormal state, so that it can be judged effectively whether the subject has thalassemia, in particular β-thalassemia.
The term "connected" as used herein is to be understood broadly: the connection may be direct or indirect, and the same container or device may even be used, provided that a functional link is achieved. For example, nucleic acid sample extraction and nucleic acid sample amplification can be carried out in the same apparatus. That is, once the nucleic acid sample has been extracted, amplification can proceed in the same apparatus or container without transferring the extracted nucleic acid sample to another apparatus or container; it suffices to change the conditions inside the apparatus (including the reaction conditions and the composition of the reaction system) to those suitable for the amplification reaction. Extraction and amplification are then functionally linked, and this arrangement is likewise covered by the term "connected".
Those skilled in the art will understand that the features and advantages described above for the method of determining whether an individual has an abnormal state naturally also apply to the system of the present invention for determining whether an individual has an abnormal state; for brevity, they are not repeated here.
The solutions of the present invention are explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature of the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions were followed. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products, obtainable for example from Illumina.
General method
As a non-limiting example, in specific embodiments of the present invention and with reference to Figure 6, single-gene (monogenic) disease is detected in a single cell or micro sample by a method comprising the following steps:
S1: single-cell isolation / micro-sample preparation;
S2: whole genome amplification;
S3: sequencing sample preparation (library preparation);
S4: high-throughput sequencing;
S5: data alignment and quality control, yielding an effective data set;
S6: SNP calling;
S7: SNP extraction and filtering, yielding the typing result for the target gene;
S8: disease annotation.
Example 1
Sample: single blastomere cell from a human embryo at the 5-8 cell stage
Procedure:
1. Isolation of a single blastomere cell at the human 5-8 cell stage
During an in vitro fertilization with preimplantation genetic diagnosis (IVF-PGD) cycle, the sperm and egg are fertilized in vitro and the embryo is cultured in vitro until day 3, when it reaches the 5-8 cell cleavage stage. On day 3 a routine biopsy is performed: under a micromanipulator, a single blastomere cell is removed, placed in a PCR tube containing lysis buffer, and stored at -80°C. The biopsied embryo is cultured further until day 5, when it reaches the blastocyst stage, and can then be vitrified or used directly for implantation.
2. Whole-genome amplification
Single-cell whole-genome amplification was performed with the REPLI-g Mini Kit (Qiagen) according to the manufacturer's protocol: the single blastomere cell was first subjected to alkaline lysis, and the amplification reaction mix was then added for isothermal amplification at 30°C.
3. Sequencing sample preparation (library construction)
DNA sequencing libraries were constructed with the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's protocol. Three libraries were prepared from the single-blastomere whole-genome amplification product obtained above, with expected insert sizes of 200 bp, 350 bp and 500 bp; the actual insert sizes are given in Table 1.
4. High-throughput sequencing
High-throughput sequencing was performed on the Illumina Hiseq2000 sequencing system. In this example a whole-genome sequencing strategy was used for the single blastomere cell. Clusters were generated on a cBot from the libraries prepared from the amplification product, and the libraries were then run on the Hiseq2000 sequencer with 90 bp paired-end reads. Each library was sequenced on one lane, for three lanes in total.
5. Data alignment and quality control, yielding a valid data set
The sequencing data were aligned with SOAP2 against the human reference genome (hg19; NCBI Build 37), allowing at most two mismatched bases per alignment. From the alignment results, statistics were compiled on raw data quality, GC content, actual insert size, alignment rate, duplication rate, genome coverage and sequencing depth; see Table 1. These statistics were used for quality control of the sequencing data. In this example a total of 38.25X whole-genome data was obtained, and all statistics met the required standards. Reads that failed to align to the reference genome and duplicate reads were then removed, giving the valid data set used for SNP analysis.
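The read filtering described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the SOAP2 implementation: the `AlignedPair` record and its fields are hypothetical, and a duplicate is approximated as a pair that shares its chromosome and both start positions with an earlier pair. The source states only that at most two mismatches are allowed and that unaligned and duplicate reads are removed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

MAX_MISMATCHES = 2  # from the text: at most two mismatched bases


@dataclass(frozen=True)
class AlignedPair:
    """Hypothetical record for one read pair after alignment."""
    name: str
    chrom: Optional[str]  # None if the pair did not align
    start1: int           # leftmost aligned position of read 1
    start2: int           # leftmost aligned position of read 2
    mismatches1: int
    mismatches2: int


def valid_pairs(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Keep aligned pairs within the mismatch limit and drop duplicates."""
    seen: Set[Tuple[str, int, int]] = set()
    kept: List[AlignedPair] = []
    for p in pairs:
        if p.chrom is None:
            continue  # not aligned to the reference genome
        if max(p.mismatches1, p.mismatches2) > MAX_MISMATCHES:
            continue  # too many mismatches in one of the reads
        key = (p.chrom, p.start1, p.start2)
        if key in seen:
            continue  # same coordinates as an earlier pair: treated as duplicate
        seen.add(key)
        kept.append(p)
    return kept
```

Keying duplicates on alignment coordinates is one common convention; a production pipeline would normally rely on the aligner's or a dedicated tool's own duplicate marking.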
Table 1. Alignment statistics for the single-blastomere sequencing data in Example 1 *
[Figure imgf000017_0001: part of Table 1 is reproduced as an image in the source.]
Sample        PE reads  PE Unique         Coverage  Mean Depth  Duplication Rate
BLS1          392.08M   382.26M (97.50%)  91.34%    14.21       0.70%
BLS1          368.13M   359.19M (97.57%)  91.71%    13.45       1.04%
BLS1          357.44M   348.37M (97.46%)  89.93%    12.95       0.58%
BLS1 (total)  1.12G     1.09G (97.51%)    97.04%    38.25       0.77%
Notes:
* BLS1 is the identifier of the single blastomere cell sample; BLS1 (total) gives the statistics for the three lanes of BLS1 data combined. Q20 (%) is the proportion of data with a quality value of 20 or above; GC (%) is the actual GC content of the sequencing data; Insert Size is the actual library insert size; Clean Reads is the number of reads remaining after low-quality reads are removed; PE-alignment is the proportion of reads for which both ends of the pair align to the reference genome; PE reads is the number of reads for which both ends of the pair align to the reference genome; PE Unique is the number of reads for which both ends align uniquely to the reference genome; Coverage is the whole-genome coverage; Mean Depth is the mean whole-genome depth; Duplication Rate is the proportion of duplicate reads among all reads.
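As a consistency check, the percentage shown in the PE Unique column of Table 1 can be recomputed as PE Unique divided by PE reads. The sketch below uses only the per-lane figures printed in the table (in millions of reads); the variable and function names are illustrative.

```python
# Per-lane counts from Table 1, in millions of reads: (PE reads, PE Unique).
lanes = [
    (392.08, 382.26),
    (368.13, 359.19),
    (357.44, 348.37),
]


def unique_pct(pe_reads: float, pe_unique: float) -> float:
    """Share of paired-end aligned reads that align uniquely, in percent."""
    return round(100.0 * pe_unique / pe_reads, 2)


per_lane = [unique_pct(reads, unique) for reads, unique in lanes]

# Pooled value for the three lanes combined.
total_reads = sum(reads for reads, _ in lanes)
total_unique = sum(unique for _, unique in lanes)
pooled = unique_pct(total_reads, total_unique)

print(per_lane, pooled)
```

The per-lane values come out at 97.50%, 97.57% and 97.46%, and the pooled value at 97.51%, matching the table. The pooled totals (about 1117.65M and 1089.82M reads) also agree with the rounded 1.12G and 1.09G in the total row.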
6. SNP calling
In this example SNP calling was performed on the valid data set obtained above with SOAPsnp, following the SOAPsnp documentation (see http://soap.genomics.org.cn/soapsnp.html, incorporated herein by reference), yielding the SNP data set.
7. SNP extraction and filtering, yielding the genotyping result for the target gene
The SNP data set obtained above was filtered according to the following criteria:
a. the SNP calling quality value is greater than 20;
b. the sequencing depth at the site is greater than 8;
c. the depth at the site is less than 5 times the mean genome depth;
d. the copy number of the site is not greater than 2;
e. the distance between the SNP and the nearest other SNP is greater than 5.
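The five criteria above can be expressed as a single predicate. The following is a minimal sketch: the `SnpCall` record and its field names are hypothetical and do not correspond to the SOAPsnp output format; only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical per-site record; field names are illustrative."""
    chrom: str
    pos: int
    quality: int               # SNP calling quality value
    depth: int                 # sequencing depth at the site
    copy_number: float         # estimated copy number of the site
    nearest_snp_distance: int  # distance to the nearest other SNP


def passes_filters(call: SnpCall, genome_mean_depth: float) -> bool:
    """Apply criteria a to e from the text to one SNP call."""
    return (
        call.quality > 20                        # a
        and call.depth > 8                       # b
        and call.depth < 5 * genome_mean_depth   # c
        and call.copy_number <= 2                # d
        and call.nearest_snp_distance > 5        # e
    )
```

With the mean depth of 38.25X reported for this example, criterion c rejects any site with a depth of 191.25 or more.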
From the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-associated gene to be tested. In this example the β-thalassemia gene was tested to determine whether the embryo from which the blastomere was taken has β-thalassemia or is a carrier of the β-thalassemia disease gene. The SNP sites located in the HBB gene region on chromosome 11 were therefore extracted; see Table 2.
Table 2. SNP site information for the HBB gene region obtained in Example 1 *
[Figures imgf000018_0001, imgf000019_0001 and imgf000020_0001: the first part of Table 2 is reproduced as images in the source. The rows available as text are:]
Chromosome  Locus    Ref  Blastomere  Mutation  SNP ID      Gene
chr11       5248249  C    CC          C->A      rs33930702  HBB
chr11       5248250  A    AA          A->G      rs33941849  HBB
chr11       5248250  A    AA          A->C      rs33941849  HBB
chr11       5248250  A    AA          A->T      rs33941849  HBB
chr11       5248251  T    TT          T->C      rs34563000  HBB
chr11       5248269  G    GG          G->C      rs34135787  HBB
chr11       5248280  C    CC          C->T      rs34704828  HBB
chr11       5248282  G    GG          G->A      rs63750628  HBB
chr11       5248301  T    TT          T->G      rs34305195  HBB
Notes:
* Chromosome is the chromosome number; Locus is the position on the chromosome of the base corresponding to the SNP; Ref is the base at that position in the human reference genome in the database; Blastomere is the genotype at that SNP site in the single-blastomere data; Mutation is the mutation type recorded in the database for that site; SNP ID is the database ID of the SNP site; Gene is the gene region in which the SNP site lies.
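Extracting the SNP sites that fall within a target gene region amounts to an interval test on chromosome and position. The sketch below is illustrative only: the `Region` and `Site` types are hypothetical, and the interval bounds are assumed values chosen to enclose the chromosome 11 positions quoted in this document, not an authoritative HBB annotation.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


class Site(NamedTuple):
    chrom: str
    pos: int
    genotype: str


# Assumed interval for illustration only; take real bounds from a gene annotation.
HBB_REGION = Region("chr11", 5246000, 5249000)


def sites_in_region(sites: Iterable[Site], region: Region) -> List[Site]:
    """Return the sites on the region's chromosome that lie within its bounds."""
    return [
        s for s in sites
        if s.chrom == region.chrom and region.start <= s.pos <= region.end
    ]
```

For example, a site at chr11:5248249 is kept, while sites at chr11:6000000 or chr12:5248250 are dropped.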
8. Disease annotation
Disease annotation is then performed on the basis of the SNP information for the target gene filtered and extracted above. In this example, heterozygous SNP variants were found at positions 5247141 and 5247791 of chromosome 11, both within the HBB gene region. Homozygous mutations at these two sites can cause β-thalassemia. In this example both sites carry heterozygous mutations, indicating that the cleavage-stage embryo from which the single cell was taken is a carrier of the β-thalassemia pathogenic gene.
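The annotation rule stated above (a homozygous mutation at a known pathogenic site indicates disease, a heterozygous one indicates carrier status) can be sketched as follows. This is a deliberately simplified illustration: the function names are hypothetical, genotypes are assumed to be two-letter strings in the style of the genotype column of Table 2, and compound heterozygosity and other clinical considerations are not modeled.

```python
from typing import Iterable, Tuple


def zygosity(ref: str, genotype: str) -> str:
    """Classify a two-letter genotype such as 'CC' or 'CT' against the reference base."""
    alleles = set(genotype)
    if alleles == {ref}:
        return "hom_ref"   # both alleles match the reference
    if len(alleles) == 1:
        return "hom_alt"   # both alleles carry the same non-reference base
    return "het"           # two different alleles


def annotate(sites: Iterable[Tuple[str, str]]) -> str:
    """sites: (ref, genotype) pairs at known disease-associated positions."""
    calls = [zygosity(ref, gt) for ref, gt in sites]
    if "hom_alt" in calls:
        return "affected"
    if "het" in calls:
        return "carrier"
    return "normal"
```

Applied to a sample in which every tested site is homozygous reference except for heterozygous sites, the function returns "carrier", which corresponds to the conclusion drawn in this example.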
Example 2
Sample: single blood cell from a normal human individual
Procedure:
1. Isolation of a single blood cell from a normal individual
The blood sample in this example came from a phenotypically normal human individual. A small blood sample was drawn and centrifuged to separate the leukocyte layer. The leukocytes were washed with PBS and suspended in PBS droplets; a single leukocyte was isolated with a mouth pipette, placed in 1-2 μl of alkaline cell lysis buffer, and frozen at -80°C for at least 30 min.
2. Whole-genome amplification
Using the REPLI-g Mini Kit (Qiagen) according to the manufacturer's protocol, the single blood cell was subjected to alkaline lysis by treatment at 65°C for 10 minutes, after which the amplification reaction mix was added for isothermal amplification at 30°C.
3. Chip-capture library construction
The whole-genome amplification product of the single blood cell was subjected to chip capture of specific target regions together with library construction. The Agilent chip used in this example targets regions totalling 2.1 Mb, comprising all exon regions of one hundred genes associated with single-gene diseases. For use of the Agilent chip, refer to the manufacturer's protocol. The captured and constructed sequencing library was then subjected to high-throughput sequencing.
4. High-throughput sequencing
High-throughput sequencing in this example was performed on the Illumina Hiseq2000 sequencing system. A chip-capture sequencing strategy was used for the single blood cell, the target region being all exon regions of one hundred genes associated with single-gene diseases, about 2.1 Mb in size. Clusters were generated on a cBot from the chip-capture library prepared from the amplification product, and the library was then run on the Hiseq2000 sequencer with 90 bp paired-end reads; the expected sequencing output was 1-2 G bases.
5. Data alignment and quality control, yielding a valid data set
The sequencing data were aligned with SOAP2 against the human reference genome (hg19; NCBI Build 37), allowing at most two mismatched bases per alignment. From the alignment results, statistics were compiled on raw data quality, GC content, alignment rate, duplication rate, genome coverage and sequencing depth; see Table 3. These statistics were used for quality control of the sequencing data. In this example data with a mean depth of 457X over the target region was obtained, and all statistics met the required standards. Reads that failed to align to the reference genome and duplicate reads were then removed, giving the valid data set used for SNP analysis.
Table 3. Alignment statistics for the single-blood-cell sequencing data in Example 2
[Figure imgf000022_0001: Table 3 is reproduced as an image in the source.]
Notes:
* GC (%) is the actual GC content of the sequencing data; Reads is the number of reads remaining after low-quality reads are removed; Production is the amount of base data calculated from the Reads value; PE-alignment is the proportion of reads for which both ends of the pair align to the reference genome; PE Unique is the number of reads for which both ends align uniquely to the reference genome; Coverage is the whole-genome coverage; Mean Depth is the mean whole-genome depth; Duplication Rate is the proportion of duplicate reads among all reads; Specificity (Reads) is the proportion of all reads that align specifically to the target region; Specificity (Bases) is the proportion of all bases that align specifically to the target region.
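The ratio-type statistics defined in the notes above follow directly from raw counts. The sketch below shows one way to compute them; the `RunCounts` record is hypothetical, and using Production (reads multiplied by the 90 bp read length) as the denominator for Specificity (Bases) is an assumption, since the source does not spell out how the total base count is obtained.

```python
from dataclasses import dataclass
from typing import Dict

READ_LENGTH = 90  # bp, the read length used in this example


@dataclass
class RunCounts:
    """Hypothetical raw counts for one capture-sequencing run."""
    reads: int            # reads remaining after low-quality filtering
    duplicate_reads: int  # reads flagged as duplicates
    on_target_reads: int  # reads aligning specifically to the target region
    on_target_bases: int  # bases aligning specifically to the target region


def summarize(c: RunCounts) -> Dict[str, float]:
    """Derive Production, Duplication Rate and the two Specificity values."""
    production = c.reads * READ_LENGTH  # total bases implied by the read count
    return {
        "production_bases": production,
        "duplication_rate": c.duplicate_reads / c.reads,
        "specificity_reads": c.on_target_reads / c.reads,
        "specificity_bases": c.on_target_bases / production,  # assumed denominator
    }
```

The counts passed in are placeholders supplied by the caller; the actual values for this example are in Table 3, which survives only as an image.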
6. SNP calling
In this example SNP calling was performed on the valid data set obtained above with SOAPsnp, following the SOAPsnp documentation (see http://soap.genomics.org.cn/soapsnp.html, incorporated herein by reference), yielding the SNP data set.
7. SNP extraction and filtering, yielding the genotyping result for the target gene
The SNP data set obtained above was filtered according to the following criteria:
a. the SNP calling quality value is greater than 20;
b. the sequencing depth at the site is greater than 8;
c. the depth at the site is less than 5 times the mean depth of the target region;
d. the copy number of the site is not greater than 2;
e. the distance between the SNP and the nearest other SNP is greater than 5.
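Criterion e requires the distance from each SNP to its nearest neighbor. One way to compute it is to sort the called positions on a chromosome and compare each with its immediate neighbors, as in the sketch below. The function names are illustrative, and the default `min_gap` of 5 is the threshold stated in the text.

```python
from typing import Dict, List


def nearest_snp_distances(positions: List[int]) -> Dict[int, float]:
    """Map each SNP position on one chromosome to the distance to its nearest neighbor."""
    ordered = sorted(set(positions))  # repeated positions count as one site
    result: Dict[int, float] = {}
    for i, pos in enumerate(ordered):
        left = pos - ordered[i - 1] if i > 0 else float("inf")
        right = ordered[i + 1] - pos if i + 1 < len(ordered) else float("inf")
        result[pos] = min(left, right)
    return result


def well_separated(positions: List[int], min_gap: int = 5) -> List[int]:
    """Positions whose nearest neighboring SNP is more than min_gap away."""
    dist = nearest_snp_distances(positions)
    return [p for p in sorted(dist) if dist[p] > min_gap]
```

A SNP with no neighbor on the chromosome gets an infinite distance and therefore always satisfies the criterion.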
From the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-associated gene to be tested. In this example the β-thalassemia gene was tested to determine whether the corresponding blastomere embryo has β-thalassemia or is a carrier of the β-thalassemia disease gene. The SNP sites located in the HBB gene region on chromosome 11 were therefore extracted; see Table 4.
Table 4. SNP site information for the HBB gene region in Example 2 *
[Figures imgf000023_0001 and imgf000024_0001: the first part of Table 4 is reproduced as images in the source. The rows available as text are:]
Chromosome  Locus    Ref  Blood Cell  Mutation  SNP ID      Gene
chr11       5248158  A    AA          A->T      rs33956879  HBB
chr11       5248159  C    CC          C->T      rs33971440  HBB
chr11       5248159  C    CC          C->A      rs33971440  HBB
chr11       5248160  C    CC          C->T      rs33960103  HBB
chr11       5248160  C    CC          C->G      rs33960103  HBB
chr11       5248161  T    TT          T->G      unknown     HBB
chr11       5248161  T    TT          T->C      rs35684407  HBB
chr11       5248162  G    GG          G->A      rs35578002  HBB
chr11       5248166  A    AA          A->C      rs33916412  HBB
chr11       5248170  C    CC          C->A      rs35424040  HBB
chr11       5248173  C    CC          C->T      rs33950507  HBB
chr11       5248173  C    CC          C->A      rs33950507  HBB
chr11       5248177  A    AA          A->T      rs33951465  HBB
chr11       5248185  C    CC          C->A      rs33959855  HBB
chr11       5248193  T    TT          T->C      rs33972047  HBB
chr11       5248200  T    TT          T->A      rs33986703  HBB
chr11       5248204  C    CC          C->T      rs34716011  HBB
chr11       5248205  C    CC          C->T      rs63750783  HBB
chr11       5248219  G    GG          G->T      rs35799536  HBB
chr11       5248249  C    CC          C->T      rs33930702  HBB
chr11       5248249  C    CC          C->G      rs33930702  HBB
chr11       5248249  C    CC          C->A      rs33930702  HBB
chr11       5248250  A    AA          A->G      rs33941849  HBB
chr11       5248250  A    AA          A->C      rs33941849  HBB
chr11       5248250  A    AA          A->T      rs33941849  HBB
chr11       5248251  T    TT          T->C      rs34563000  HBB
chr11       5248269  G    GG          G->C      rs34135787  HBB
chr11       5248280  C    CC          C->T      rs34704828  HBB
chr11       5248282  G    GG          G->A      rs63750628  HBB
chr11       5248301  T    TT          T->G      rs34305195  HBB
Notes:
* Chromosome is the chromosome number; Locus is the position on the chromosome of the base corresponding to the SNP; Ref is the base at that position in the human reference genome in the database; Blood Cell is the genotype at that SNP site in the single-blood-cell data; Mutation is the mutation type recorded in the database for that site; SNP ID is the database ID of the SNP site; Gene is the gene region in which the SNP site lies.
8. Disease annotation
Disease annotation was then performed on the basis of the SNP information for the target gene filtered and extracted above. In this example no heterozygous or homozygous variant site was detected in the HBB gene region, indicating that the individual from whom the single blood cell was taken is neither a β-thalassemia patient nor a carrier of the β-thalassemia disease gene; the β-thalassemia disease gene region has a normal genotype.
The embodiments of the present invention thus provide a method, based on high-throughput sequencing, for detecting Mendelian genetic diseases (single-gene diseases) in single cells or trace samples.
Industrial applicability
The present invention can be effectively applied to the construction and sequencing of DNA sequencing libraries from sample DNA; the libraries obtained are of good quality and the sequencing results are accurate.
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" means that the specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Claims

1. A method for determining whether an individual has an abnormal state, characterized by comprising:
constructing a sequencing library from a nucleic acid sample of the individual;
sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data;
determining, based on the sequencing data, the known SNPs contained in the sequencing result; and
determining, based on the known SNPs, whether the individual has an abnormal state associated with the known SNPs.
2. The method according to claim 1, characterized in that the individual is a human.
3. The method according to claim 1, characterized in that the abnormal state is a disease.
4. The method according to claim 3, characterized in that the disease is a single-gene disease.
5. The method according to claim 4, characterized in that the single-gene disease is at least one selected from thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency.
6. The method according to claim 5, characterized in that the thalassemia is β-thalassemia.
7. The method according to claim 1, characterized in that the nucleic acid sample is at least a part of the whole-genome DNA of the individual.
8. The method according to claim 1, characterized in that the nucleic acid sample is extracted from a single cell or a trace sample of the individual.
9. The method according to claim 8, characterized in that the single cell or trace sample is isolated from at least one selected from blood, tissue, urine, gametes, fertilized eggs, blastomeres and embryos of the individual.
10. The method according to claim 9, characterized in that the single cell is isolated by at least one selected from dilution, mouth-pipette isolation, micromanipulation, microdissection, flow cytometric sorting and microfluidics.
11. The method according to claim 1, characterized in that constructing a sequencing library from the nucleic acid sample of the individual further comprises:
amplifying the nucleic acid sample to obtain a nucleic acid sample amplification product; and
constructing the sequencing library from the nucleic acid sample amplification product.
12. The method according to claim 11, characterized in that the nucleic acid sample is whole-genome DNA extracted from a single cell of the individual,
wherein
the whole-genome DNA is amplified by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
13. The method according to claim 11, characterized in that, before constructing the sequencing library from the nucleic acid sample amplification product, the method further comprises:
screening the nucleic acid sample amplification product with nucleic acid probes to obtain nucleic acid sample amplification product from a predetermined region; and
constructing the sequencing library from the nucleic acid sample amplification product from the predetermined region.
14. The method according to claim 13, characterized in that the nucleic acid probes are provided in the form of a chip.
15. The method according to claim 13, characterized in that the predetermined region is at least one exon region.
16. The method according to claim 1, characterized in that the sequencing is performed using at least one selected from Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology and nanopore sequencing technology.
17. The method according to claim 16, characterized in that the sequencing is performed using Illumina Hiseq2000 and the length of the sequencing data is 90 bp.
18. The method according to claim 1, characterized in that determining, based on the sequencing data, the known SNPs contained in the sequencing result is performed by aligning the sequencing data with a reference gene.
19. The method according to claim 18, characterized in that the reference gene is a known human genome sequence.
20. The method according to claim 18, characterized in that the alignment is performed with SOAP/SOAP2 software.
21. The method according to claim 20, characterized by further comprising filtering the known SNPs contained in the sequencing result, the filtering being based on the following criteria:
the SNP calling quality value is greater than 20;
the sequencing depth at the SNP site is greater than 8;
the depth at the SNP site is less than 5 times the mean genome depth;
the copy number of the SNP site is not greater than 2; and
the distance between the SNP site and the nearest other SNP site is greater than 5.
22. The method according to claim 1, characterized in that the known SNP is located in the HBB gene region of a human chromosome.
23. The method according to claim 22, characterized in that the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rsl l3507 rs3394300 rs3394300 rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs34716011, rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs located at positions 5246716, 5246879, 5247026 and 5248161 of human chromosome 11.
24. A system for determining whether an individual has an abnormal state, characterized in that it comprises:
a sequencing library construction device, configured to construct a sequencing library from a nucleic acid sample of the individual;
a sequencing device, connected to the sequencing library construction device and configured to sequence the sequencing library so as to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data;
an SNP determination device, connected to the sequencing device and configured to determine, based on the sequencing data, the known SNPs contained in the sequencing result; and
an abnormal state determination device, connected to the SNP determination device and configured to determine, based on the known SNPs, whether the individual has an abnormal state associated with the known SNPs.
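The chain of four connected devices in claim 24 can be read as a simple data flow. The following is a minimal sketch under that reading only; the function name `run_system` and its four stage parameters are hypothetical and stand in for whatever laboratory or software components actually implement each device.

```python
def run_system(sample, build_library, sequence, determine_snps, determine_state):
    """Pass a nucleic acid sample through the four stages of claim 24.

    All parameter names are hypothetical; each stage is any callable
    that accepts the output of the stage before it.
    """
    library = build_library(sample)        # sequencing library construction device
    sequencing_data = sequence(library)    # sequencing device
    known_snps = determine_snps(sequencing_data)  # SNP determination device
    return determine_state(known_snps)     # abnormal state determination device
```

Each stage only sees the output of the previous one, which mirrors the "connected to" wording of the claim.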
25. The system according to claim 24, characterized in that it further comprises:
a nucleic acid sample extraction device, adapted to extract at least a portion of the individual's whole-genome DNA from a single cell or a trace sample of the individual.
26. The system according to claim 25, characterized in that it further comprises:
a biological sample separation device, adapted to separate single cells or trace samples from at least one of the individual's blood, tissue, urine, gametes, fertilized eggs, blastomeres and embryos.
27. The system according to claim 26, characterized in that the biological sample separation device is adapted to separate the single cells or trace samples by at least one method selected from dilution, mouth pipetting, micromanipulation, microdissection, flow cytometric sorting and microfluidics.
28. The system according to claim 24, characterized in that the sequencing library construction device further comprises: a nucleic acid sample amplification unit, adapted to amplify the nucleic acid sample so as to obtain a nucleic acid sample amplification product.
29. The system according to claim 24, characterized in that the amplification unit is adapted to perform at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA and MDA.
30. The system according to claim 29, characterized in that the sequencing library construction device further comprises: a screening unit provided with nucleic acid probes, so that the nucleic acid sample amplification product is screened with the nucleic acid probes to obtain nucleic acid sample amplification product from a predetermined region; and
the sequencing library is constructed from the nucleic acid sample amplification product from the predetermined region.
31. The system according to claim 30, characterized in that the nucleic acid probes are provided in the form of a chip.
32. The system according to claim 24, characterized in that the sequencing device is at least one selected from Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, the Helicos tSMS sequencing device and a nanopore sequencing device.
33. The system according to claim 24, characterized in that the SNP determination device further comprises: an alignment unit, configured to determine the known SNPs contained in the sequencing result by aligning the sequencing data with a reference gene.
34. The system according to claim 33, characterized in that the alignment unit is adapted to perform the alignment using SOAP/SOAP2 software.
35. The system according to claim 33, characterized in that the SNP determination device further comprises: an SNP filtering unit, adapted to filter the known SNPs contained in the sequencing result based on the following filtering conditions:
the SNP calling quality value is greater than 20;
the sequencing depth at the SNP site is greater than 8;
the depth at the SNP site is less than 5 times the average depth of the genome;
the copy number of the SNP site is not greater than 2; and
the distance between the SNP site and the nearest other SNP site is greater than 5.
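The five filtering conditions of claim 35 can be sketched as a single predicate over candidate SNP calls. This is a minimal illustration, not the applicants' implementation: the `SnpCall` record, its field names and the `filter_snps` function are hypothetical, the distance threshold is assumed to be in base pairs, and the nearest-neighbour distance is assumed to be measured against all candidate sites before filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """One candidate SNP call; all field names are hypothetical."""
    position: int       # coordinate of the site on the reference
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number of the site


def filter_snps(calls, genome_mean_depth):
    """Return the calls that satisfy all five conditions of claim 35."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        # Distance to the nearest other candidate site (infinite if alone).
        gaps = []
        if i > 0:
            gaps.append(call.position - ordered[i - 1].position)
        if i + 1 < len(ordered):
            gaps.append(ordered[i + 1].position - call.position)
        nearest = min(gaps) if gaps else float("inf")
        if (call.quality > 20                            # quality value > 20
                and call.depth > 8                       # site depth > 8
                and call.depth < 5 * genome_mean_depth   # depth < 5x genome mean
                and call.copy_number <= 2                # copy number not above 2
                and nearest > 5):                        # nearest other site > 5 away
            kept.append(call)
    return kept
```

Sorting by position first means only the two adjacent candidates need to be checked to find the nearest other site.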
36. The system according to claim 24, characterized in that the known SNP is located in the HBB gene region of a human chromosome.
37. The system according to claim 36, characterized in that the known SNP is at least one selected from the following: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs63750433, rs34690599, rs63751175, rs35328027, rs1609812, rs34451549, rs7480526, rs10768683, rs35099082, rs63750283, rs63750283, rs33945777, rs33945777, rs33945777, rs33933298, rs33913712, rs33995148, rs33931779, rs33969400, rs33922842, rs11549407, rs33974936, rs33991059, rs33982568, rs33948578, rsl l3507 rs3394300 rs3394300 rs63750513, rs34527846, rs35456885, rs35004220, rs63750195, rs35724775, rs33915217, rs33915217, rs33915217, rs33956879, rs33956879, rs33956879, rs33971440, rs33971440, rs33960103, rs33960103, rs35684407, rs35578002, rs33916412, rs35424040, rs33950507, rs33950507, rs33951465, rs33959855, rs33972047, rs33986703, rs3471601 rs63750783, rs35799536, rs33930702, rs33930702, rs33930702, rs33941849, rs33941849, rs33941849, rs34563000, rs34135787, rs34704828, rs63750628, rs34305195, and the SNPs located at positions 5246716, 5246879, 5247026 and 5248161 on human chromosome 11.
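Checking a set of calls against the claimed panel amounts to a membership test on rs identifiers plus the four chromosome 11 coordinates. The sketch below is illustrative only: `KNOWN_RS_IDS` holds a three-entry excerpt of the claimed list rather than the whole panel, the tuple layout of a call and the chromosome label `"11"` are assumptions, and the reference assembly for the coordinates is not stated in the claims.

```python
# Excerpt of the claimed panel for illustration; not the complete list.
KNOWN_RS_IDS = frozenset({"rs33985472", "rs63750954", "rs11549407"})

# The four chromosome 11 positions named at the end of the claim.
KNOWN_CHR11_POSITIONS = frozenset({5246716, 5246879, 5247026, 5248161})


def match_known_snps(calls):
    """Return the calls that hit the panel.

    `calls` is an iterable of (rs_id_or_None, chromosome, position)
    tuples; this layout is a hypothetical convention for the sketch.
    """
    hits = []
    for rs_id, chromosome, position in calls:
        by_id = rs_id in KNOWN_RS_IDS
        by_position = chromosome == "11" and position in KNOWN_CHR11_POSITIONS
        if by_id or by_position:
            hits.append((rs_id, chromosome, position))
    return hits
```

A non-empty result would then be passed to the abnormal state determination step.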
PCT/CN2012/080500 2012-08-23 2012-08-23 Method and system for determining whether individual is in abnormal state WO2014029093A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201280074982.8A CN104508141A (en) 2012-08-23 2012-08-23 Method and system for determining whether individual is in abnormal state
PCT/CN2012/080500 WO2014029093A1 (en) 2012-08-23 2012-08-23 Method and system for determining whether individual is in abnormal state
HK15109589.1A HK1208889A1 (en) 2012-08-23 2015-09-29 Method and system for determining whether individual is in abnormal state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/080500 WO2014029093A1 (en) 2012-08-23 2012-08-23 Method and system for determining whether individual is in abnormal state

Publications (1)

Publication Number Publication Date
WO2014029093A1 true WO2014029093A1 (en) 2014-02-27

Family

ID=50149356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/080500 WO2014029093A1 (en) 2012-08-23 2012-08-23 Method and system for determining whether individual is in abnormal state

Country Status (3)

Country Link
CN (1) CN104508141A (en)
HK (1) HK1208889A1 (en)
WO (1) WO2014029093A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133546A1 (en) * 2017-01-19 2018-07-26 人和未来生物科技(长沙)有限公司 CONSTRUCTION METHOD, DETECTION METHOD AND KIT FOR NON-INVASIVE PRENATAL FETAL α-THALASSEMIA GENE MUTATION DETECTION LIBRARY
WO2018133547A1 (en) * 2017-01-19 2018-07-26 人和未来生物科技(长沙)有限公司 METHOD FOR CONSTRUCTING LIBRARY FOR NON-INVASIVE PRENATAL FETAL β-THALASSEMIA GENE MUTATION DETECTION, DETECTION METHOD AND KIT
CN110093413A (en) * 2019-04-09 2019-08-06 深圳市卫生健康发展研究中心 Detect the primer sets and kit of beta Thalassemia

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101374963A (en) * 2005-12-22 2009-02-25 凯津公司 Method for high-throughput AFLP-based polymorphism detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LAM, K.W.G. ET AL.: "Noninvasive Prenatal Diagnosis of Monogenic Diseases by Targeted Massively Parallel Sequencing of Maternal Plasma: Application to beta Thalassemia.", CLINICAL CHEMISTRY, vol. 58, no. 10, 15 August 2012 (2012-08-15), pages 1 - 9 *
SABATH, D.E. ET AL.: "A Multiplex Approach to the Molecular Diagnosis of B-Thalassemia.", THE JOURNAL OF MOLECULAR DIAGNOSTICS., vol. 13, no. 4, 2011, pages 369 - 370 *
WEI, XIAOMING ET AL.: "Identification of Sequence Variants in Genetic Disease-Causing Genes Using Targeted Next-Generation Sequencing.", PLOS ONE., vol. 6, no. 12, 21 December 2011 (2011-12-21), pages 1 - 10 *

Also Published As

Publication number Publication date
CN104508141A (en) 2015-04-08
HK1208889A1 (en) 2016-03-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12883246

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 02/07/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 12883246

Country of ref document: EP

Kind code of ref document: A1