CN105051208B - Method, system, and computer readable medium for determining base information of predetermined area in fetal genome - Google Patents

Publication number
CN105051208B
Authority
CN
China
Prior art keywords
embryo
sequencing
embryonic
gene group
base information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201380074395.3A
Other languages
Chinese (zh)
Other versions
CN105051208A (en)
Inventor
殷旭阳
蒋慧
陈盛培
龚淳
陈芳
张春雷
潘小瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Huada medical laboratory Co.,Ltd.
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by BGI Shenzhen Co Ltd


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Abstract

A method, a system, and a computer-readable medium for determining base information of a predetermined area in a fetal genome are provided. The method comprises the following steps: obtaining a sequencing result of a genomic DNA sample from an embryonic cell, and a sequencing result of a genome sample from an individual genetically related to the embryo; constructing a draft genetic map of the embryo based on the sequencing result of the genomic DNA sample from the embryonic cell, to determine an initial genotype of the embryo; determining the haplotypes of the parents of the embryo based on the sequencing result of the genome sample from the genetically related individual; and, according to a hidden Markov model, using the initial genotype of the embryo as the observation sequence and determining the base information of the predetermined area in the fetal genome based on the haplotypes of the parents of the embryo.

Description

Method, system, and computer-readable medium for determining base information of a predetermined region in an embryonic genome
Priority information
None
Technical field
The present invention relates to a method, a system, and a computer-readable medium for determining base information of a predetermined region in an embryonic genome.
Background
Genetic diseases are diseases caused by changes in the genetic material; they are congenital, familial, lifelong, and heritable. Genetic diseases fall into three major classes: monogenic diseases, multifactorial diseases, and chromosomal abnormalities. Monogenic diseases mostly arise from abnormal function of a single disease-causing gene that is inherited in a dominant or recessive manner. Multifactorial diseases are caused by variation in multiple genes and can be influenced to some extent by external environmental factors. Chromosomal abnormalities include numerical and structural abnormalities; the most common is Down syndrome, caused by trisomy of chromosome 21, in which affected children show intellectual disability and other congenital features such as limb malformation. Because no effective treatment for genetic diseases is currently available, patients can only receive targeted supportive treatment or drug-based relief, which is expensive and places a heavy economic and psychological burden on society and families. It is therefore very necessary to detect the disease status of a child before birth and to take preventive measures, so as to promote healthy births. For example, in in vitro fertilization, detecting the disease status of an embryo before it is implanted into the uterus is of great significance. Genetic analysis is performed on gametes or on embryos before transfer into the uterine cavity, and only embryos free of the pathogenic factor are implanted, thereby preventing pregnancy with a child carrying a genetic disease. For couples at high risk of conceiving an embryo with a genetic defect, detecting the disease status of preimplantation embryos can effectively reduce the birth of affected children and avoid the harm that selective abortion brings to the pregnant woman and her family.
However, current detection methods still need improvement.
Summary of the invention
The present invention was completed based on the following findings of the inventors:
The inventors found that the number of cells that can be sampled by biopsy from a preimplantation embryo is limited. For example, an embryo cultured to day 3 (the blastomere stage) has only 4-8 cells, of which only 1-2 can be sampled by biopsy. Even for a blastocyst cultured to day 5, a trophectoderm biopsy yields only 3-10 cells. Because the amount of DNA in a single embryonic cell or a few embryonic cells is limited, comprehensive detection of genetic variation cannot be carried out directly; whole genome amplification (WGA) is required to obtain enough DNA for a comprehensive analysis of genetic variation. Whole genome amplification, however, is often biased and can produce loss of heterozygosity or allele dropout (ADO), both of which introduce a risk of error into the detection of disease-related genetic variation in embryonic cells.
It is therefore necessary to develop a method that can locate and correct errors such as the allele dropout caused by amplification, so as to improve the accuracy of genetic variation detection.
The present invention aims to solve at least one of the technical problems in the prior art. By computing the haplotype of the embryo with the method of the invention, sites at which allele dropout has occurred can be located and corrected. The embryo haplotype analysis can be based on the genetic information of family members such as the parents and a proband, or on the genetic information of other embryonic cells from the same parents.
In one aspect, the present invention proposes a method for determining base information of a predetermined region in an embryonic genome. The method determines the embryo's haplotype by combining pedigree-based inference with a statistical algorithm: the genotypes of related individuals in the family are used to trace the transmission of chromosome segments, and the haplotype state is then inferred with the statistical algorithm. According to an embodiment of the invention, the method comprises the following steps: obtaining a sequencing result of a genomic DNA sample from an embryonic cell, and a sequencing result of a genome sample from an individual genetically related to the embryo; constructing a draft genetic map of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, to determine an initial genotype of the embryo; determining the haplotypes of the embryo's parents based on the sequencing result of the genome sample from the genetically related individual; and, according to a hidden Markov model, using the initial genotype of the embryo as the observation sequence and determining the base information of the predetermined region in the embryonic genome based on the haplotypes of the embryo's parents.
It should be noted that the formation of an offspring's genome is equivalent to one random recombination of the parental genomes (linkage crossover, i.e. haplotype recombination, followed by random combination of gametes). The haplotype of the embryo is treated as the hidden state, and the sequencing data obtained after whole genome amplification of a single embryonic cell is treated as the observation sequence. A Bayesian algorithm is used to derive the state transition probabilities from prior data, and the observation symbol probability distribution and the initial state distribution are constructed; the most probable combination of embryo haplotypes can then be inferred with the Viterbi algorithm. Thus, according to embodiments of the invention, a hidden Markov model (for example, with the Viterbi algorithm), together with the genetic information of individuals genetically related to the embryo, can determine the nucleic acid sequence of a specific region in the embryonic genome, so that the genetic information of the embryonic genome can be tested effectively before implantation. Sites or DNA sequences that have low coverage in the sequencing result of the whole-genome-amplified embryonic cell DNA, or that show loss of heterozygosity or allele dropout, can be inferred accurately by this method. The method can therefore correct sites or sequences affected by loss of heterozygosity or allele dropout, making the detection result more accurate and reliable.
In another aspect, the present invention proposes a system for determining base information of a predetermined region in an embryonic genome. According to an embodiment of the invention, the system includes: a library construction device, adapted to construct sequencing libraries separately for the embryonic genomic DNA sample and for the genomic DNA sample of an individual genetically related to the embryo; a sequencing device, connected to the library construction device and adapted to sequence the sequencing libraries, so as to obtain the sequencing results of the embryo and of the genetically related individual; and an analysis device, connected to the sequencing device and adapted to: construct a draft genetic map of the embryo based on the sequencing result of the embryonic genomic DNA sample, to determine an initial genotype of the embryo; determine the haplotypes of the embryo's parents based on the sequencing result of the genome sample from the genetically related individual; and, according to a hidden Markov model, use the initial genotype of the embryo as the observation sequence and determine the base information of the predetermined region in the embryonic genome based on the haplotypes of the embryo's parents. According to an embodiment of the invention, the system may further include an embryo biopsy device, adapted to perform a biopsy on an embryo cultured in vitro and to sample embryonic cells. The system can effectively implement the method described above for determining base information of a predetermined region in an embryonic genome: by means of a hidden Markov model (for example, with the Viterbi algorithm) and with reference to the genetic information of individuals genetically related to the embryo, the nucleic acid sequence of a specific region in the embryonic genome can be determined, so that the genetic information of the embryonic genome can be tested effectively before implantation.
In a further aspect, the present invention also proposes a computer-readable medium. According to an embodiment of the invention, the computer-readable medium stores instructions adapted to be executed by a processor so as to: construct a draft genetic map of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, to determine an initial genotype of the embryo; determine the haplotypes of the embryo's parents based on the sequencing result of the genome sample from the genetically related individual; and, according to a hidden Markov model, use the initial genotype of the embryo as the observation sequence and determine the base information of the predetermined region in the embryonic genome based on the haplotypes of the embryo's parents. With the computer-readable medium of the invention, a processor can effectively execute the stored instructions, so that, by means of a hidden Markov model (for example, with the Viterbi algorithm), based on the sequencing result of the embryonic cell and with reference to the genetic information of individuals genetically related to the embryo, the nucleic acid sequence of a specific region in the embryonic genome can be determined and the genetic information of the embryonic genome can be tested effectively before implantation.
Additional aspects and advantages of the invention are set forth in part in the description below; in part they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of an analysis using a hidden Markov model according to one embodiment of the invention; and
Fig. 2 is a schematic structural diagram of a system for determining the nucleic acid sequence of a predetermined region in an embryonic genome according to one embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. Further, in the description of the invention, unless otherwise stated, "multiple" means two or more.
Method for determining base information of a predetermined region in an embryonic genome
In a first aspect, the present invention proposes a method for determining base information of a predetermined region in an embryonic genome. According to an embodiment of the invention, the method comprises the following steps:
First, a sequencing result of a genomic DNA sample from an embryonic cell and a sequencing result of a genome sample from an individual genetically related to the embryo are obtained. The term "individual genetically related to the embryo" as used herein refers to an individual that is related to the embryo in the genetic sense. For example, according to an embodiment of the invention, the genetically related individuals that can be used are the embryo's parental generation, such as the parents, the paternal grandparents, and the maternal grandparents, and other children of the embryo's parents. "Other children of the embryo's parents" should be interpreted broadly: they may be children already born, children not yet born (a fertilized egg or an embryo), an embryo that has died, or an embryo or fertilized egg cultured in vitro, as long as they share the same parents with the embryo to be tested.
According to an embodiment of the invention, the source of the embryonic cell genomic DNA sample is not particularly limited. Specifically, a biopsy sample can be taken from the embryo to obtain embryonic cells, and whole genome amplification (WGA) can be performed on the embryonic cell sample to obtain embryonic cell genomic DNA. The term "embryo biopsy" as used herein refers to a technique that uses micromanipulation to separate some embryonic cells from an embryo, or to separate some cells from a fertilized egg or gamete. According to an embodiment of the invention, the embryonic cells may come from any of blastomeres, blastocyst trophectoderm, fertilized eggs, and gametes, and may be a single cell or 2-10 cells. Alternatively, any sample from the pregnant woman that contains embryonic nucleic acid can be subjected to whole genome amplification to obtain the embryonic genomic DNA sample. In this way, the genome of the embryo can be monitored effectively without affecting the normal development of the embryo. Whole genome amplification (WGA) mainly includes multiple displacement amplification (MDA) and PCR-based WGA techniques. An independently developed WGA workflow can be used, or a commercial kit such as the REPLI-g series of kits from Qiagen, the GenomePlex WGA kit (WGA4) from Sigma-Aldrich, the PicoPlex WGA kit from Rubicon Genomics, or the illustra GenomiPhi WGA kit from GE Healthcare. After the DNA sample has been amplified, sequencing libraries are constructed separately for the amplified DNA sample of the embryonic cell and for the genomic DNA sample of the genetically related individual.
As to the methods and workflows for constructing a sequencing library from a nucleic acid sample, those skilled in the art can make a suitable choice according to the sequencing technology used. For details of the workflow, reference may be made to the protocols provided by sequencing instrument manufacturers such as Illumina, for example the Illumina Multiplexing Sample Preparation Guide (Part# 1005361) or the Paired-End Sample Prep Guide (Part# 1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the invention, the methods and equipment for extracting the nucleic acid sample from a biological specimen are likewise not particularly limited, and a commercial nucleic acid extraction kit can be used.
After the sequencing library has been constructed, it is loaded onto a sequencing instrument and sequenced to obtain the corresponding sequencing result, which consists of multiple sequencing reads. According to embodiments of the invention, the methods and equipment that can be used for sequencing are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, because the high throughput and deep sequencing of these sequencing devices further improve sequencing efficiency. This in turn improves the precision and accuracy of the subsequent analysis of the sequencing data, in particular of the statistical tests. High-throughput sequencing methods include, but are not limited to, second-generation sequencing technologies and single-molecule sequencing technologies. Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq2000™, MiSeq, etc.), ABI-SOLiD, Life Technologies-Ion Torrent/Proton, and Roche-454 (pyrosequencing) sequencing platforms. Single-molecule sequencing platforms (technologies) include, but are not limited to, the True Single Molecule DNA sequencing technology of Helicos, the single molecule real-time (SMRT™) sequencing of Pacific Biosciences, and the nanopore sequencing technology of Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6(4):244-245). As sequencing technologies continue to evolve, those skilled in the art will understand that other sequencing methods and devices can also be used for whole genome sequencing. According to a specific example of the invention, the whole genome sequencing library can be sequenced with at least one selected from Illumina-Solexa, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454, and single-molecule sequencing devices.
According to an embodiment of the invention, after the sequencing results have been obtained, a draft genetic map of the embryo is constructed based on the sequencing result of the embryonic cell genomic DNA sample, to determine the initial genotype of the embryo, and the haplotypes of the embryo's parents are determined based on the sequencing result of the genome sample from the genetically related individual. According to an embodiment of the invention, the draft genetic map of the embryo is constructed by aligning the sequencing result of the embryonic cell genomic DNA sample to a reference sequence. The genotype of the genetically related individual is determined by aligning the sequencing result of that individual's genome sample to the reference sequence, and the haplotypes of the embryo's parents are then determined from the genotype of the genetically related individual. According to embodiments of the invention, a known human reference genome can be used as the reference sequence; for example, according to an embodiment of the invention, the human reference genome used is NCBI 36.3 (HG18). In addition, according to embodiments of the invention, the alignment method is not particularly limited. According to a specific embodiment of the invention, SOAP can be used for the alignment.
Finally, according to a hidden Markov model, the initial genotype of the embryo is used as the observation sequence, and the base information of the predetermined region in the embryonic genome is determined based on the haplotypes of the embryo's parents.
The term "predetermined region" as used herein should be understood broadly and refers to any region of nucleic acid molecules that contains a site at which a predetermined event may occur. For SNP analysis, it may refer to a region containing a SNP site. For the analysis of chromosomal aneuploidy, the predetermined region refers to the full length or a part of the chromosome to be analyzed, that is, all sequencing data from that chromosome are selected. The method for selecting the sequencing data from the corresponding region out of the sequencing result is not particularly limited. According to an embodiment of the invention, all of the sequencing data obtained can be aligned to a known nucleic acid reference sequence, so as to obtain the sequencing data that come from the predetermined region. In addition, according to embodiments of the invention, the predetermined region may also consist of multiple discontinuous, separated points on the genome. According to embodiments of the invention, the type of reference sequence that can be used is not particularly limited; it can be any known sequence that contains the region of interest.
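The selection of sequencing data from a predetermined region can be sketched as follows. This is a minimal illustration, assuming that reads have already been aligned to a reference and reduced to coordinates; `AlignedRead`, `Region`, and `reads_in_regions` are hypothetical names that do not appear in the patent, and the coordinates are invented for the example.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """Hypothetical record for one aligned read (not a format from the patent)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# A region is (chromosome, start, end), half-open like the reads.
Region = Tuple[str, int, int]


def reads_in_regions(reads: Iterable[AlignedRead],
                     regions: List[Region]) -> List[AlignedRead]:
    """Keep the reads that overlap at least one predetermined region.

    The regions may be discontinuous, and a whole chromosome can be
    expressed as one region spanning its full length.
    """
    def overlaps(read: AlignedRead, region: Region) -> bool:
        chrom, start, end = region
        return read.chrom == chrom and read.start < end and start < read.end

    return [r for r in reads if any(overlaps(r, reg) for reg in regions)]


# Illustrative data only.
example_reads = [
    AlignedRead("chr21", 100, 200),
    AlignedRead("chr21", 500, 600),
    AlignedRead("chr1", 100, 200),
]
example_regions = [("chr21", 150, 160), ("chr21", 590, 700)]
selected = reads_in_regions(example_reads, example_regions)
```

A linear scan is enough for a sketch; a real pipeline would more likely rely on an indexed alignment format.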
According to an embodiment of the invention, the base information of the predetermined region can be determined according to a hidden Markov model, based on the sequencing result of the embryonic cell and with reference to the genetic information of individuals genetically related to the embryo. According to an embodiment of the invention, the base information of a specific region in the embryonic genome can be determined by the hidden Markov model using the Viterbi algorithm. The genetic information of the embryonic genome can thus be tested effectively before implantation.
The principle of the analysis by a hidden Markov model using the Viterbi algorithm is described in detail below with reference to Fig. 1:
As stated above, the term "individual genetically related to the embryo" as used herein refers to an individual that is related to the embryo in the genetic sense. For example, according to an embodiment of the invention, the genetically related individuals that can be used are the embryo's parental generation, such as the parents, the paternal grandparents, and the maternal grandparents, and other children of the embryo's parents. "Other children of the embryo's parents" should be interpreted broadly: they may be children already born, children not yet born (an embryo or a fertilized egg), an embryo that has died, or an embryo or fertilized egg cultured in vitro, as long as they share the same parents with the embryo to be tested.
The formation of an offspring's genome is thus equivalent to one random recombination of the parental genomes (linkage crossover, i.e. haplotype recombination, followed by random combination of gametes). The haplotype of the embryo is treated as the hidden state, and the sequencing data obtained after whole genome amplification of a single embryonic cell is treated as the observation sequence. The state transition probabilities are derived from prior data, and the observation symbol probability distribution and the initial state distribution are constructed; the most probable combination of embryo haplotypes can then be inferred with the Viterbi algorithm. Thus, according to embodiments of the invention, a hidden Markov model (for example, with the Viterbi algorithm), together with the genetic information of individuals genetically related to the embryo, can determine the nucleic acid sequence of a specific region in the embryonic genome, so that the genetic information of the embryonic genome can be tested effectively before implantation.
The detailed analysis steps are as follows:
Notation:
I. The number of sites to be tested is N.
II. The observation sequence consists of one observed state per site.
III. The hidden state set is S = {0, 1}, which specifies which of a parent's chromosomes is transmitted to the offspring: 0 denotes the chromosome that was transmitted to the proband, and 1 denotes the chromosome that was not transmitted to the proband.
IV. The observed state set is V = {0, 1}: 1 denotes that the chromosome transmitted to the proband is consistent with the genotype in the embryo's draft genetic map, and 0 denotes that it is inconsistent.
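The encoding of the observed state set V can be sketched as follows. This is a minimal illustration under two assumptions that are not stated in the patent: "consistent" is taken to mean that the allele carried by the chromosome transmitted to the proband appears in the embryo's draft genotype at that site, and sites without a draft genotype are reported as `None`. The function name `observed_states` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def observed_states(transmitted_alleles: Sequence[str],
                    embryo_genotypes: Sequence[Optional[Genotype]]
                    ) -> List[Optional[int]]:
    """Encode one observed state per site.

    1: the allele on the chromosome transmitted to the proband is
       consistent with the embryo's draft genotype (assumed to mean
       "present in the genotype").
    0: it is inconsistent.
    None: the embryo has no draft genotype at this site (an assumed
       convention for uncovered sites).
    """
    states: List[Optional[int]] = []
    for allele, genotype in zip(transmitted_alleles, embryo_genotypes):
        if genotype is None:
            states.append(None)
        else:
            states.append(1 if allele in genotype else 0)
    return states


# Illustrative data only: three sites, the last one without coverage.
example_states = observed_states(["A", "G", "T"],
                                 [("A", "C"), ("C", "C"), None])
```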
Step 1: construct the initial state probability distribution vector, the haplotype recombination transition matrix, and the observation symbol probability matrix.
I. The initial state probability distribution is denoted $\pi = \{\pi_i\}$, $i \in S$, with $\pi_i = 0.5$. In the absence of reference data, every hidden state can be assumed to occur with equal probability.
II. The haplotype recombination transition matrix is denoted $A = \{a_{ij}\}$, $i, j \in S$; it gives the probabilities of transitions between hidden states. Nr and Np denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively. Based on prior data, Nr can be a natural number between 20 and 40. Nr and Np are used to calculate the transition probability of the hidden state, that is, the probability that recombination occurs between the sites that make up the haplotype.
III. The observation symbol probability matrix is denoted $B = \{b_i(k)\}$, $i \in S$, $k \in V$. The probability distribution $b_i(k)$ is estimated from the sites at which the offspring must be homozygous (Must-hom) or must be heterozygous (Must-het).
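The initial distribution and the transition matrix can be sketched as follows. The formula for $a_{ij}$ is not reproduced in this text, so the sketch assumes, purely for illustration, a per-site recombination probability of Nr/Np and a symmetric two-state matrix; that choice is an assumption, not the patent's formula. The function names and the value of Np are likewise hypothetical. The matrix B is not sketched, because its estimation formula is also not reproduced here.

```python
from typing import List


def initial_distribution() -> List[float]:
    """Equal prior for the two hidden states, pi_i = 0.5 as in the text."""
    return [0.5, 0.5]


def transition_matrix(nr: int, np_sites: int) -> List[List[float]]:
    """Two-state transition matrix A = {a_ij}.

    ASSUMPTION (not from the patent): the probability of switching
    hidden state between adjacent sites is r = Nr / Np, and the matrix
    is symmetric.
    """
    if np_sites <= 0:
        raise ValueError("Np must be positive")
    r = nr / np_sites
    if not 0.0 <= r <= 1.0:
        raise ValueError("Nr / Np must be a probability")
    return [[1.0 - r, r],
            [r, 1.0 - r]]


# Nr = 30 lies inside the 20-40 range given in the text;
# Np = 60000 is an invented value for illustration.
pi = initial_distribution()
A = transition_matrix(nr=30, np_sites=60000)
```

Under this assumption the off-diagonal entries are small, which encodes the expectation that neighbouring sites are usually inherited on the same parental chromosome.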
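Step 2: construct the local probability matrix and the back pointers.
Define the local probability: $\delta_t(j) = \max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1, \ldots, N\}$, where $k \in V$ is the state observed at site $t$.
Define the back pointer: $\psi_t(j) = \arg\max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}]$, $t \in \{1, \ldots, N\}$.
Step 3: determine the final state and backtrack the optimal path.
Determine the final state, $q_N^* = \arg\max_{i \in S} \delta_N(i)$.
Backtrack the optimal path, i.e. the most probable embryo haplotype: $q_t^* = \psi_{t+1}(q_{t+1}^*)$ ($t = 1, 2, 3, \ldots, N-1$).
Step 4: output the result.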
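The four steps above can be sketched as a standard Viterbi decoder over the two hidden states. This is a minimal illustration rather than the patented implementation: it works in log space to avoid numerical underflow, initializes the first site as $\pi_j \cdot b_j(k)$ in the classical way, and uses invented parameter values. `B[j][k]` is taken to be the probability of observing symbol k in hidden state j.

```python
import math
from typing import List, Sequence


def viterbi(obs: Sequence[int],
            pi: Sequence[float],
            A: Sequence[Sequence[float]],
            B: Sequence[Sequence[float]]) -> List[int]:
    """Return the most probable hidden-state path for the observations."""
    if not obs:
        return []
    n_states = len(pi)

    def log(x: float) -> float:
        return math.log(x) if x > 0 else float("-inf")

    # Local probabilities delta_t(j) and back pointers psi_t(j), in log space.
    delta = [[log(pi[j]) + log(B[j][obs[0]]) for j in range(n_states)]]
    psi = [[0] * n_states]
    for t in range(1, len(obs)):
        row, back = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[t - 1][i] + log(A[i][j]))
            row.append(delta[t - 1][best_i] + log(A[best_i][j])
                       + log(B[j][obs[t]]))
            back.append(best_i)
        delta.append(row)
        psi.append(back)

    # Final state, then backtracking along the stored pointers.
    path = [max(range(n_states), key=lambda j: delta[-1][j])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path


# Invented parameters for illustration: state 0 mostly emits 1
# ("consistent"), state 1 mostly emits 0, and switching state is rare.
pi_demo = [0.5, 0.5]
A_demo = [[0.99, 0.01], [0.01, 0.99]]
B_demo = [[0.1, 0.9], [0.9, 0.1]]
path_single_error = viterbi([1, 1, 0, 1, 1], pi_demo, A_demo, B_demo)
path_recombination = viterbi([1, 1, 1, 0, 0, 0], pi_demo, A_demo, B_demo)
```

With these illustrative parameters, a single inconsistent observation inside a consistent run does not change the decoded state, which is the kind of behaviour that lets isolated amplification errors such as allele dropout be located, whereas a sustained run of inconsistent observations is decoded as a switch to the other chromosome.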
That is, embodiments in accordance with the present invention, according to hidden Markov model, determine the alkali of presumptive area in embryonic gene group Base information is further included:
Build initial state probabilities distribution vector, the probability matrix of hidden state transfer and observation sequence probability matrix;
Determine end-state than algorithm using Hui Te and recall optimal path, to determine embryonic gene group in presumptive area Base information.
According to specific embodiments, the hidden Markov model uses the following parameters:
The initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden-state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), wherein Nr and Np denote the expected number of recombinations and the number of SNP sites, respectively, and Nr is a natural number in the range 20-40; the observation probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), wherein
#sites (L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites (L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right] \cdot b_j(K)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right]$, $t \in \{1, \ldots, N\}$;
according to the Viterbi algorithm, the final state is obtained by recursion, the optimal path is traced back, and the most probable base information of the predetermined region of the embryo is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ ($t = 1, 2, 3, \ldots, N-1$).
The terms "local probability $\delta_i(q_i)$" and "back pointer $\Psi_i(q_i)$" used above both follow the classical definitions of the Viterbi algorithm. For a detailed description of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
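As a non-limiting illustration, the decoding described above can be sketched in Python. The sketch follows the classical Viterbi recursion; the initialization $\delta_1(i) = \pi_i\, b_i(o_1)$ and the final state $q_N^* = \arg\max_i \delta_N(i)$ are taken from that classical definition rather than from the text above, and the function name `viterbi` and the numeric values of `A` and `B` are assumptions chosen only for illustration. Logarithms are used so that long site sequences do not underflow.

```python
import math

def viterbi(obs, pi, A, B):
    """Most probable hidden-state path for an observation sequence.

    obs: observed symbols (0 or 1), one per site
    pi:  initial state probabilities, pi[i]
    A:   transition probabilities, A[i][j] = P(state j | previous state i)
    B:   observation probabilities, B[i][k] = P(observe k | state i)
    """
    n_states = len(pi)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # Initialization (classical Viterbi; assumed, not stated in the text).
    delta = [[log(pi[i]) + log(B[i][obs[0]]) for i in range(n_states)]]
    psi = [[0] * n_states]
    for t in range(1, len(obs)):
        row, back = [], []
        for j in range(n_states):
            # Back pointer: best predecessor state i for state j at site t.
            best_i = max(range(n_states),
                         key=lambda i: delta[t - 1][i] + log(A[i][j]))
            back.append(best_i)
            # Local probability (in log space) of the best path ending in j.
            row.append(delta[t - 1][best_i] + log(A[best_i][j])
                       + log(B[j][obs[t]]))
        delta.append(row)
        psi.append(back)
    # Final state, then trace the optimal path backwards.
    state = max(range(n_states), key=lambda i: delta[-1][i])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    path.reverse()
    return path

# Illustrative values only (assumptions, not parameters from the text).
pi = [0.5, 0.5]
A = [[0.99, 0.01], [0.01, 0.99]]
B = [[0.1, 0.9], [0.9, 0.1]]
obs = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
path = viterbi(obs, pi, A, B)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run the isolated inconsistent observation at the fourth site is absorbed as noise, while the run of four inconsistent observations at the end is explained by a single state switch, which is the behaviour that lets an HMM smooth over sporadic errors such as allele dropout.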
Thus, the sequence of the embryo genome can be analyzed effectively. Compared with other existing preimplantation detection techniques, this method has the following technical advantages, chiefly in accuracy and in the amount of genetic information that can be obtained:
1) According to embodiments of the present invention, paternally derived sites and disease-related sites in the embryo can be detected well; the detection accuracy can exceed 95%, and multiple variant types can be detected, which broadens the range of detectable diseases.
2) According to embodiments of the present invention, sites or DNA sequences that show low coverage, heterozygous deletion, or allele dropout (ADO) in the DNA sequencing results of single-cell or few-cell whole-genome amplification of the embryo can be inferred accurately by this method. The method can therefore correct sites or sequences affected by heterozygous deletion or allele dropout, making the detection results more accurate and reliable.
3) According to embodiments of the present invention, genetic disease mapping can be carried out. Certain linked diseases can be inferred directly from information at other sites, so a large amount of information is obtained at once, which is of greater guiding value for clinical testing.
In addition, according to embodiments of the present invention, the method of the present invention for determining base information of a predetermined region in an embryo genome is not limited to one kind of genetic polymorphism, such as SNP or STR sites. It is applicable to all genetic polymorphism sites, and several kinds of sites can be used at the same time so that they validate one another.
System for determining base information of a predetermined region in an embryo genome
In yet another aspect, the present invention provides a system for determining the nucleic acid sequence of a predetermined region in an embryo genome. According to embodiments of the present invention, referring to Fig. 2, the system 1000 may include: a library construction device 100, a sequencing device 200, and an analysis device 400.
According to embodiments of the present invention, the library construction device 100 is adapted to construct sequencing libraries separately for the embryo genomic DNA sample and for the genomic DNA samples of individuals genetically related to the embryo. According to embodiments of the present invention, the sequencing device 200 is connected to the library construction device 100 and is adapted to sequence the sequencing libraries so as to obtain sequencing results for the embryo and for the genetically related individuals. According to embodiments of the present invention, the system may further include a DNA sample separation device and a DNA amplification device (not shown). The DNA sample separation device is adapted to take a biopsy from the embryo to obtain embryonic cells. The embryonic cells may come from any of blastomeres, blastocyst trophectoderm, germ cells, and gametes, and may be a single cell or a small number of cells, such as 2-10 cells; any sample from a pregnant woman that contains embryonic nucleic acid may also be used. The DNA amplification device is adapted to perform whole-genome amplification on the embryonic cells obtained by biopsy, so as to obtain enough DNA for subsequent detection and analysis. Thus, the embryo genome can be monitored effectively without affecting normal development of the embryo. Methods and workflows for whole-genome amplification (WGA) mainly include multiple displacement amplification (MDA) and PCR-based WGA techniques. An independently developed WGA workflow may be used, as may commercial kits such as the REPLI-g series kits from Qiagen, the GenomePlex WGA kit (WGA4) from Sigma-Aldrich, the PicoPlex WGA kit from Rubicon Genomics, and the illustra GenomiPhi WGA kit from GE Healthcare.
As for methods and workflows for constructing sequencing libraries from nucleic acid samples, those skilled in the art can choose appropriately according to the sequencing technology used. For details of the workflow, see the protocols provided by sequencer manufacturers such as Illumina, for example Illumina's Multiplexing Sample Preparation Guide (Part# 1005361; Feb 2010) or Paired-End SamplePrep Guide (Part# 1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the present invention, the methods and equipment for extracting nucleic acid samples from biological samples are likewise not particularly limited, and commercial nucleic acid extraction kits may be used. According to embodiments of the present invention, the methods and equipment that can be used for sequencing are not particularly limited and include, but are not limited to, dideoxy chain termination. High-throughput sequencing methods are preferred, because the high throughput and deep sequencing of these sequencing devices further improve sequencing efficiency. This in turn improves the accuracy and precision of subsequent analysis of the sequencing data, especially statistical testing. High-throughput sequencing methods include, but are not limited to, second-generation sequencing and single-molecule sequencing. Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan; 11(1):31-46) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq™, MiSeq, etc.), ABI-SOLiD, Life Technologies-Ion Torrent/Proton, and Roche-454 (pyrosequencing) platforms. Single-molecule sequencing platforms (technologies) include, but are not limited to, true single-molecule DNA sequencing (True Single Molecule DNA sequencing) from Helicos, single-molecule real-time (SMRT™) sequencing from Pacific Biosciences, and nanopore sequencing from Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6(4):244-245). As sequencing technology continues to evolve, those skilled in the art will understand that other sequencing methods and devices may also be used for whole-genome sequencing. According to specific examples of the present invention, at least one selected from Illumina-Solexa, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454, and single-molecule sequencing devices may be used to sequence the whole-genome sequencing library.
According to embodiments of the present invention, the system may further include an alignment device 300. According to embodiments of the present invention, the alignment device 300 is connected to the sequencing device 200 and is adapted to align the obtained sequencing results with a reference sequence, so that: the draft genetic map of the embryo is constructed by aligning the sequencing results of the embryonic cell genomic DNA sample with the reference sequence; the genotypes of the individuals genetically related to the embryo are determined by aligning the sequencing results of their genomic samples with the reference sequence; and the haplotypes of the embryo's parents are determined from the genotypes of the genetically related individuals. According to embodiments of the present invention, the type of reference sequence that can be used is not particularly limited and may be any known sequence containing the region of interest. According to embodiments of the present invention, a known human reference genome may be used as the reference sequence. For example, according to embodiments of the present invention, the human reference genome used is NCBI 36.3 (HG18). In addition, according to embodiments of the present invention, the alignment method is not particularly limited. According to specific embodiments of the present invention, SOAP may be used for alignment.
According to embodiments of the present invention, the analysis device 400 is adapted to: construct the draft genetic map of the embryo based on the sequencing results of the embryonic cell genomic DNA sample, so as to determine the embryo's initial genotype; determine the haplotypes of the embryo's parents based on the sequencing results of the genomic samples of the genetically related individuals; and determine the base information of the predetermined region in the embryo genome according to a hidden Markov model, using the embryo's initial genotype as the observation sequence and based on the haplotypes of the embryo's parents.
According to embodiments of the present invention, determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden-state transition probability matrix, and the observation probability matrix; and
determining the final state and tracing back the optimal path using the Viterbi algorithm, so as to determine the base information of the predetermined region in the embryo genome.
According to specific embodiments, the hidden Markov model uses the following parameters:
The initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden-state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), wherein Nr and Np denote the expected number of recombinations and the number of SNP sites, respectively, and Nr may take a natural number between 20 and 40; the observation probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), wherein
#sites (L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites (L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right] \cdot b_j(K)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right]$, $t \in \{1, \ldots, N\}$;
the final state is obtained by recursion, the optimal path is traced back, and the most probable base information of the predetermined region of the embryo is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ ($t = 1, 2, 3, \ldots, N-1$).
The terms local probability $\delta_i(q_i)$ and back pointer $\Psi_i(q_i)$ used above both follow the classical definitions of the Viterbi algorithm. For a detailed description of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
The data analysis part has been described in detail above, and that description naturally also applies to the system for determining the nucleic acid sequence of a predetermined region in an embryo genome; it is not repeated here.
Thus, using this system, the aforementioned method for determining the nucleic acid sequence of a predetermined region in an embryo genome can be implemented effectively. The hidden Markov model can be decoded by, for example, the Viterbi algorithm to determine the base information of a specific region in the embryo genome, so that preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
In addition, according to embodiments of the present invention, the predetermined region is a site known to have a genetic polymorphism, and the genetic polymorphism is at least one selected from single nucleotide polymorphisms and short tandem repeats (STR).
The term "connected" as used herein should be understood broadly: the connection may be direct or indirect, as long as the functional linkage described above can be achieved.
It should be noted that, as those skilled in the art will understand, the features and advantages of the method described above for determining the nucleic acid sequence of a predetermined region in an embryo genome also apply to the system for determining the nucleic acid sequence of a predetermined region in an embryo genome; for brevity, they are not described again.
Computer-readable medium
In yet another aspect, the present invention provides a computer-readable medium. According to embodiments of the present invention, instructions are stored on the computer-readable medium, the instructions being adapted to be executed by a processor so as to:
construct the draft genetic map of the embryo based on the sequencing results of the embryonic cell genomic DNA sample, so as to determine the embryo's initial genotype;
determine the haplotypes of the embryo's parents based on the sequencing results of the genomic samples of the individuals genetically related to the embryo; and
determine the base information of the predetermined region in the embryo genome according to a hidden Markov model, using the embryo's initial genotype as the observation sequence and based on the haplotypes of the embryo's parents.
Thus, using this computer-readable medium, the aforementioned method can be implemented effectively, so that the base information of a specific region in the embryo genome can be determined with a hidden Markov model, for example by the Viterbi algorithm, and preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
According to embodiments of the present invention, determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden-state transition probability matrix, and the observation probability matrix; and
determining the final state and tracing back the optimal path using the Viterbi algorithm, so as to determine the base information of the predetermined region in the embryo genome.
According to specific embodiments, the hidden Markov model uses the following parameters:
The initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden-state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), wherein Nr and Np denote the expected number of recombinations and the number of SNP sites, respectively, and Nr may take a natural number between 20 and 40; the observation probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), wherein
#sites (L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites (L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right] \cdot b_j(K)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right]$, $t \in \{1, \ldots, N\}$;
the final state is obtained by recursion, the optimal path is traced back, and the most probable base information of the predetermined region of the embryo is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ ($t = 1, 2, 3, \ldots, N-1$).
The terms local probability $\delta_i(q_i)$ and back pointer $\Psi_i(q_i)$ used above both follow the classical definitions of the Viterbi algorithm. For a detailed description of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
The data analysis part has been described in detail above, and that description naturally also applies to the system for determining the nucleic acid sequence of a predetermined region in an embryo genome; it is not repeated here.
Thus, using the system, the aforementioned method for determining the nucleic acid sequence of a predetermined region in an embryo genome can be implemented effectively. The base information of a specific region in the embryo genome can be determined with a hidden Markov model, for example by the Viterbi algorithm, so that preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
In addition, according to embodiments of the present invention, the predetermined region is a site known to have a genetic polymorphism, and the genetic polymorphism is at least one selected from single nucleotide polymorphisms and short tandem repeats (STR).
For the purposes of this specification, a "computer-readable medium" may be any device that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise suitably processing it if necessary, and can then be stored in computer memory.
It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and is sold or used as a standalone product, it may also be stored in a computer-readable storage medium.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the present invention and should not be taken to limit its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature of the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions are followed. Reagents and instruments whose manufacturers are not indicated are conventional, commercially available products, for example from Illumina.
General method:
With reference to Fig. 1, the main steps of the embodiments of the present invention include:
1) Obtain a single-cell or few-cell sample: a biopsy of day-3 blastomere cells or day-5 blastocyst trophectoderm cells, a germ cell sample, or another single-cell or few-cell sample.
2) Perform whole-genome amplification on the single-cell or few-cell sample, construct a library from the amplified DNA according to the requirements of the next-generation sequencing platform, and sequence it.
3) Filter the single-cell or few-cell sequencing data and align it to the human genome reference sequence.
4) Based on the alignment results, perform preliminary detection of genetic variants in the embryonic cells, such as single nucleotide polymorphisms and deletions/duplications.
5) Construct the draft genetic map of the embryo genome and initialize the embryo genotype.
6) Collect samples from individuals genetically related to the embryo, such as family members (the parents and the proband, and/or the paternal and maternal grandparents) or other embryos of the same parents; extract genomic DNA, construct libraries according to the requirements of the next-generation sequencing platform, and sequence them.
7) Filter the DNA sequencing data of the genetically related individuals and align it to the human genome reference sequence.
8) Based on the alignment results, determine the genotypes of the genetically related individuals, such as the proband and the parents.
9) Infer the parents' haplotypes from the genotypes of the genetically related individuals, such as the proband and the parents.
10) Infer the embryo's haplotype from the parental haplotypes by HMM decoding.
11) Make the final determination of the embryo's genetic variants.
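To make step 9 concrete, the following is a minimal sketch of how the alleles that each parent passed on to the proband can be read off at a single site from the three genotypes, using plain Mendelian consistency. This is a simplified stand-in for illustration, not the haplotype inference procedure of the embodiments; the function name `transmitted_alleles` and the two-letter genotype strings are assumptions.

```python
def transmitted_alleles(father, mother, child):
    """Return (paternal_allele, maternal_allele) carried by the child at one
    site, or None when the site is ambiguous or Mendelian-inconsistent.

    Genotypes are two-character strings such as "AG" (allele order ignored).
    """
    options = {
        (p, m)
        for p in father
        for m in mother
        if sorted(p + m) == sorted(child)
    }
    # Exactly one compatible (paternal, maternal) pair means the site is phased.
    return options.pop() if len(options) == 1 else None
```

For example, `transmitted_alleles("AG", "AA", "AG")` returns `("G", "A")`, whereas a site where all three individuals are heterozygous `"AG"` returns `None` because either parent could have contributed either allele. Chaining the informative sites along a chromosome gives, for each parent, the haplotype that was passed on to the proband.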
With reference to Fig. 1, the information analysis part of the embodiments of the present invention uses an HMM; the detailed analysis steps are as follows:
Notation:
1. The number of sites to be detected is N.
2. Observation sequence:
3. The hidden state set is S = {0, 1}, which defines which chromosome of a parent is passed on to the offspring; 0 denotes the chromosome that was passed on to the proband.
4. The observation state set is V = {0, 1}: 1 indicates that the chromosome passed on to the proband is consistent with the genotype in the embryo's draft genetic map, and 0 indicates that it is inconsistent.
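The notation above can be illustrated with a short sketch that builds the observation sequence for one parent. It assumes that "consistent" means the allele on the chromosome passed on to the proband is present in the embryo's draft genotype at that site; that reading, the function name `build_observations`, and the input layout are assumptions made for illustration.

```python
def build_observations(proband_haplotype, embryo_genotypes):
    """Build the observation sequence over N sites.

    proband_haplotype: alleles, one per site, on the parental chromosome
                       that was passed on to the proband (e.g. "AGT")
    embryo_genotypes:  draft genotype per site as two-character strings
                       (e.g. ["AA", "CC", "TG"])
    Returns a list with 1 where the allele is consistent with the embryo's
    draft genotype and 0 where it is not.
    """
    if len(proband_haplotype) != len(embryo_genotypes):
        raise ValueError("haplotype and genotype lists must cover the same sites")
    return [
        1 if allele in genotype else 0
        for allele, genotype in zip(proband_haplotype, embryo_genotypes)
    ]
```

With the inputs shown in the docstring the result is `[1, 0, 1]`: the embryo's draft genotype carries the proband-transmitted allele at the first and third sites but not at the second.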
Step 1: construct the initial state probability distribution vector, the haplotype recombination transition matrix, and the observation probability matrix.
I. The initial state probability distribution is denoted $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$).
In the absence of reference data, each hidden state is assumed to occur with equal probability.
II. The haplotype recombination transition matrix is denoted $A = \{a_{ij}\}$ ($i, j \in S$), i.e. the probabilities of transitions between hidden states.
Nr and Np denote the expected number of recombinations and the number of single nucleotide polymorphism (SNP) sites, respectively.
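The expression for $a_{ij}$ in terms of Nr and Np does not survive in the text here. Purely as an illustration, the sketch below assumes that the probability of switching hidden state between adjacent sites is Nr/Np and that the probability of staying is 1 - Nr/Np; this form, and the function name `recombination_transition_matrix`, are assumptions rather than the formula of the embodiments.

```python
def recombination_transition_matrix(n_recomb, n_sites):
    """Two-state transition matrix A[i][j] for the haplotype HMM.

    ASSUMPTION: the per-site switch probability is n_recomb / n_sites,
    i.e. the expected recombinations spread evenly over the SNP sites.
    """
    switch = n_recomb / n_sites
    if not 0.0 <= switch <= 0.5:
        raise ValueError("n_recomb / n_sites must lie in [0, 0.5]")
    stay = 1.0 - switch
    return [[stay, switch],
            [switch, stay]]
```

For instance, with Nr = 30 and a hypothetical Np = 100000 sites, the off-diagonal entries are 0.0003, so the model strongly favours long runs of sites inherited from the same parental chromosome.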
Construct the observation probability matrix.
The observation probability matrix is denoted $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), where the probability distribution is estimated from sites at which the offspring must be homozygous (Must-hom) or must be heterozygous (Must-het).
Step 2: construct the local probability matrix and the back pointers.
Define the local probability: $\delta_t(j) = \max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right] \cdot b_j(K)$, $t \in \{1, \ldots, N\}$
Define the back pointer: $\Psi_t(j) = \arg\max_{i \in S}\left[\delta_{t-1}(i)\, a_{ij}\right]$, $t \in \{1, \ldots, N\}$
Step 3: determine the final state and trace back the optimal path.
Determine the final state.
Trace back the optimal path, i.e. the most probable embryo haplotype: $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ ($t = 1, 2, 3, \ldots, N-1$)
Step 4: output the result.
Example 1
I. Sample collection and processing:
A couple had previously given birth to a daughter, A (the proband), with severe β-thalassemia, and therefore underwent IVF (in vitro fertilization combined with embryo transfer)-PGD treatment, hoping to have another healthy baby whose HLA type matches the proband's. In one IVF-PGD treatment cycle, a single-cell blastomere biopsy was taken from the embryo on day 3; the biopsied embryo was cultured on to the blastocyst stage on day 5 and then cryopreserved by vitrification. The single blastomere cell obtained by biopsy underwent whole-genome amplification (PED7-1-OWGA) with the Qiagen REPLI-g MDA amplification kit, strictly following the kit instructions, and the amplification product was used for next-generation sequencing and other analyses. Peripheral blood was also collected from the father (PED4-F), mother (PED3-M), and daughter proband (PED-Sister) of this family, and genomic DNA was extracted.
After the IVF-PGD treatment is completed, the correct embryo is selected and implanted into the uterus. After a successful pregnancy, amniotic fluid (PED7-2-OAF) can be drawn from the pregnant woman after 14 weeks, or cord blood can be collected, and DNA can be extracted and sequenced to verify the typing results of the embryonic cell.
The embryo single-cell whole-genome amplification product DNA was sheared to fragments of about 200 bp with a Covaris™ shearing instrument, and a library was built according to the loading requirements of the Illumina HiSeq2000™ sequencer. The library was quality-checked by Bioanalyzer 2100 and Q-PCR, and qualified libraries were sequenced on the Illumina HiSeq2000™ sequencer. The sequencing strategy was Pair End 90 index (i.e. paired-end 90 bp index sequencing) whole-genome sequencing at a depth of 30X. Instrument parameter settings and operating procedures followed the Illumina operating manual (available at http://www.illumina.com/support/documentation.ilmn).
The peripheral blood DNA of the father, mother, and daughter proband, as well as the amniotic fluid or cord blood DNA, was sheared to fragments of about 200 bp with a Covaris™ shearing instrument, and libraries were built according to the loading requirements of the Illumina HiSeq2000™ sequencer. The libraries were captured with the All in One chip; for the detailed procedure see the NimbleGen SeqCap EZ capture library construction instructions (available at http://www.nimblegen.com/products/seqcap/ez/index.html). The All in One chip is an in-house designed target-region capture chip whose target regions include the whole exome, a 1M SNP site region, and the MHC gene region. After chip capture, the libraries were quality-checked by Bioanalyzer 2100 and Q-PCR, and qualified libraries were sequenced on the Illumina HiSeq2000™ sequencer. The sequencing strategy was Pair End 90 index, at a depth of 30-50X over the target region. Instrument parameter settings and operating procedures followed the Illumina operating manual (available at http://www.illumina.com/support/documentation.ilmn).
II. Sequencing data analysis:
1. Genomic DNA sequencing and genotyping of the proband and parents:
A) Filter out low-quality sequences and adapter-contaminated sequences.
B) Use the Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/) to align the high-quality reads to the human reference genome (Hg19, NCBI release GRCh37), and remove PCR duplicate reads with SAMtools.
C) Use the Genome Analysis Toolkit (GATK, http://www.broadinstitute.org/gatk/) to recalibrate base quality scores and detect single nucleotide polymorphisms and deletions/duplications.
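Step A) above can be illustrated with a minimal read filter. The thresholds, the Phred+33 quality encoding, the adapter prefix used in the example, and the function names `mean_phred` and `keep_read` are all assumptions for illustration; the text does not state which filtering criteria or tools were used.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)

def keep_read(seq, qual, adapter, min_mean_q=20, min_adapter_overlap=10):
    """Return True if the read passes both filters.

    A read is dropped when its mean base quality is below min_mean_q or
    when it contains the first min_adapter_overlap bases of the adapter.
    Both thresholds are illustrative assumptions.
    """
    if not seq or mean_phred(qual) < min_mean_q:
        return False
    return adapter[:min_adapter_overlap] not in seq
```

As a usage example, `keep_read("ACGTACGTAC", "IIIIIIIIII", adapter="AGATCGGAAGAGC")` returns `True`, while the same call with the quality string `"##########"` returns `False` because the mean quality falls below the threshold.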
2. Construction of the draft genetic map of the embryo genome.
3. Inferring the parents' haplotypes from the proband's genotype:
Haplotypes are inferred from the nuclear family; for details see B.L. Browning and S.R. Browning, Am J Hum Genet 84:210-223 (2009).
4. Inferring the embryonic cell's haplotype from the parents' haplotypes:
A) Align the embryo single-cell sequencing data to the human reference genome (Hg19, NCBI release GRCh37) using BWA.
B) Construct the initial state probability vector and the embryo haplotype recombination transition matrix, with Nr = 30; other settings are as described above.
C) Compile the sequencing information at each site of the embryonic cell and construct the observation probability matrix (as described above).
D) Construct the local probability matrix and the back pointers (as described above).
E) Determine the final state and trace back the optimal path.
F) Output the result.
G) Taking the amniotic fluid genotype results for the embryo as the reference, the genotyping accuracy of the embryonic cell is as follows:
Note: "draft heterozygous" denotes sites that are heterozygous both before and after reconstruction; "corrected heterozygous" denotes sites that are homozygous before reconstruction and heterozygous after reconstruction.
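A comparison of this kind can be sketched as a per-site concordance between the genotypes inferred for the embryonic cell and the reference genotypes from the validation sample. The dictionary layout, the site identifiers, and the function name `genotype_concordance` are assumptions for illustration, and the example values are made up rather than taken from the results of this example.

```python
def genotype_concordance(inferred, reference):
    """Fraction of shared sites where the inferred genotype equals the reference.

    inferred, reference: dicts mapping a site identifier to a two-character
    genotype string; allele order is ignored ("AG" equals "GA").
    Sites missing from `inferred` are skipped.
    """
    compared = matched = 0
    for site, ref_genotype in reference.items():
        genotype = inferred.get(site)
        if genotype is None:
            continue
        compared += 1
        if sorted(genotype) == sorted(ref_genotype):
            matched += 1
    return matched / compared if compared else float("nan")
```

For example, with inferred genotypes `{"s1": "AG", "s2": "CC", "s3": "TT"}` and reference genotypes `{"s1": "GA", "s2": "CT", "s4": "AA"}`, two sites are compared and one matches, giving a concordance of 0.5. The same function can be applied separately to the heterozygous-site categories described in the note.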
5. Determination of β-thalassemia status and HLA type
A) Using the parents' haplotypes, the genotype of the embryonic cell at rs7480526 was successfully reconstructed. It was predicted to be β-thalassemia negative, and its allele at this position was found to differ from the proband's, confirming that the embryo is β-thalassemia negative.
B) This method shows that the proband and the embryo have the same haplotypes in the MHC region, from which it is concluded that their HLA types match.
Based on the above β-thalassemia genotype and HLA typing results for the embryo, the correct embryo, i.e. one that is β-thalassemia negative and HLA-matched with the proband, is selected for cryopreservation and implantation into the uterus, thereby preventing the pregnancy of an affected child.
Example 2
I. Sample collection and processing:
A couple, both β-thalassemia carriers, underwent IVF-PGD treatment. In one IVF-PGD treatment cycle, 8 cleavage-stage embryos were obtained on day 3, and a single-cell blastomere biopsy was taken from each; the biopsied embryos were cultured on to the blastocyst stage on day 5 and then cryopreserved by vitrification. The 8 single blastomere cells obtained by biopsy underwent whole-genome amplification with the Qiagen REPLI-g MDA amplification kit, strictly following the kit instructions, and the amplification products were used for next-generation sequencing and other analyses.
Peripheral blood was also collected from the father and mother of this family, and genomic DNA was extracted.
After the IVF-PGD treatment is completed, the correct embryo is selected and implanted into the uterus. After a successful pregnancy, amniotic fluid can be drawn from the pregnant woman after 14 weeks, or cord blood can be collected, and DNA can be extracted and sequenced to verify the typing results of the embryonic cells.
The unicellular whole genome amplification product DNA of 8 embryos, father, mother's peripheral blood DNA, and amniotic fluid of pregnant woman or embryo Tire Cord blood DNA CovarisTMInterrupt instrument to interrupt to the fragment of 200bp sizes, according toCompany HiSeq2000TM Upper confidential the asking of sequenator carries out building storehouse, and library carries out the capture of All in One chips, and NimbleGen is shown in concrete operations Storehouse description is built in the capture of SeqCap EZ chips(http://www.nimblegen.com/products/seqcap/ez/ index.htmlCan obtain).All in One chips are the target area capture chips of autonomous Design, and its target area includes complete Exon region, the SNP site region of 1M and mhc gene area.Library Jing after chip captureBioanalyzer 2100 and Q-PCR methods carry out Quality Control, and qualified library proceeds HiSeq2000TMSequencer, sequencing Strategy is the index of Pair End 90 (i.e. two-way 90bp index sequencings), and 30-50X of the depth for target area is sequenced.Wherein The parameter setting and operational approach of instrument all according toMould makees handbook (can be byhttp://www.illumina.com/ support/documentation.ilmnObtain).
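As a rough worked example of what the stated strategy implies, the number of paired-end 90 bp read pairs needed for a given depth can be estimated as depth × target size / (2 × 90). The sketch below assumes a hypothetical target size, because the text does not give the size of the All in One target region, and it ignores off-target reads and duplicates.

```python
def read_pairs_needed(target_bp, depth, read_len=90):
    """Smallest number of read pairs whose bases cover target_bp at the given depth.

    Assumes every sequenced base lands on target, which real captures do not achieve.
    """
    bases_per_pair = 2 * read_len
    # integer ceiling division avoids floating-point rounding
    return (target_bp * depth + bases_per_pair - 1) // bases_per_pair

# Hypothetical 60 Mb target region (not a figure from the patent)
TARGET_BP = 60_000_000
low = read_pairs_needed(TARGET_BP, 30)
high = read_pairs_needed(TARGET_BP, 50)
```

In practice the on-target rate of the capture and the duplicate rate after whole-genome amplification would raise these numbers.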
2. Sequencing data analysis:
1) Analysis and genotyping of the sequencing data for the parental genomic DNA and the amniotic fluid DNA (validation sample):
A) Filter out low-quality reads and adapter-contaminated reads.
B) Align the high-quality reads to the human reference genome (hg19, NCBI release GRCh37) with Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/), and remove PCR duplicate reads with SAMtools.
C) Use the Genome Analysis Toolkit (GATK, http://www.broadinstitute.org/gatk/) to recalibrate base quality scores and to detect single-nucleotide polymorphisms and deletions and duplications.
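Step A) can be illustrated with a minimal read filter. The text does not state the quality threshold, the quality encoding, or the adapter sequence, so the mean-quality cutoff of 20, the Phred+33 offset, and the adapter passed in by the caller are assumptions; the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

def passes_filter(sequence, quality_string, adapter, min_mean_quality=20):
    """Keep a read only if it is non-empty, adapter-free, and of sufficient mean quality."""
    if not sequence or not quality_string:
        return False
    if adapter in sequence:
        return False  # adapter contamination
    return mean_phred(quality_string) >= min_mean_quality

def filter_reads(records, adapter, min_mean_quality=20):
    """records: iterable of (name, sequence, quality_string) tuples."""
    return [
        record
        for record in records
        if passes_filter(record[1], record[2], adapter, min_mean_quality)
    ]
```

Reads that survive this step would then go to alignment, duplicate removal, and variant calling as described in steps B) and C).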
2) Analyze the single-cell sequencing data of the 8 embryos to assign initial genotype information to each embryonic cell genome, and build a genetic sketch of the cell genome of each embryo.
3) Select any one of the 8 embryo single cells as the embryo to be tested, and initialize the parental haplotypes from the genetic sketch of that embryo's cell genome and the parental genotypes.
4) Using the hidden Markov model, reconstruct and determine the parental haplotypes from the genotypes of each of the remaining 7 embryos.
5) Using the hidden Markov model, reconstruct the haplotypes of the embryo to be tested from the parental haplotypes.
6) Determine the genetic variation information of the embryo to be tested.
Taking chromosome 22 as an example, and using the amniotic-fluid genotyping results as the reference, the genotyping accuracy of the embryo to be tested was calculated as follows:
7) Determination of β-thalassemia status and HLA type
A) Using the parental haplotypes, the haplotype of the embryonic cell to be tested at rs7480526 was successfully recovered, and the embryo was predicted to be β-thalassemia negative.
B) With this method, combined with the genetic test results of the proband, it can be determined whether the proband and the embryo carry the same haplotypes in the MHC region, and hence whether their HLA types match.
Based on the β-thalassemia genotype and HLA typing results of the embryo described above, the correct embryo, i.e. one that is β-thalassemia negative and HLA-matched to the proband, is selected for cryopreservation and uterine implantation, thereby preventing the pregnancy of an affected child.
Industrial applicability
The method of the present invention for determining base information of a predetermined region in an embryonic genome, and the system and computer-readable medium for determining base information of a predetermined region in an embryonic genome, can be effectively applied to analyzing the nucleotide sequence of a predetermined region in an embryonic genome.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions can be made to those details, and that these changes fall within the scope of protection of the present invention. The full scope of the present invention is given by the appended claims and any equivalents thereof.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "illustrative examples", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims (15)

1. A method for determining base information of a predetermined region in an embryonic genome, the method being for non-diagnostic purposes, characterized in that it comprises the following steps:
obtaining a sequencing result of an embryonic cell genomic DNA sample, and a sequencing result of genome samples of individuals genetically related to the embryo;
building a genetic sketch of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, so as to determine the initial genotype of the embryo;
determining the haplotypes of the embryo's parents based on the sequencing result of the genome samples of the individuals genetically related to the embryo; and
according to a hidden Markov model, using the initial genotype of the embryo as the observation sequence and based on the haplotypes of the embryo's parents, determining the base information of the predetermined region in the embryonic genome,
wherein
determining the base information of the predetermined region in the embryonic genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden-state transition probability matrix, and the observation sequence probability matrix; and
using the Viterbi algorithm to determine the final state and backtrack along the optimal path, so as to determine the base information of the predetermined region in the embryonic genome,
the hidden Markov model using the following parameters:
the initial state probability distribution is $\pi = \{\pi_i\}$, $(i \in S,\ \pi_i = 0.5)$,
the hidden-state transition probability matrix is $A = \{a_{ij}\}$, $(i, j \in S)$, wherein [formula missing in source text],
$N_r$ and $N_p$ denote the expected number of recombinations and the number of single-nucleotide polymorphism sites, respectively, $N_r$ being a natural number in the range 20-40; the observation sequence probability matrix is $B = \{b_i(k)\}$, $(i \in S,\ k \in V)$, wherein [formula missing in source text],
#sites(L>0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites(L>0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i}\,[\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1 \ldots N\}$,
the back-pointer is $\Psi_t(j) = \arg\max_{i}\,[\delta_{t-1}(i) \cdot a_{ij}]$, $t \in \{1 \ldots N\}$,
the final state at termination of the recursion is [formula missing in source text], and
backtracking along the optimal path, the most probable base information of the predetermined region of the embryo is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ $(t = 1, 2, 3, \ldots, N-1)$.
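The recursion, back-pointer, and backtracking steps recited in this claim follow the standard Viterbi algorithm, which can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: the transition and observation probabilities are passed in as plain dictionaries because their formulas are not reproduced in this text, the final state is chosen with the textbook rule (the state with the largest final local probability), and the state and symbol names in the usage note are hypothetical.

```python
def viterbi(observations, states, pi, A, B):
    """Most probable hidden-state path for a sequence of observations.

    pi[i]    initial probability of state i
    A[i][j]  probability of moving from state i to state j
    B[j][k]  probability of observing symbol k in state j
    Plain products mirror the claim; real data with many sites would
    need log-probabilities to avoid numerical underflow.
    """
    # delta[t][j]: probability of the best path that ends in state j at step t
    delta = [{j: pi[j] * B[j][observations[0]] for j in states}]
    # psi[t][j]: back-pointer to the best predecessor of state j at step t
    psi = [{}]
    for t in range(1, len(observations)):
        delta.append({})
        psi.append({})
        for j in states:
            best_i = max(states, key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][observations[t]]
    # final state: textbook termination rule (assumed, see lead-in)
    path = [max(states, key=lambda i: delta[-1][i])]
    # backtrack: q*_t = psi_{t+1}(q*_{t+1})
    for t in range(len(observations) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path
```

With two states standing for the two haplotypes of one parent, a transition matrix that strongly favours staying in the same state, and an observation matrix that tolerates occasional genotyping errors, a single discordant site does not flip the decoded path, whereas a sustained run of discordant sites is decoded as a recombination.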
2. The method according to claim 1, characterized in that the embryonic genomic DNA sample is obtained from a small number of embryonic cells, wherein the embryonic cells come from a blastomere, a blastocyst, or a germ cell.
3. The method according to claim 2, characterized in that the embryonic genomic DNA sample is obtained from a single embryonic cell.
4. The method according to claim 1, characterized in that the sequencing result is obtained using at least one selected from Illumina-Solexa/Hiseq, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454, and single-molecule sequencing devices.
5. The method according to claim 1, characterized in that the genetic sketch of the embryo is built by aligning the sequencing result of the embryonic cell genomic DNA sample to a reference sequence.
6. The method according to claim 1, characterized in that the genotypes of the individuals genetically related to the embryo are determined by aligning the sequencing result of the genome samples of those individuals to a reference sequence; and
the haplotypes of the embryo's parents are determined based on the genotypes of the individuals genetically related to the embryo.
7. The method according to claim 5 or 6, characterized in that the reference sequence is the human reference genome.
8. The method according to claim 1, characterized in that the individuals genetically related to the embryo include the parents of the embryo and a proband.
9. The method according to claim 8, characterized in that the proband is another child of the embryo's parents.
10. The method according to claim 1, further characterized in that the predetermined region is a site at which a genetic polymorphism is known to exist.
11. The method according to claim 10, characterized in that the genetic polymorphism is at least one selected from single-nucleotide polymorphisms and short tandem repeats.
12. A system for determining base information of a predetermined region in an embryonic genome, characterized in that it comprises:
a library construction device adapted to construct sequencing libraries separately for an embryonic genomic DNA sample and for genomic DNA samples of individuals genetically related to the embryo;
a sequencing device connected to the library construction device and adapted to sequence the sequencing libraries, so as to obtain sequencing results for the embryo and for the individuals genetically related to the embryo; and
an analysis device connected to the sequencing device and adapted to:
build a genetic sketch of the embryo based on the sequencing result of the embryonic genomic DNA sample, so as to determine the initial genotype of the embryo;
determine the haplotypes of the embryo's parents based on the sequencing result of the genome samples of the individuals genetically related to the embryo; and
according to a hidden Markov model, using the initial genotype of the embryo as the observation sequence and based on the haplotypes of the embryo's parents, determine the base information of the predetermined region in the embryonic genome,
wherein determining the base information of the predetermined region in the embryonic genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden-state transition probability matrix, and the observation sequence probability matrix; and
using the Viterbi algorithm to determine the final state and backtrack along the optimal path, so as to determine the base information of the predetermined region in the embryonic genome,
the hidden Markov model using the following parameters:
the initial state probability distribution is $\pi = \{\pi_i\}$, $(i \in S,\ \pi_i = 0.5)$,
the hidden-state transition probability matrix is $A = \{a_{ij}\}$, $(i, j \in S)$, wherein [formula missing in source text],
$N_r$ and $N_p$ denote the expected number of recombinations and the number of single-nucleotide polymorphism sites, respectively, $N_r$ being a natural number in the range 20-40; the observation sequence probability matrix is $B = \{b_i(k)\}$, $(i \in S,\ k \in V)$, wherein [formula missing in source text],
#sites(L>0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites(L>0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i}\,[\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1 \ldots N\}$,
the back-pointer is $\Psi_t(j) = \arg\max_{i}\,[\delta_{t-1}(i) \cdot a_{ij}]$, $t \in \{1 \ldots N\}$,
the final state at termination of the recursion is [formula missing in source text], and
backtracking along the optimal path, the most probable base information of the predetermined region of the embryo is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$ $(t = 1, 2, 3, \ldots, N-1)$.
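The site counts that enter the observation matrix can be illustrated with one plausible reading of "must be homozygous" and "must be heterozygous": under Mendelian inheritance, the offspring genotype is fully determined only where both parents are homozygous. The sketch below counts such sites from parental genotypes. The two-character genotype strings and the function names are assumptions, and the L>0 condition is left out because this text does not define it.

```python
def offspring_constraint(father_gt, mother_gt):
    """Classify a site by what Mendelian inheritance forces in the offspring.

    Genotypes are 2-character strings such as 'AA' or 'AG' (assumed layout).
    """
    father_hom = father_gt[0] == father_gt[1]
    mother_hom = mother_gt[0] == mother_gt[1]
    if father_hom and mother_hom:
        # both parents can transmit only one allele each
        return "must-hom" if father_gt[0] == mother_gt[0] else "must-het"
    return "unconstrained"

def count_constrained_sites(parent_genotypes):
    """parent_genotypes: iterable of (father_gt, mother_gt) pairs, one per site."""
    counts = {"must-hom": 0, "must-het": 0, "unconstrained": 0}
    for father_gt, mother_gt in parent_genotypes:
        counts[offspring_constraint(father_gt, mother_gt)] += 1
    return counts
```

Under this reading, #sites(Must-hom. or Must-het.) would be the sum of the "must-hom" and "must-het" counts.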
13. The system according to claim 12, characterized in that it further comprises a DNA sample separation device adapted to take a biopsy sample from the embryo, obtain a small number of embryonic cells, and extract the embryonic genomic DNA sample from those embryonic cells, wherein the embryonic cells come from a blastomere, a blastocyst, or a germ cell.
14. The system according to claim 12, characterized in that the sequencing device is at least one selected from Illumina-Solexa/Hiseq, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454, and single-molecule sequencing devices.
15. The system according to claim 12, characterized in that it further comprises an alignment device connected to the sequencing device and used to align the sequencing results to a reference sequence, so as to:
build the genetic sketch of the embryo by aligning the sequencing result of the embryonic cell genomic DNA sample to the reference sequence;
determine the genotypes of the individuals genetically related to the embryo by aligning the sequencing result of the genome samples of those individuals to the reference sequence; and
determine the haplotypes of the embryo's parents based on the genotypes of the individuals genetically related to the embryo.
CN201380074395.3A 2013-03-28 2013-03-28 Method, system, and computer readable medium for determining base information of predetermined area in fetal genome Active CN105051208B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/073375 WO2014153757A1 (en) 2013-03-28 2013-03-28 Method, system, and computer readable medium for determining base information of predetermined area in fetal genome

Publications (2)

Publication Number Publication Date
CN105051208A CN105051208A (en) 2015-11-11
CN105051208B true CN105051208B (en) 2017-04-19

Family

ID=51622393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380074395.3A Active CN105051208B (en) 2013-03-28 2013-03-28 Method, system, and computer readable medium for determining base information of predetermined area in fetal genome

Country Status (3)

Country Link
CN (1) CN105051208B (en)
HK (1) HK1213945A1 (en)
WO (1) WO2014153757A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021232388A1 (en) * 2020-05-22 2021-11-25 深圳华大智造科技有限公司 Method for determining base type of predetermined site in embryonic cell chromosome, and application thereof

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110846310B (en) * 2018-08-21 2024-03-22 深圳华大法医科技有限公司 SNP (Single nucleotide polymorphism) locus set and method for performing genetic identification on embryo nucleic acid sample and application
CN112639129A (en) * 2018-09-03 2021-04-09 深圳华大智造科技有限公司 Method and apparatus for determining the genetic status of a new mutation in an embryo
CN109522378A (en) * 2018-10-10 2019-03-26 深圳韦格纳医学检验实验室 The display methods and display equipment of hereditary birthplace probability distribution
CN110349631B (en) * 2019-07-30 2021-10-29 苏州亿康医学检验有限公司 Analysis method and device for determining haplotype of offspring object
AU2020323958B2 (en) 2019-08-16 2022-02-03 The Chinese University Of Hong Kong Determination of base modifications of nucleic acids
CN111739584B (en) * 2020-07-01 2024-02-09 苏州贝康医疗器械有限公司 Construction method and device of genotyping evaluation model for PGT-M detection
CN115064210B (en) * 2022-07-27 2022-11-18 北京大学第三医院(北京大学第三临床医学院) Method for identifying chromosome cross-exchange positions in diploid embryonic cells and application

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2746632C (en) * 2008-12-11 2020-06-30 Pacific Biosciences Of California, Inc. Characterization of modified nucleic acids
CN102127818A (en) * 2010-12-15 2011-07-20 张康 Method for creating fetus DNA library by utilizing peripheral blood of pregnant woman

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
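DNA sequence recognition based on hidden Markov models; Luo Zeju et al.; Journal of South China University of Technology (Natural Science Edition); 20070831; Vol. 35 (No. 8); pp. 123-137 *
Application of hidden Markov models in bioinformatics; Du Shiping et al.; College Mathematics; 20041031; Vol. 20 (No. 5); pp. 24-30 *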

Also Published As

Publication number Publication date
CN105051208A (en) 2015-11-11
WO2014153757A1 (en) 2014-10-02
HK1213945A1 (en) 2016-07-15

Similar Documents

Publication Publication Date Title
CN105051208B (en) Method, system, and computer readable medium for determining base information of predetermined area in fetal genome
Tsang et al. Integrative single-cell and cell-free plasma RNA transcriptomics elucidates placental cellular dynamics
CN105143466B (en) Pass through extensive parallel RNA sequencing analysis mother blood plasma transcript profile
CN103492589B (en) The molecular testing of multiple pregnancy
US20180320235A1 (en) Method, system and computer readable medium for determining base information in predetermined area of fetus genome
CN103608818B (en) The antenatal ploidy identification device of Noninvasive
JP6328934B2 (en) Noninvasive prenatal testing
CN105392894B (en) It determines in sample genome with the presence or absence of method, system and the computer-readable medium of copy number variation
CN102597266A (en) Methods for non-invasive prenatal ploidy calling
CN105392893A (en) Method, system, and capturing chip for detecting scheduled event in nucleic acid sample
CN105051209A (en) Noninvasive prenatal molecular karyotyping from maternal plasma
Liu et al. A forward look at noninvasive prenatal testing
Hua et al. Detection of aneuploidy from single fetal nucleated red blood cells using whole genome sequencing
Van Opstal et al. Unexpected finding of uniparental disomy mosaicism in term placentas: Is it a common feature in trisomic placentas?
Dennis Lo Fetal DNA in maternal plasma: progress through epigenetics
Sale et al. Planning and executing a genome wide association study (GWAS)
Chen et al. Predicting disease onset from mutation status using proband and relative data with applications to huntington's disease
Du et al. A review of pre-implantation genetic testing technologies and applications
Clerget-Darpoux et al. Will formal genetics become dispensable?
Gerasimova et al. Preimplantation Genetic Screening and Diagnosis Using Fluorescent In Situ Hybridization (FISH)
Chen et al. Research Article Predicting Disease Onset from Mutation Status Using Proband and Relative Data with Applications to Huntington’s Disease
CN107988343A (en) The antenatal ploidy recognition methods of Noninvasive

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1213945

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210928

Address after: 401120 floors 1-2, building 1, No. 101, datagu Middle Road, Xiantao street, Yubei District, Chongqing

Patentee after: Chongqing Huada medical laboratory Co.,Ltd.

Address before: 518083 Huada Complex Park, 21 Hongan Third Street, Yantian District, Shenzhen City, Guangdong Province, 7 buildings, 7 floors-14 floors

Patentee before: BGI SHENZHEN Co.,Ltd.