Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of analysis using a hidden Markov model according to one embodiment of the present invention; and
Fig. 2 is a schematic structural diagram of a system for determining the nucleic acid sequence of a predetermined region in an embryo genome according to one embodiment of the present invention.
Detailed description of the Invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, unless otherwise specified, "multiple" means two or more.
Method for determining base information of a predetermined region in an embryo genome
In a first aspect, the present invention provides a method for determining base information of a predetermined region in an embryo genome. According to embodiments of the present invention, the method comprises the following steps:
First, a sequencing result of an embryonic cell genomic DNA sample and a sequencing result of genome samples of genetically related individuals of the embryo are obtained. The term "genetically related individual of the embryo" as used herein refers to an individual that is related to the embryo in the genetic sense. For example, according to embodiments of the present invention, the genetically related individuals that may be used are the embryo's ancestors, such as its parents, paternal grandparents and maternal grandparents, and other children of the embryo's parents. "Other children of the embryo's parents" should be interpreted broadly: they may be children already born, children not yet born (fertilized eggs or embryos), dead embryos, or embryos or fertilized eggs cultured in vitro, provided that they share the same parents as the embryo to be tested.
According to embodiments of the present invention, the source of the embryonic cell genomic DNA sample is not particularly limited. Specifically, according to embodiments of the present invention, the embryo may be biopsied to obtain embryonic cells, and the embryonic cell sample may be subjected to whole genome amplification (WGA) to obtain embryonic cell genomic DNA. The term "embryo biopsy" as used herein refers to a technique in which micromanipulation is used to separate some of the embryonic cells from an embryo, or some of the cells from a fertilized egg or gamete. According to embodiments of the present invention, the embryonic cells may be derived from any of a blastomere, blastocyst trophectoderm, a fertilized egg and a gamete, and may be a single cell or 2-10 cells. In addition, whole genome amplification may also be performed on any sample from the pregnant woman that contains embryo nucleic acid, so as to obtain an embryo genomic DNA sample. The embryo genome can thus be monitored effectively without affecting the normal development of the embryo. "Whole genome amplification (WGA)" mainly includes multiple displacement amplification (MDA) and PCR-based WGA techniques. An independently developed WGA workflow may be used, or a commercial kit such as the REPLI-g series kits from Qiagen, the GenomePlex WGA kit (WGA4) from Sigma-Aldrich, the PicoPlex WGA kit from Rubicon Genomics, or the illustra GenomiPhi WGA kit from GE Healthcare. After the DNA sample has been amplified, sequencing libraries are constructed separately for the amplified DNA sample of the embryonic cells and for the genomic DNA samples of the genetically related individuals of the embryo.
As for the methods and procedures for constructing a sequencing library from a nucleic acid sample, those skilled in the art can select them appropriately according to the sequencing technology used. For details of the procedure, reference may be made to the protocols provided by the manufacturer of the sequencing instrument, such as Illumina; see, for example, Illumina's Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End Sample Prep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the present invention, the method and equipment for extracting a nucleic acid sample from a biological sample are likewise not particularly limited, and a commercial nucleic acid extraction kit may be used.
After the sequencing library has been constructed, it is loaded onto a sequencing instrument and sequenced to obtain the corresponding sequencing result, which consists of multiple pieces of sequencing data. According to embodiments of the present invention, the method and equipment that can be used for sequencing are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, since the high-throughput, deep-sequencing characteristics of such sequencing devices further improve sequencing efficiency and thereby improve the precision and accuracy of subsequent analysis of the sequencing data, in particular statistical testing. The high-throughput sequencing methods include, but are not limited to, second-generation sequencing technology and single-molecule sequencing technology. Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan; 11(1):31-46) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq2000™, MiSeq, etc.), ABI-SOLiD, Life Technologies-Ion Torrent/Proton and Roche-454 (pyrosequencing) sequencing platforms. Single-molecule sequencing platforms (technologies) include, but are not limited to, the true single-molecule sequencing technology of Helicos (True Single Molecule DNA sequencing), the single-molecule real-time sequencing of Pacific Biosciences (single molecule real-time (SMRT™)), and the nanopore sequencing technology of Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6(4):244-245). As sequencing technology continues to evolve, those skilled in the art will understand that other sequencing methods and devices may also be used for whole-genome sequencing. According to specific examples of the present invention, the whole-genome sequencing library may be sequenced using at least one selected from Illumina-Solexa, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454 and single-molecule sequencing devices.
According to embodiments of the present invention, after the sequencing results have been obtained, a genetic sketch of the embryo can be constructed from the sequencing result of the embryonic cell genomic DNA sample in order to determine the embryo's initial genotype, and the haplotypes of the embryo's parents are determined from the sequencing result of the genome samples of the genetically related individuals of the embryo. According to embodiments of the present invention, the genetic sketch of the embryo is constructed by aligning the sequencing result of the embryonic cell genomic DNA sample to a reference sequence. The genotypes of the genetically related individuals are determined by aligning the sequencing result of their genome samples to the reference sequence, and the haplotypes of the embryo's parents are determined from those genotypes. According to embodiments of the present invention, a known human reference genome may be used as the reference sequence. For example, according to embodiments of the present invention, the human reference genome used is NCBI 36.3 (HG18). In addition, according to embodiments of the present invention, the alignment method is not particularly limited. According to a specific embodiment of the present invention, SOAP may be used for the alignment.
Finally, with the embryo's initial genotype as the observation sequence and on the basis of the haplotypes of the embryo's parents, the base information of the predetermined region in the embryo genome is determined according to a hidden Markov model.
The term "predetermined region" as used herein should be understood broadly and refers to any region of a nucleic acid molecule that contains a site at which a predetermined event may occur. For SNP analysis, it may refer to a region containing SNP sites. For analysis of chromosomal aneuploidy, the predetermined region refers to all or part of the chromosome to be analyzed, that is, all sequencing data derived from that chromosome are selected. The method for selecting, from the sequencing result, the sequencing data derived from the corresponding region is not particularly limited. According to embodiments of the present invention, all of the sequencing data obtained may be aligned to a known nucleic acid reference sequence to obtain the sequencing data derived from the predetermined region. In addition, according to embodiments of the present invention, the predetermined region may also consist of multiple discontinuous points scattered across the genome. According to embodiments of the present invention, the type of reference sequence that may be used is not particularly limited, and it may be any known sequence containing the region of interest.
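The selection of sequencing data belonging to a predetermined region can be illustrated with a minimal Python sketch. It assumes that alignment to the reference sequence has already been done and that each aligned read is available as a simple record; the names `AlignedRead`, `Region` and `select_reads`, and the 0-based half-open coordinates, are hypothetical choices made for this illustration and are not taken from the specification.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical record for a read already aligned to the reference."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    sequence: str


class Region(NamedTuple):
    """Hypothetical predetermined region; a single site is start..start+1."""
    chrom: str
    start: int
    end: int


def overlaps(read: AlignedRead, region: Region) -> bool:
    # Two half-open intervals on the same chromosome overlap
    # when each one starts before the other ends.
    return (read.chrom == region.chrom
            and read.start < region.end
            and region.start < read.end)


def select_reads(reads: Iterable[AlignedRead],
                 regions: List[Region]) -> List[AlignedRead]:
    # Keep every read that covers at least one predetermined region.
    # The regions may be discontinuous points scattered across the genome.
    return [r for r in reads if any(overlaps(r, g) for g in regions)]


reads = [
    AlignedRead("chr1", 100, 150, "ACGT"),
    AlignedRead("chr1", 300, 350, "TTGA"),
    AlignedRead("chr2", 100, 150, "GGCC"),
]
# One SNP site on chr1 and the whole of (a toy) chr2.
regions = [Region("chr1", 120, 121), Region("chr2", 0, 1000)]
selected = select_reads(reads, regions)
```

For aneuploidy analysis the region would span the whole chromosome, as in the `chr2` entry; for SNP analysis each region would be a single site, as in the `chr1` entry.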
According to embodiments of the present invention, the base information of the predetermined region can be determined according to a hidden Markov model on the basis of the sequencing result of the embryonic cells in combination with the genetic information of the genetically related individuals of the embryo. According to embodiments of the present invention, the base information of a specific region in the embryo genome can be determined with the hidden Markov model using the Viterbi algorithm. Preimplantation testing of the genetic information of the embryo genome can thus be carried out effectively.
The principle of analysis with a hidden Markov model using the Viterbi algorithm is described in detail below with reference to Fig. 1:
As mentioned above, the term "genetically related individual of the embryo" as used herein refers to an individual that is related to the embryo in the genetic sense. For example, according to embodiments of the present invention, the genetically related individuals that may be used are the embryo's ancestors, such as its parents, paternal grandparents and maternal grandparents, and other children of the embryo's parents. "Other children of the embryo's parents" should be interpreted broadly: they may be children already born, children not yet born (embryos or fertilized eggs), dead embryos, or embryos or fertilized eggs cultured in vitro, provided that they share the same parents as the embryo to be tested.
Thus, the formation of the offspring genome is equivalent to one random recombination of the parental genomes (that is, haplotype recombination through linkage and crossing over, and random combination of gametes). The haplotypes of the embryo are taken as the hidden states, and the sequencing data obtained after whole genome amplification of a single embryonic cell are taken as the observation sequence (observations). The state transition probabilities are derived from prior data, the observation symbol probability distribution and the initial state distribution are constructed, and the most probable combination of embryo haplotypes can then be inferred by the Viterbi algorithm. Thus, according to embodiments of the present invention, by means of the hidden Markov model, for example using the Viterbi algorithm, and with reference to the genetic information of the genetically related individuals of the embryo, the nucleic acid sequence of a specific region in the embryo genome can be determined, so that preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
The detailed analysis steps are as follows:
Notation:
I. The number of sites to be tested is N.
II. The observation sequence;
III. The hidden state set is S = {0, 1}, which defines which chromosome of a parent is passed on to the offspring: 0 denotes the chromosome passed on to the proband, and 1 denotes the chromosome not passed on to the proband.
IV. The observation state set is V = {0, 1}: 1 denotes that the chromosome passed on to the proband is consistent with the genotype in the embryo's genetic sketch, and 0 denotes that it is inconsistent.
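The observation state set V can be made concrete with a short Python sketch. It rests on an assumption that is not spelled out in the text: "consistent" is taken to mean that the allele carried at a site by the chromosome passed on to the proband is among the alleles called at that site in the embryo's genetic sketch. The function name `encode_observations` and the toy data are hypothetical.

```python
from typing import List, Sequence, Set


def encode_observations(proband_alleles: Sequence[str],
                        embryo_genotypes: Sequence[Set[str]]) -> List[int]:
    """Return one observation symbol from V = {0, 1} per site.

    Assumption for this sketch: a site is 'consistent' (1) when the allele
    on the chromosome passed on to the proband appears in the embryo's
    called genotype at that site, and 'inconsistent' (0) otherwise.
    """
    if len(proband_alleles) != len(embryo_genotypes):
        raise ValueError("both inputs must cover the same N sites")
    return [1 if allele in genotype else 0
            for allele, genotype in zip(proband_alleles, embryo_genotypes)]


# Toy data for N = 4 sites.
proband_haplotype = ["A", "C", "G", "T"]
embryo_sketch = [{"A", "G"}, {"C"}, {"A"}, {"T", "C"}]
observations = encode_observations(proband_haplotype, embryo_sketch)
print(observations)  # [1, 1, 0, 1]
```

A single 0 in a run of 1s, as at the third site here, may reflect either a true change of the inherited chromosome or an amplification artifact; distinguishing the two is the job of the hidden Markov model.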
Step 1: construct the initial state probability distribution vector, the haplotype recombination transition matrix and the observation sequence probability matrix.
I. The initial state probability distribution is denoted $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$). In the absence of reference data, each hidden state can be assumed to occur with equal probability.
II. The haplotype recombination transition matrix, that is, the matrix of hidden state transition probabilities, is denoted $A = \{a_{ij}\}$ ($i, j \in S$). $N_r$ and $N_p$ denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively; according to prior data, $N_r$ may be a natural number between 20 and 40. They are used to calculate the transition probabilities of the hidden states, that is, the probability that recombination occurs in the haplotype at each base.
III. The observation sequence probability matrix is denoted $B = \{b_i(k)\}$ ($i \in S$, $k \in V$). The probability distribution $b_i(k)$ is estimated here from sites at which the offspring must be homozygous (Must-hom) or must be heterozygous (Must-het).
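Step 2: construct the local probability matrix and the back pointers.
Define the local probability: $\delta_t(j) = \max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1, \ldots, N\}$.
Define the back pointer: $\Psi_t(j) = \arg\max_{i \in S} \delta_{t-1}(i) \cdot a_{ij}$, $t \in \{1, \ldots, N\}$.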
Step 3: determine the final state and backtrack along the optimal path.
Determine the final state: $q_N^* = \arg\max_{i \in S} \delta_N(i)$.
Backtrack along the optimal path, that is, the most probable embryo haplotype: $q_t^* = \Psi_{t+1}(q_{t+1}^*)$, $t = N-1, N-2, \ldots, 1$.
Step 4: output the result.
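The four steps above can be sketched in Python as a classical Viterbi decoder over the two hidden states S = {0, 1}. This is an illustration under stated assumptions rather than the implementation of the invention: the formulas for $a_{ij}$ and $b_i(k)$ are not reproduced in this text, so the recombination probability `r` and the emission values below are placeholder numbers chosen only for the example, and log-probabilities are used to avoid underflow when N is large.

```python
import math
from typing import Dict, List, Sequence


def viterbi(observations: Sequence[int],
            states: Sequence[int],
            start_p: Dict[int, float],
            trans_p: Dict[int, Dict[int, float]],
            emit_p: Dict[int, Dict[int, float]]) -> List[int]:
    """Return the most probable hidden state path for the observations."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # Initialisation: delta_1(j) = pi_j * b_j(o_1), in log space.
    delta = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
              for s in states}]
    psi: List[Dict[int, int]] = [{}]

    # Recursion: local probabilities delta_t(j) and back pointers psi_t(j).
    for t in range(1, len(observations)):
        delta.append({})
        psi.append({})
        for j in states:
            best_i = max(states,
                         key=lambda i: delta[t - 1][i] + log(trans_p[i][j]))
            psi[t][j] = best_i
            delta[t][j] = (delta[t - 1][best_i] + log(trans_p[best_i][j])
                           + log(emit_p[j][observations[t]]))

    # Termination: final state, then backtracking along the back pointers.
    path = [max(states, key=lambda s: delta[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path


states = (0, 1)
start_p = {0: 0.5, 1: 0.5}                  # pi_i = 0.5, as in the text
r = 0.01                                    # placeholder recombination probability
trans_p = {0: {0: 1 - r, 1: r}, 1: {0: r, 1: 1 - r}}
emit_p = {0: {1: 0.9, 0: 0.1},              # placeholder emission values
          1: {1: 0.1, 0: 0.9}}

path = viterbi([1, 1, 1, 0, 0, 0], states, start_p, trans_p, emit_p)
print(path)        # [0, 0, 0, 1, 1, 1]

noisy_path = viterbi([1, 1, 0, 1, 1], states, start_p, trans_p, emit_p)
print(noisy_path)  # [0, 0, 0, 0, 0]
```

With these placeholder values, a sustained change in the observations is decoded as one recombination, whereas a single inconsistent site is absorbed as noise, because two transitions cost more than one unlikely emission.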
That is, according to embodiments of the present invention, determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden state transition probability matrix and the observation sequence probability matrix; and
determining the final state and backtracking along the optimal path using the Viterbi algorithm, so as to determine the base information of the predetermined region in the embryo genome.
According to a specific embodiment, the hidden Markov model uses the following parameters:
the initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), where $N_r$ and $N_p$ denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively, and $N_r$ is a natural number in the range 20-40; the observation sequence probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), where #sites(L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites(L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S} \delta_{t-1}(i) \cdot a_{ij}$, $t \in \{1, \ldots, N\}$; and
using the Viterbi algorithm, the final state $q_N^* = \arg\max_{i \in S} \delta_N(i)$ is obtained by recursion and the optimal path is backtracked, so that the base information of the most probable embryo predetermined region is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$, $t = N-1, N-2, \ldots, 1$.
The terms "local probability $\delta_t(j)$" and "back pointer $\Psi_t(j)$" used above follow the classical definitions of the Viterbi algorithm. For a detailed description of the definitions of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
The sequence of the embryo genome can thus be analyzed effectively. Compared with other existing preimplantation testing techniques, this method has the following technical advantages, mainly in terms of accuracy and the amount of genetic information obtainable:
1) According to embodiments of the present invention, both paternally derived and maternally derived sites of the embryo can be detected well, with a detection accuracy of up to 95% or more, and multiple types of variation can be detected, which widens the range of diseases that can be tested for.
2) According to embodiments of the present invention, sites or DNA sequences that have low coverage, loss of heterozygosity or allele dropout (ADO) in the sequencing results of whole-genome-amplified DNA from a single embryonic cell or a few cells can be inferred accurately by this method. The method can therefore correct sites or sequences affected by loss of heterozygosity or allele dropout, making the test results more accurate and reliable.
3) According to embodiments of the present invention, genetic diseases can be mapped, and some linked diseases can be inferred directly from the information at other sites; a large amount of information is obtained at once, which gives better guidance for clinical testing.
In addition, according to embodiments of the present invention, the method of the present invention for determining base information of a predetermined region in an embryo genome is not limited to one particular kind of genetic polymorphism site, such as SNP or STR sites; it is applicable to all genetic polymorphism sites, and multiple kinds of sites can be used simultaneously for mutual verification.
System for determining base information of a predetermined region in an embryo genome
In another aspect, the present invention provides a system for determining the nucleic acid sequence of a predetermined region in an embryo genome. According to embodiments of the present invention, with reference to Fig. 2, the system 1000 may include a library construction device 100, a sequencing device 200 and an analysis device 400.
According to embodiments of the present invention, the library construction device 100 is adapted to construct sequencing libraries separately for the embryo genomic DNA sample and for the genomic DNA samples of the genetically related individuals of the embryo. According to embodiments of the present invention, the sequencing device 200 is connected to the library construction device 100 and is adapted to sequence the sequencing libraries so as to obtain sequencing results for the embryo and for its genetically related individuals. According to embodiments of the present invention, the system may further include a DNA sample separation device and a DNA amplification device (not shown). The DNA sample separation device is adapted to biopsy the embryo to obtain embryonic cells. The embryonic cells may be derived from any of a blastomere, blastocyst trophectoderm, a fertilized egg and a gamete, and may be a single cell or a small number of cells, for example 2-10 cells; any sample from the pregnant woman that contains embryo nucleic acid may also be used. The DNA amplification device is adapted to perform whole genome amplification on the embryonic cells obtained by biopsy, so as to obtain a sufficient amount of DNA for subsequent testing and analysis. The embryo genome can thus be monitored effectively without affecting the normal development of the embryo. The methods and procedures for whole genome amplification mainly include multiple displacement amplification (MDA) and PCR-based WGA techniques. An independently developed WGA workflow may be used, or a commercial kit such as the REPLI-g series kits from Qiagen, the GenomePlex WGA kit (WGA4) from Sigma-Aldrich, the PicoPlex WGA kit from Rubicon Genomics, or the illustra GenomiPhi WGA kit from GE Healthcare.
As for the methods and procedures for constructing a sequencing library from a nucleic acid sample, those skilled in the art can select them appropriately according to the sequencing technology used. For details of the procedure, reference may be made to the protocols provided by the manufacturer of the sequencing instrument, such as Illumina; see, for example, Illumina's Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End Sample Prep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference. According to embodiments of the present invention, the method and equipment for extracting a nucleic acid sample from a biological sample are likewise not particularly limited, and a commercial nucleic acid extraction kit may be used. According to embodiments of the present invention, the method and equipment that can be used for sequencing are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, since the high-throughput, deep-sequencing characteristics of such sequencing devices further improve sequencing efficiency and thereby improve the precision and accuracy of subsequent analysis of the sequencing data, in particular statistical testing. The high-throughput sequencing methods include, but are not limited to, second-generation sequencing technology and single-molecule sequencing technology. Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan; 11(1):31-46) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq™, MiSeq, etc.), ABI-SOLiD, Life Technologies-Ion Torrent/Proton and Roche-454 (pyrosequencing) sequencing platforms. Single-molecule sequencing platforms (technologies) include, but are not limited to, the true single-molecule sequencing technology of Helicos (True Single Molecule DNA sequencing), the single-molecule real-time sequencing of Pacific Biosciences (single molecule real-time (SMRT™)), and the nanopore sequencing technology of Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6(4):244-245). As sequencing technology continues to evolve, those skilled in the art will understand that other sequencing methods and devices may also be used for whole-genome sequencing. According to specific examples of the present invention, the whole-genome sequencing library may be sequenced using at least one selected from Illumina-Solexa, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454 and single-molecule sequencing devices.
According to embodiments of the present invention, the system may also include an alignment device 300. According to embodiments of the present invention, the alignment device 300 is connected to the sequencing device 200 and is adapted to align the sequencing results obtained to a reference sequence, so that the genetic sketch of the embryo is constructed by aligning the sequencing result of the embryonic cell genomic DNA sample to the reference sequence; the genotypes of the genetically related individuals of the embryo are determined by aligning the sequencing result of their genome samples to the reference sequence; and the haplotypes of the embryo's parents are determined from the genotypes of the genetically related individuals. According to embodiments of the present invention, the type of reference sequence that may be used is not particularly limited, and it may be any known sequence containing the region of interest. According to embodiments of the present invention, a known human reference genome may be used as the reference sequence. For example, according to embodiments of the present invention, the human reference genome used is NCBI 36.3 (HG18). In addition, according to embodiments of the present invention, the alignment method is not particularly limited. According to a specific embodiment of the present invention, SOAP may be used for the alignment.
According to embodiments of the present invention, the analysis device 400 is adapted to: construct the genetic sketch of the embryo on the basis of the sequencing result of the embryonic cell genomic DNA sample, in order to determine the embryo's initial genotype; determine the haplotypes of the embryo's parents on the basis of the sequencing result of the genome samples of the genetically related individuals of the embryo; and determine, according to a hidden Markov model, the base information of the predetermined region in the embryo genome, with the embryo's initial genotype as the observation sequence and on the basis of the haplotypes of the embryo's parents.
According to embodiments of the present invention, determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden state transition probability matrix and the observation sequence probability matrix; and
determining the final state and backtracking along the optimal path using the Viterbi algorithm, so as to determine the base information of the predetermined region in the embryo genome.
According to a specific embodiment, the hidden Markov model uses the following parameters:
the initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), where $N_r$ and $N_p$ denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively, and $N_r$ may be a natural number between 20 and 40; the observation sequence probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), where #sites(L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites(L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S} \delta_{t-1}(i) \cdot a_{ij}$, $t \in \{1, \ldots, N\}$; and
the final state obtained by recursion is $q_N^* = \arg\max_{i \in S} \delta_N(i)$, and the optimal path is backtracked, so that the base information of the most probable embryo predetermined region is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$, $t = N-1, N-2, \ldots, 1$.
The terms "local probability $\delta_t(j)$" and "back pointer $\Psi_t(j)$" used above follow the classical definitions of the Viterbi algorithm. For a detailed description of the definitions of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
The data analysis has been described in detail above, and that description of course also applies to the system for determining the nucleic acid sequence of a predetermined region in an embryo genome; it is not repeated here.
Thus, using this system, the above-described method for determining the nucleic acid sequence of a predetermined region in an embryo genome can be implemented effectively: the base information of a specific region in the embryo genome can be determined by decoding the hidden Markov model, for example with the Viterbi algorithm, so that preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
In addition, according to embodiments of the present invention, the predetermined region is a site known to have a genetic polymorphism, and the genetic polymorphism is at least one selected from single nucleotide polymorphisms and short tandem repeats.
The term "connected" as used herein should be understood broadly: the connection may be direct or indirect, provided that the functional link described above can be achieved.
It should be noted that those skilled in the art will understand that the features and advantages described above for the method for determining the nucleic acid sequence of a predetermined region in an embryo genome also apply to the system for determining the nucleic acid sequence of a predetermined region in an embryo genome and, for convenience of description, are not described again in detail.
Computer-readable medium
In yet another aspect, the present invention provides a computer-readable medium. According to embodiments of the present invention, instructions are stored on the computer-readable medium, the instructions being adapted to be executed by a processor so as to:
construct a genetic sketch of the embryo on the basis of the sequencing result of the embryonic cell genomic DNA sample, in order to determine the embryo's initial genotype;
determine the haplotypes of the embryo's parents on the basis of the sequencing result of the genome samples of the genetically related individuals of the embryo; and
determine, according to a hidden Markov model, the base information of the predetermined region in the embryo genome, with the embryo's initial genotype as the observation sequence and on the basis of the haplotypes of the embryo's parents.
Thus, using this computer-readable medium, the above-described method can be implemented effectively, so that the base information of a specific region in the embryo genome can be determined by means of the hidden Markov model, for example with the Viterbi algorithm, and preimplantation testing of the genetic information of the embryo genome can be carried out effectively.
According to embodiments of the present invention, determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further comprises:
constructing the initial state probability distribution vector, the hidden state transition probability matrix and the observation sequence probability matrix; and
determining the final state and backtracking along the optimal path using the Viterbi algorithm, so as to determine the base information of the predetermined region in the embryo genome.
According to a specific embodiment, the hidden Markov model uses the following parameters:
the initial state probability distribution is $\pi = \{\pi_i\}$ ($i \in S$, $\pi_i = 0.5$); the hidden state transition probability matrix is $A = \{a_{ij}\}$ ($i, j \in S$), where $N_r$ and $N_p$ denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively, and $N_r$ may be a natural number between 20 and 40; the observation sequence probability matrix is $B = \{b_i(k)\}$ ($i \in S$, $k \in V$), where #sites(L > 0, Must-hom.) is the number of sites at which the offspring must be homozygous, and #sites(L > 0, Must-hom. or Must-het.) is the sum of the number of sites at which the offspring must be homozygous and the number of sites at which the offspring must be heterozygous;
the local probability is $\delta_t(j) = \max_{i \in S} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(k)$, $t \in \{1, \ldots, N\}$;
the back pointer is $\Psi_t(j) = \arg\max_{i \in S} \delta_{t-1}(i) \cdot a_{ij}$, $t \in \{1, \ldots, N\}$; and
the final state obtained by recursion is $q_N^* = \arg\max_{i \in S} \delta_N(i)$, and the optimal path is backtracked, so that the base information of the most probable embryo predetermined region is determined as $q_t^* = \Psi_{t+1}(q_{t+1}^*)$, $t = N-1, N-2, \ldots, 1$.
The terms local probability $\delta_t(j)$ and back pointer $\Psi_t(j)$ used above follow the classical definitions of the Viterbi algorithm. For a detailed description of these parameters, see Lawrence R. Rabiner, Proceedings of the IEEE, Vol. 77, No. 2, February 1989, which is incorporated herein by reference in its entirety.
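For reference, the classical Viterbi recursion can be written out in full as follows. This is a sketch of the textbook form rather than a restatement of the parameters of this embodiment; the symbol $O_t$, denoting the observation at site $t$, is introduced here for illustration only.

```latex
\begin{aligned}
\text{Initialization:}\quad & \delta_1(i) = \pi_i \, b_i(O_1), \qquad i \in S \\
\text{Recursion:}\quad & \delta_t(j) = \max_{i \in S}\bigl[\delta_{t-1}(i)\, a_{ij}\bigr]\, b_j(O_t), \qquad 2 \le t \le N \\
& \Psi_t(j) = \arg\max_{i \in S}\bigl[\delta_{t-1}(i)\, a_{ij}\bigr], \qquad 2 \le t \le N \\
\text{Termination:}\quad & q_N^* = \arg\max_{i \in S} \delta_N(i) \\
\text{Backtracking:}\quad & q_t^* = \Psi_{t+1}\bigl(q_{t+1}^*\bigr), \qquad t = N-1, N-2, \ldots, 1
\end{aligned}
```

Because the recursion multiplies many probabilities smaller than one, practical implementations usually work with logarithms and replace the products with sums.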
The data analysis has been described in detail above, and that description naturally also applies to the system for determining the nucleic acid sequence of a predetermined region in the embryonic genome; it is not repeated here.
Thus, using the system, the foregoing method for determining the nucleic acid sequence of a predetermined region in the embryonic genome can be implemented effectively: the base information of a specific region in the embryonic genome can be determined by means of a hidden Markov model, for example using the Viterbi algorithm, whereby the genetic information of the embryonic genome can be effectively tested prior to implantation.
In addition, according to embodiments of the present invention, the predetermined region is a site known to carry a genetic polymorphism, and the genetic polymorphism is at least one selected from single nucleotide polymorphisms and short tandem repeats.
For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and can then be stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be carried out by a program that instructs the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples merely illustrate the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they are carried out according to the techniques or conditions described in the literature in the art (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or according to the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products, obtainable for example from Illumina.
General method:
With reference to Fig. 1, the main steps of this embodiment of the present invention are:
1) Obtain a single-cell or few-cell sample: biopsy of day-3 cleavage-stage blastomeres or of day-5 blastocyst trophectoderm cells, germ cell sampling, or other single-cell or few-cell sampling.
2) Subject the single-cell or few-cell sample to whole-genome amplification, construct a library from the amplified DNA according to the requirements of the next-generation sequencing platform, and sequence it.
3) Filter the single-cell or few-cell sequencing data and align them to the human genome reference sequence.
4) Based on the alignment result, perform preliminary detection of genetic variation in the embryonic cells, such as single nucleotide polymorphisms and deletions or duplications.
5) Construct the genetic sketch of the embryonic genome and initialize the embryo genotype.
6) Collect samples from genetically related individuals of the embryo, for example family members such as the parents and the proband, and/or the paternal and maternal grandparents, or other embryo samples from the same parents; extract genomic DNA, construct libraries according to the requirements of the next-generation sequencing platform, and sequence them.
7) Filter the DNA sequencing data of the genetically related individuals and align them to the human genome reference sequence.
8) Based on the alignment result, determine the genotypes of the genetically related individuals, such as the proband and the parents.
9) Infer the haplotypes of both parents from the genotypes of the genetically related individuals, such as the proband and the parents.
10) Infer the embryo haplotype from the parental haplotypes by HMM decoding.
11) Make the final determination of the genetic variation of the embryo.
With reference to Fig. 1, the HMM used in the information analysis part of this embodiment and the detailed analysis steps are as follows:
Notation:
1. The number of sites to be tested is N.
2. Observation sequence:
3. The hidden state set is S = {0, 1}, which defines which of a parent's chromosomes is inherited by the offspring; 0 denotes the chromosome that was inherited by the proband.
4. The observation state set is V = {0, 1}; 1 indicates that the chromosome inherited by the proband is consistent with the genotype in the embryo genetic sketch, and 0 indicates that it is inconsistent.
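The encoding of the observation states can be illustrated with a minimal sketch. It assumes that each site of the proband-inherited chromosome is given as a single allele, that each embryo genotype call is an unphased pair of alleles, and that "consistent" means the allele appears in the embryo call. The function name `encode_observations` and this data layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Unphased pair of alleles called for the embryo at one site (assumed layout).
Genotype = Tuple[str, str]


def encode_observations(
    proband_haplotype: List[str],
    embryo_genotypes: List[Genotype],
) -> List[int]:
    """Map each site to an observation state in V = {0, 1}.

    1: the allele on the chromosome inherited by the proband appears in
       the embryo genotype call (consistent, under the assumed definition).
    0: it does not appear (inconsistent).
    """
    if len(proband_haplotype) != len(embryo_genotypes):
        raise ValueError("both inputs must cover the same N sites")
    return [
        1 if allele in genotype else 0
        for allele, genotype in zip(proband_haplotype, embryo_genotypes)
    ]


if __name__ == "__main__":
    haplotype = ["A", "C", "G"]
    embryo = [("A", "A"), ("T", "T"), ("G", "C")]
    print(encode_observations(haplotype, embryo))
```

In practice, sites with no embryo call would have to be skipped or handled separately; the sketch leaves that out.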
First step: construct the initial state probability distribution vector, the haplotype recombination transition matrix, and the observation sequence probability matrix.
I. The initial state probability distribution is denoted π = {πi}, (i ∈ S, πi = 0.5). In the absence of reference data, every hidden state is assumed to occur with equal probability.
II. The haplotype recombination transition matrix, i.e. the matrix of hidden state transition probabilities, is denoted A = {aij}, (i, j ∈ S). Nr and Np denote the expected number of recombinations and the number of single nucleotide polymorphism sites, respectively.
III. Construct the observation sequence probability matrix. The observation sequence probability matrix is denoted B = {bi(k)}, (i ∈ S, k ∈ V); the probability distribution here is estimated from the sites at which the offspring must be homozygous (Must-hom) or must be heterozygous (Must-het).
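The three parameter sets can be assembled as in the following sketch. Only πi = 0.5 is taken from the text. The symmetric switch probability Nr/Np and the single emission parameter `match_prob` are illustrative assumptions, not the formulas of this embodiment, and the function name `build_hmm_parameters` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def build_hmm_parameters(
    n_recomb: float, n_sites: int, match_prob: float = 0.9
) -> Tuple[List[float], Matrix, Matrix]:
    """Return (pi, A, B) for hidden states S = {0, 1} and observations V = {0, 1}."""
    if not 0 < n_recomb < n_sites:
        raise ValueError("expected 0 < n_recomb < n_sites")
    if not 0.0 < match_prob < 1.0:
        raise ValueError("match_prob must lie strictly between 0 and 1")

    # Equal prior on both hidden states, as stated in the text.
    pi = [0.5, 0.5]

    # ASSUMPTION: the chance of switching parental chromosome between two
    # adjacent sites is the expected recombination count spread over all sites.
    switch = n_recomb / n_sites
    A = [[1.0 - switch, switch],
         [switch, 1.0 - switch]]

    # B[i][k] = P(observation k | hidden state i).
    # ASSUMPTION: in state 0 (the proband's chromosome was transmitted) a
    # consistent observation (k = 1) occurs with probability match_prob;
    # in state 1 it occurs with probability 1 - match_prob.
    B = [[1.0 - match_prob, match_prob],
         [match_prob, 1.0 - match_prob]]
    return pi, A, B


if __name__ == "__main__":
    pi, A, B = build_hmm_parameters(n_recomb=30, n_sites=1000)
    print(pi, A, B)
```

With Nr = 30 (inside the 20 to 40 range mentioned for Nr) and 1000 sites, the assumed switch probability is 0.03 per site.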
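Second step: construct the local probability matrix and the back pointers.
Define the local probability: $\delta_t(j) = \max_i\left[\delta_{t-1}(i) \cdot a_{ij}\right] \cdot b_j(K)$, $t \in \{1 \ldots N\}$
Define the back pointer: $\Psi_t(j) = \arg\max_i\ \delta_{t-1}(i) \cdot a_{ij}$, $t \in \{1 \ldots N\}$
Third step: determine the final state and backtrack the optimal path.
Determine the final state: $q_N^* = \arg\max_{i \in S} \delta_N(i)$
Backtrack the optimal path, i.e. the most probable embryo haplotype: $q_t^* = \Psi_{t+1}(q_{t+1}^*)$, $(t = 1, 2, 3, \ldots, N-1)$
Fourth step: output the result.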
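The second to fourth steps can be sketched as a generic Viterbi decoder. This is a minimal illustration in log space, not the implementation of this embodiment: the function name `viterbi` and the numerical values of A and B in the usage example are assumed for demonstration, and the first site is initialized as πj·bj(o1), following the classical algorithm.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[int],
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
) -> List[int]:
    """Return the most probable hidden state path for the observation sequence."""
    if not obs:
        return []
    states = range(len(pi))

    # Initialization: local log-probability of each state at the first site.
    delta = [_log(pi[j]) + _log(B[j][obs[0]]) for j in states]
    back_pointers: List[List[int]] = []

    # Recursion: extend the best path into each state, site by site.
    for o in obs[1:]:
        prev = delta
        pointers, delta = [], []
        for j in states:
            best_i = max(states, key=lambda i: prev[i] + _log(A[i][j]))
            pointers.append(best_i)
            delta.append(prev[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        back_pointers.append(pointers)

    # Termination: the final state is the one with the highest local probability.
    state = max(states, key=lambda j: delta[j])

    # Backtracking: follow the back pointers from the last site to the first.
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    pi = [0.5, 0.5]
    A = [[0.97, 0.03], [0.03, 0.97]]  # assumed switch probability of 0.03
    B = [[0.1, 0.9], [0.9, 0.1]]      # assumed emission probabilities
    print(viterbi([1, 1, 1, 0, 0, 0], pi, A, B))  # [0, 0, 0, 1, 1, 1]
    print(viterbi([1, 1, 0, 1, 1], pi, A, B))     # [0, 0, 0, 0, 0]
```

In the first example a sustained run of inconsistent observations is decoded as a single switch of the transmitted chromosome. In the second, one isolated inconsistent site is absorbed as noise, because two switches cost more than one unlikely emission.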