CN107239676A

CN107239676A - Sequence data processing device for embryo chromosomes

Info

Publication number: CN107239676A
Application number: CN201710347798.0A
Authority: CN
Inventors: 糜庆丰; 陈样宜; 黄铨飞; 彭春方; 饶兴蔷; 罗东红
Original assignee: CapitalBio Genomics Co Ltd
Current assignee: CapitalBio Genomics Co Ltd
Priority date: 2017-05-17
Filing date: 2017-05-17
Publication date: 2017-10-10
Anticipated expiration: 2037-05-17
Also published as: CN107239676B

Abstract

The invention discloses a sequence data processing device for embryo chromosomes, comprising a processor adapted to execute instructions, the instructions being adapted to be loaded by the processor to perform the following steps: obtaining the uniquely and completely matched sequences; dividing the reads into different read-length intervals according to the read-length distribution of the uniquely and completely matched sequences; calculating the DNA fragment ratio of each length interval on every chromosome; and judging whether a chromosome under test is aneuploid according to the difference between its DNA fragment ratios in the different length intervals and those of known autosomes in the same intervals. The DNA fragment ratio is calculated from the number of DNA fragments in the length interval, the total number of DNA fragments on all autosomes of the sample in that length interval, and the length of the chromosome. The device improves the accuracy of detecting embryo chromosome numerical abnormalities and can reduce both the false positive rate and the false negative rate. The device can be widely applied in high-throughput sequencing.

Description

A sequence data processing device for embryo chromosomes

Technical field

The present invention relates to data processing technology, and more particularly to a sequence data processing device for embryo chromosomes, suitable for embryo chromosome aneuploidy detection.

Background art

Chromosomal abnormality is an important clinical cause of spontaneous abortion, birth defects, and multiple fetal malformations. Chromosomal abnormalities include numerical abnormalities and microdeletions/microduplications. Most unexplained spontaneous abortions in early pregnancy are caused by chromosome aneuploidy; about 10% of fetuses found by B-mode ultrasound to have multiple malformations carry a chromosome aneuploidy; and in about 20% of neonates with birth defects the defect is caused by a chromosomal abnormality. Detecting chromosomal abnormalities is therefore useful in two ways. On the one hand, it helps determine whether a miscarriage was caused by a fetal chromosomal abnormality; in particular, for women with repeated unexplained miscarriages in early pregnancy, the couple can be tested for chromosomal abnormalities to reduce the chance of an affected birth in a later pregnancy. On the other hand, it helps establish early whether a fetal anomaly is caused by a chromosomal abnormality, giving the doctor auxiliary diagnostic information so that the anomaly can be treated early and birth defects reduced.

In addition, the rapid development of human assisted reproductive technology in recent years has brought "test-tube baby" (in vitro fertilization) technology into clinical use, helping more couples who are infertile, older, or carriers of genetic disease to have children. However, many clinical studies have found that about half of the embryos formed by in vitro fertilization show chromosomal abnormalities, which is often the main cause of repeated implantation failure, spontaneous abortion, or stillbirth in many patients [1]. The risk of embryonic chromosomal abnormality also rises with maternal age, which greatly limits the success rate of assisted reproduction. Accurately screening embryos for chromosomal abnormalities before implantation, and then selecting healthy embryos for implantation, can therefore significantly improve the pregnancy rate and live birth rate of in vitro fertilization.

At present, the main methods for detecting chromosomal abnormalities are fluorescence in situ hybridization (FISH), microarray comparative genomic hybridization (array-CGH), and high-throughput sequencing. FISH was the gold standard for chromosomal abnormality detection in the early days. Although FISH is fast and highly specific, it is limited by the available probe types and fluorescent labels, so it can examine numerical abnormalities of only some chromosomes at a time and cannot work at the whole-genome level. The method currently more widely used is array-CGH [2]. Compared with FISH, array-CGH can detect copy-number changes of all 23 pairs of chromosomes in a single hybridization experiment, but its resolution depends on probe density (regions not covered by probes cannot be detected); detecting abnormalities of all 23 pairs at the whole-genome level requires more probes, which greatly increases cost. As the cost of high-throughput sequencing has fallen, methods for embryo chromosome aneuploidy detection based on high-throughput sequencing have gradually become mainstream in recent years.

The main workflow for detecting embryo chromosome aneuploidy by high-throughput sequencing is as follows:
1) Obtain a suitable quantity of DNA template. DNA from miscarriage tissue or embryonic tissue can be fragmented directly by enzymatic digestion or sonication, whereas blastomeres or cleavage-stage cells contain only a trace amount of starting DNA template and must first undergo single-cell amplification.
2) Select DNA molecules of a given fragment size (for example 150-250 bp).
3) Build the library by adding sequencing adapters to both ends of the selected DNA molecules.
4) Sequence on the instrument to obtain sequences (reads) of a certain length.
5) Align the reads to the human reference genome with alignment software, filter out repetitive and low-quality sequences, and obtain the read count (reads number) and read ratio (reads ratio) at different positions of each chromosome.
6) Use a statistical model to judge whether the embryo has a chromosomal abnormality.
When an embryo has a chromosome aneuploidy, the amount of sequence from the corresponding chromosome rises or falls by a certain proportion, so abnormality can be judged either by comparison with a reference set built from a number of samples or by comparison within the sample itself. Statistical methods for chromosomal abnormality detection accordingly fall into two main classes: reference-set comparison and within-sample comparison.

A typical reference-set comparison method is the Z-test [3]. A reference database is built from a large number of normal samples, the mean and standard deviation of the read ratio (reads ratio) of each chromosome in the reference data set are obtained, and a Z-score is then computed for each chromosome of the sample under test; the Z-score decides whether the sample is aneuploid. The main problem with the Z-test model is that the Z-score depends strongly on the reference data set: if the data of the sample under test are poorly consistent with the reference samples, sensitivity and specificity drop severely. In pre-implantation genetic screening (PGS) for aneuploidy, the starting DNA content of an embryo sample is only about 6.6 pg to 30 pg, so whole genome amplification (WGA) is required before sequencing. WGA introduces serious GC bias, which often makes the sample under test very inconsistent with the reference samples. The Z-score model is therefore unsuitable for pre-implantation chromosome aneuploidy detection.
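The Z-test described above can be sketched as follows. This is a minimal illustration only: the reference read ratios are invented numbers, the function name is hypothetical, and the cutoff |Z| > 3 is a commonly used convention assumed here rather than a value stated in this document.

```python
from statistics import mean, stdev

def chromosome_z_score(test_ratio, reference_ratios):
    """Z-score of one chromosome's read ratio against a set of normal reference samples."""
    mu = mean(reference_ratios)
    sigma = stdev(reference_ratios)  # sample standard deviation of the reference set
    return (test_ratio - mu) / sigma

# Hypothetical read ratios of one chromosome in five normal reference samples
reference = [0.0300, 0.0302, 0.0298, 0.0301, 0.0299]

z = chromosome_z_score(0.0315, reference)
is_flagged = abs(z) > 3  # assumed cutoff, not taken from the source
```

Because the mean and standard deviation come entirely from the reference set, any systematic shift between the test sample and the reference samples (for example GC bias from whole genome amplification) moves the Z-score directly, which is the weakness the paragraph above points out.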

Pre-implantation screening therefore mainly uses within-sample comparison: the genome is divided into bins (data boxes) of a given window size, the sequence ratio (copy ratio) of every bin is computed, and chromosomal abnormality is inferred from the trend of the copy ratio [4]. The main problem with within-sample comparison is that the result rests on a single statistic of a single sample, the copy ratio. When single-cell amplification is uneven, the copy ratio fluctuates widely, producing many outliers and false positives. To address the low accuracy and reliability of the traditional within-sample comparison method, the present invention improves its data processing.

References

1. Bielanska, M., S.L. Tan, and A. Ao, Chromosomal mosaicism throughout human preimplantation development in vitro: incidence, type, and relevance to embryo outcome. Hum Reprod, 2002. 17(2): p. 413-9.

2. Gutierrez-Mateo, C., et al., Validation of microarray comparative genomic hybridization for comprehensive chromosome analysis of embryos. Fertil Steril, 2011. 95(3): p. 953-8.

3. Chiu, R.W., et al., Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci U S A, 2008. 105(51): p. 20458-63.

4. Fu, Y., et al., Uniform and accurate single-cell sequencing based on emulsion whole-genome amplification. Proc Natl Acad Sci U S A, 2015. 112(38): p. 11923-8.

Summary of the invention

To solve the above technical problem, the object of the present invention is to provide a sequence data processing device for embryo chromosomes.

The technical solution adopted by the present invention is a sequence data processing device for embryo chromosomes, the device comprising:

A sequencing data acquisition unit, for acquiring the DNA reads obtained by high-throughput sequencing;

A sequencing data processing unit, for aligning the acquired DNA reads against the human genome reference sequence so that each read is mapped to its position on a chromosome, thereby obtaining the chromosome, start site, and sequence length of each read, as well as the uniquely and completely matched sequences;

A data result analysis unit, for dividing the reads into different read-length intervals according to the read-length distribution of the uniquely and completely matched sequences, calculating the DNA fragment ratio of each length interval on every chromosome, and judging whether a chromosome under test is aneuploid according to the difference between its DNA fragment ratios in the different length intervals and those of known autosomes in the same intervals;

Wherein the DNA fragment ratio is calculated from the number of DNA fragments in the length interval, the total number of DNA fragments on all autosomes of the sample in that length interval, and the length of the chromosome.

Further, the DNA fragment ratio of a length interval on a chromosome is calculated as follows (written here from the variable definitions, up to any constant scaling factor):

$$\mathrm{ratio}_{ij} = \frac{\mathrm{reads\_n}_{ij}}{\mathrm{reads\_n}_{j} \times \mathrm{chr\_len}_{i}}$$

Wherein i is the chromosome number; j is the length interval number; ratio_ij is the DNA fragment ratio in the j-th length interval on chromosome i; reads_n_ij is the number of DNA fragments in the j-th length interval on chromosome i; reads_n_j is the total number of DNA fragments on all autosomes of the sample in the j-th length interval; and chr_len_i is the length of chromosome i.

Further, the step of judging whether the chromosome under test is aneuploid according to the difference between its DNA fragment ratios in the different length intervals and those of known autosomes in the same intervals specifically comprises:

Judging whether the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and those of the known autosomes in the same intervals meets the standard for a statistically significant difference; if so, the chromosome under test is judged to be aneuploid; otherwise, it is judged not to be aneuploid.

Further, the length of a chromosome refers to its length after the centromere, telomere, and satellite regions are filtered out.

Further, the read-length intervals are divided using a sliding window method.
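The sliding-window division of read lengths can be sketched as below. The defaults (lengths from 100 bp to 230 bp, window 10 bp, step 10 bp) follow the values given in the embodiments; the function names are hypothetical and the code is an illustration, not the device's implementation.

```python
def length_intervals(start=100, stop=230, window=10, step=10):
    """Return half-open [lo, hi) read-length intervals produced by a sliding window."""
    return [(lo, lo + window) for lo in range(start, stop - window + 1, step)]

def interval_index(read_length, intervals):
    """Index j of the first interval containing read_length, or None if out of range."""
    for j, (lo, hi) in enumerate(intervals):
        if lo <= read_length < hi:
            return j
    return None

intervals = length_intervals()
# intervals[0] is (100, 110) and intervals[-1] is (220, 230)
```

With the step equal to the window the intervals do not overlap, so every read length between 100 and 229 bp falls into exactly one interval. A step smaller than the window would produce overlapping intervals, in which case `interval_index` as written reports only the first match.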

Another technical solution adopted by the present invention is a sequence data processing device for embryo chromosomes, comprising a processor adapted to execute instructions, the instructions being adapted to be loaded by the processor to perform the following steps:

Acquiring the DNA reads obtained by high-throughput sequencing;

Aligning the acquired DNA reads against the human genome reference sequence so that each read is mapped to its position on a chromosome, thereby obtaining the chromosome, start site, and sequence length of each read, as well as the uniquely and completely matched sequences;

Dividing the reads into different read-length intervals according to the read-length distribution of the uniquely and completely matched sequences, calculating the DNA fragment ratio of each length interval on every chromosome, and judging whether a chromosome under test is aneuploid according to the difference between its DNA fragment ratios in the different length intervals and those of known autosomes in the same intervals.

The beneficial effects of the present invention are as follows. When the device is applied to the traditional within-sample comparison method to detect embryo chromosome numerical abnormalities, accuracy is high, and the device needs no reference set built from normal negative samples; this avoids the false positives and false negatives that reference-set comparison produces when the reference set and the sample under test deviate severely from each other. In addition, the device introduces read-length information for each chromosome, so the judgment of chromosomal abnormality no longer depends only on the numerical change of the sequence ratio (copy ratio) but also examines whether the copy ratio behaves reasonably across the different read-length (reads length) intervals. The judgment of whether a chromosome is abnormal is therefore more accurate, and the false positive rate and false negative rate can both be reduced.

Brief description of the drawings

Fig. 1 is a flow chart of the analysis for judging embryo chromosome aneuploidy from high-throughput sequencing data;

Fig. 2 is a distribution plot of the P-value exponents of each chromosome after multiple comparison of the chromosomes of amniocyte sample T2;

Fig. 3 is a table of the P values from the multiple comparison of the chromosomes of amniocyte sample T2;

Fig. 4 is a distribution plot of the P-value exponents of each chromosome after multiple comparison of the chromosomes of blastomere single-cell amplification product sample T4;

Fig. 5 is a table of the P values from the multiple comparison of the chromosomes of blastomere single-cell amplification product sample T4.

Detailed description of the embodiments

The idea of the present invention is as follows. On the basis of the within-sample comparison method, the length information of the sequences is introduced, and the copy ratio values of each chromosome are grouped by sequence length. When judging whether a chromosome is abnormal, the invention considers not only the change in the sequence ratio (reads ratio) but also whether the sequence ratio values for the different read lengths (reads length) are reasonable. The detection results obtained with the device are therefore more accurate and reliable, and the false positive rate and false negative rate are reduced together. The invention is thus applicable not only to chromosomal abnormality detection in miscarriage tissue and embryonic tissue but also to pre-implantation screening based on single-cell amplification, making it a general-purpose detection device.

The device of the present invention is described in detail below with reference to specific embodiments.

Embodiment 1

A sequence data processing device for embryo chromosomes, specifically comprising:

A sequencing data acquisition unit, for acquiring the DNA reads obtained by high-throughput sequencing; here a DNA read refers to the DNA information obtained by sequencing, including the DNA base sequence and its length;

The acquired DNA reads are obtained by high-throughput sequencing of the DNA in a blastomere single-cell amplification product, miscarriage tissue, or amniocytes;

A sequencing data processing unit, for aligning the acquired DNA reads against the human genome reference sequence hg19 so that each read is mapped to its position on a chromosome, thereby obtaining the chromosome, specific start site, and sequence length of each read; during alignment against hg19, sequences located in tandem repeat and transposon repeat regions are rejected, as are low-quality sequences, sequences that match multiple locations, and sequences that do not completely match a chromosome, leaving the unique sequences, i.e. the uniquely and completely matched sequences;

A data result analysis unit, for dividing the reads into different read-length intervals according to the read-length distribution of the unique sequences, the different read-length intervals being the different length intervals;

The DNA fragment ratio of each length interval on every chromosome is calculated using the sliding window method, and GC correction is then applied to the calculated ratios; whether the chromosome under test is aneuploid is judged by testing whether the difference between its corrected DNA fragment ratios in the different length intervals and those of the other known autosomes in the same intervals is significant;

Preferably, the step of calculating the DNA fragment ratio of each length interval on every chromosome using the sliding window method specifically comprises:

Using the sliding window method, the DNA reads are divided into different length intervals according to a preset length gradient (window) and step. Specifically, with a window of 10 bp and a step of 10 bp, the resulting fragment-length intervals are: [100,110), [110,120), [120,130), ..., [210,220), [220,230);

Then, to account for the different lengths of the chromosomes, a chromosome-length variable is introduced into the DNA fragment ratio formula so that the reads ratio is measured in the same units across chromosomes. The first calculation formula, for the DNA fragment ratio of a length interval on a chromosome, is as follows (written here from the variable definitions, up to any constant scaling factor):

$$\mathrm{ratio}_{ij} = \frac{\mathrm{reads\_n}_{ij}}{\mathrm{reads\_n}_{j} \times \mathrm{chr\_len}_{i}}$$

Wherein i is the chromosome number; j is the length interval number; ratio_ij is the DNA fragment ratio in the j-th length interval on chromosome i; reads_n_ij is the number of DNA fragments in the j-th length interval on chromosome i; reads_n_j is the total number of DNA fragments on all autosomes of the sample in the j-th length interval; and chr_len_i is the length of chromosome i;

The DNA fragment numbers in each length interval used above are counted from the read-length distribution after GC correction;

Preferably, the step of judging whether the chromosome under test is aneuploid by testing whether the difference between its corrected DNA fragment ratios in the different length intervals and those of the other known autosomes in the same intervals is significant specifically comprises:

Judging whether the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and those of the other known autosomes in the same intervals meets the standard for a statistically significant difference; specifically, judging whether the proportions of DNA reads per unit chromosome length in the different length intervals differ significantly in the statistical sense; if so, the chromosome under test is judged to be aneuploid; otherwise, it is judged not to be aneuploid.

The sequencing data acquisition unit, sequencing data processing unit, and data result analysis unit described above may be program modules or hardware modules.

Embodiment 2

A sequence data processing device for embryo chromosomes, comprising a processor adapted to execute instructions, the instructions being adapted to be loaded by the processor to perform the following steps:

S101. Acquire the DNA reads obtained by high-throughput sequencing; here a DNA read refers to the DNA information obtained by sequencing, including the DNA base sequence and its length;

S102. Align the acquired DNA reads against the human genome reference sequence hg19 so that each read is mapped to its position on a chromosome, thereby obtaining the chromosome, specific start site, and sequence length of each read; during alignment against hg19, sequences located in tandem repeat and transposon repeat regions are rejected, as are low-quality sequences, sequences that match multiple locations, and sequences that do not completely match a chromosome, leaving the unique sequences, i.e. the uniquely and completely matched sequences;

S103. Divide the reads into different read-length intervals according to the read-length distribution of the unique sequences, the different read-length intervals corresponding to the different length intervals; count the number of DNA fragments of each chromosome in each length interval; when the difference between the DNA fragment numbers of the chromosome under test in the different length intervals and those of the other known autosomes in the corresponding intervals meets the significance condition, i.e. when the fragment numbers of the chromosome under test are significantly greater or smaller than those of the other autosomes in the corresponding intervals, the chromosome under test is judged to be aneuploid;

Preferably, a correction step precedes the step of counting the DNA fragments of each chromosome in each length interval: GC correction is applied to the read-length distribution of the unique sequences; that is, the DNA fragment numbers of each chromosome in each length interval are counted from the fragment distribution after GC correction;

S104. Calculate the DNA fragment ratio of each length interval on every chromosome using the sliding window method, then apply GC correction to the calculated ratios, and judge whether the chromosome under test is aneuploid by testing whether the difference between its corrected DNA fragment ratios in the different length intervals and those of the other known autosomes in the same intervals is significant.

Embodiment 3

The sequence data processing device for embryo chromosomes described above is applied to embryo chromosome aneuploidy detection. The detection comprises the following six parts, and the specific process steps are shown in Fig. 1.

Part I, sample sources: two samples come from amniocytes, with karyotyping results 46,XN and 47,XN,+16 respectively; two samples come from blastomere single-cell amplification products of cleavage-stage embryos, with array-CGH microarray results 46,XN and 47,XN,+9 respectively.

Part II, sequencing data alignment and quality control

The sequencing data are aligned against the human genome reference sequence hg19 to determine the exact position of each sequenced DNA fragment on the chromosomes. To ensure the quality of the sequencing results and avoid interference from repetitive sequences, low-quality sequences are rejected and sequences located in tandem repeat and transposon repeat regions of the genome are filtered out, leaving the uniquely matched DNA fragments, i.e. the unique sequences.
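The quality-control filter in this part can be illustrated with the sketch below. The `Alignment` record, its field names, the mapping-quality threshold of 30, and the reading of "completely matched" as zero mismatches are all assumptions made for illustration; the source does not name an aligner, a file format, or thresholds.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """Hypothetical per-read alignment record (field names are assumptions)."""
    chrom: str          # chromosome the read maps to, e.g. "chr16"
    start: int          # start site on the chromosome
    length: int         # read length in bp
    mapq: int           # mapping quality reported by the aligner
    n_hits: int         # number of genomic locations the read matches
    n_mismatches: int   # mismatches against the reference
    in_repeat: bool     # overlaps a tandem-repeat or transposon-repeat region

def is_unique_sequence(aln, min_mapq=30):
    """True if the read is high quality, uniquely and completely matched, and outside repeats."""
    return (
        aln.mapq >= min_mapq
        and aln.n_hits == 1
        and aln.n_mismatches == 0
        and not aln.in_repeat
    )

def unique_sequences(alignments, min_mapq=30):
    """Keep only the reads that pass every filter."""
    return [a for a in alignments if is_unique_sequence(a, min_mapq)]

reads = [
    Alignment("chr16", 1_000, 150, 60, 1, 0, False),  # kept
    Alignment("chr16", 2_000, 150, 10, 1, 0, False),  # rejected: low quality
    Alignment("chr9", 3_000, 120, 60, 3, 0, False),   # rejected: multiple matches
    Alignment("chr9", 4_000, 130, 60, 1, 2, False),   # rejected: incomplete match
    Alignment("chr1", 5_000, 140, 60, 1, 0, True),    # rejected: repeat region
]
kept = unique_sequences(reads)
```

Each surviving record carries the chromosome, start site, and length that the later steps need for counting fragments per chromosome and per length interval.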

Part III, GC correction

To eliminate the influence of GC content on the number of DNA fragments in the different length intervals of the different chromosomes, the number of DNA fragments in each GC-content group is counted and corrected using the median.
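The source states only that counts are grouped by GC content and corrected using the median. One common way to do this, assumed here and not confirmed by the source, is to scale each count by the ratio of the overall median to the median of its GC group. The function name, the grouping by integer GC percentage, and the example numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median

def gc_correct(counts, gc_groups):
    """Scale each count by (overall median / median of its GC-content group)."""
    by_group = defaultdict(list)
    for count, group in zip(counts, gc_groups):
        by_group[group].append(count)
    overall_median = median(counts)
    group_median = {g: median(v) for g, v in by_group.items()}
    return [
        # Groups whose median is zero carry no usable signal and are set to 0.0
        count * overall_median / group_median[group] if group_median[group] else 0.0
        for count, group in zip(counts, gc_groups)
    ]

# Hypothetical fragment counts in six regions, three at 40% GC and three at 50% GC
counts = [80, 100, 120, 160, 200, 240]
gc_groups = [40, 40, 40, 50, 50, 50]
corrected = gc_correct(counts, gc_groups)
```

In this toy input the 50% GC group has twice the coverage of the 40% GC group; after correction both groups share the same median, so the remaining variation reflects differences within a group rather than GC content.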

Part IV, calculating the DNA fragment ratio of each length interval of each chromosome in the sample under test

a. In this embodiment, with a length gradient (window) of 10 bp and a step of 10 bp, the resulting fragment-length intervals are: [100,110), [110,120), [120,130), ..., [210,220), [220,230);

b. Count the total number of GC-corrected DNA fragments in each length interval of the sample;

c. Count the number of GC-corrected DNA fragments in each length interval of each chromosome of the sample;

d. Calculate the DNA fragment ratio of each length interval of each chromosome in the sample under test according to the first calculation formula above. The results are shown in Tables 1-4, where i denotes chromosome i and j denotes the j-th length interval.

Table 1. DNA fragment ratio of each length interval of each autosome in amniocyte sample T1

Table 2. DNA fragment ratio of each length interval of each autosome in amniocyte sample T2

Table 3. DNA fragment ratio of each length interval of each autosome in blastomere single-cell amplification product sample T3

Table 4. DNA fragment ratio of each length interval of each autosome in blastomere single-cell amplification product sample T4
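Steps b to d of Part IV can be sketched as follows. The sketch assumes the ratio is the fragment count of chromosome i in interval j, divided by the autosomal total in interval j and by the length of chromosome i; the `scale` parameter stands in for any constant factor, which this text does not state. Chromosome names, lengths, and counts are invented for illustration.

```python
def fragment_ratios(counts, chr_len, scale=1.0):
    """Compute ratio[i][j] = scale * counts[i][j] / (reads_n_j * chr_len[i]).

    counts[i][j]: GC-corrected fragment count on chromosome i in length interval j.
    chr_len[i]:   usable length of chromosome i.
    """
    n_intervals = len(next(iter(counts.values())))
    # reads_n_j: total over all listed chromosomes in interval j (step b)
    totals = [sum(counts[i][j] for i in counts) for j in range(n_intervals)]
    return {
        i: [scale * counts[i][j] / (totals[j] * chr_len[i]) for j in range(n_intervals)]
        for i in counts
    }

chr_len = {"chrA": 200, "chrB": 100}  # hypothetical lengths

# Coverage proportional to length: both chromosomes get the same ratio
normal = fragment_ratios({"chrA": [20, 40], "chrB": [10, 20]}, chr_len)

# chrB carries 1.5 times its expected fragments, as for an extra copy
gain = fragment_ratios({"chrA": [20, 40], "chrB": [15, 30]}, chr_len)
fold = gain["chrB"][0] / gain["chrA"][0]
```

Dividing by the chromosome length puts all chromosomes on the same scale, so under uniform coverage every chromosome has the same ratio in every length interval, and a chromosome with a copy gain stands out by a constant factor across the intervals.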

Part V, two-way analysis of variance (two-way classification ANOVA) of the corrected DNA fragment ratios

a. Two factors: factor 1, the DNA fragment read-length interval; factor 2, the chromosome. Interaction is not considered. Whether the DNA fragment ratios in the different length intervals of each chromosome differ is judged from the P value and the significance level;

b. Taking the two factors, DNA fragment length and chromosome, a two-way analysis of variance is performed on the DNA fragment ratios (H₀: the population means of the DNA fragment ratios of the 22 autosomes are all equal, i.e. leaving the sex chromosomes aside, the sample is negative; H₁: the population means of the DNA fragment ratios of the 22 autosomes are not all equal, i.e. the sample is positive and contains an aneuploid chromosome);

c. Interpreting the ANOVA results. For factor 1 (DNA fragment read-length interval): if the P value (the probability corresponding to the variance test result) is below the significance level of 0.05, the differences in DNA fragment ratio are influenced by this factor and the sample result is unreliable, because DNA fragments of different lengths arise from random fragmentation by enzymatic digestion and fragment length should be unrelated to fragment ratio; if the P value is above 0.05, the sample result is reasonable and factor 2 can be analysed further. For factor 2 (chromosome): if the P value is above 0.05, the DNA fragment ratios do not differ significantly between chromosomes, all 22 autosomes are euploid, and the sample can be judged normal (leaving the sex chromosomes aside); if the P value is below 0.05, the DNA fragment ratios differ significantly between chromosomes and an aneuploid chromosome is present among the 22 autosomes, so multiple comparison between the chromosomes is then needed to determine which chromosome is aneuploid.

d. Calculate the P values from the ANOVA results. The results are shown in Table 5 (P₁: DNA fragment read-length interval factor; P₂: chromosome factor).

Table 5. P values from the analysis of variance

Note: T1 and T2 are amniocyte samples; T3 and T4 are blastomere single-cell amplification products.

According to Table 5, the judgments are as follows:

1) For T1, P₁ and P₂ are both greater than 0.05, so the sample is inferred to be normal; likewise, T3 is inferred to be normal.

2) For T2, P₁ is greater than 0.05 and P₂ is less than 0.05, so the sample is considered to contain an aneuploid chromosome and is judged positive; likewise, T4 is inferred to be positive.
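The two-way analysis of variance without interaction used in Part V can be sketched with the standard sums-of-squares decomposition for a table with one observation per cell. The function name and the 3 x 3 example table are hypothetical. The sketch stops at the F statistics and degrees of freedom; turning them into P₁ and P₂ requires the F distribution, for example `scipy.stats.f.sf(F, dfn, dfd)` if SciPy is available.

```python
def two_way_anova(table):
    """Two-way ANOVA without interaction, one observation per cell.

    table[i][j]: DNA fragment ratio of chromosome i (row) in length interval j (column).
    Returns the F statistics for both factors and the degrees of freedom.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)   # chromosome factor
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)   # length-interval factor
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    return {
        "F_chromosome": (ss_rows / df_rows) / ms_err,
        "F_length": (ss_cols / df_cols) / ms_err,
        "df": (df_rows, df_cols, df_err),
    }

# Hypothetical ratios: two normal chromosomes and one with an elevated ratio
table = [
    [3.70, 3.80, 3.75],
    [3.80, 3.70, 3.75],
    [5.60, 5.70, 5.60],
]
result = two_way_anova(table)
```

In this toy table the chromosome factor yields a very large F while the length-interval factor yields an F below 1, which corresponds to the pattern described for a positive sample: P₂ small and P₁ large.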

Part VI, multiple comparison of the mean DNA fragment ratios between the chromosomes of the abnormal samples

Because analysis of variance can only establish whether a sample contains an aneuploid chromosome, not which chromosome is abnormal, multiple t-tests are used to compare the means of the samples that the analysis of variance judged abnormal. That is, the population mean of the DNA fragment ratio of each chromosome is compared with that of each of the other 21 chromosomes, using the t-test for the means of two normal populations. Repeated t-tests increase the probability of a type I error (judging two population means that do not differ to be different), so a conclusion of "significant difference" is not necessarily reliable. The P values are therefore adjusted using the Bonferroni method.
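The Bonferroni adjustment multiplies each raw P value by the number of comparisons and caps the result at 1. The sketch below shows only this adjustment step; the raw P values are invented, and the function name is hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: min(1, p * m), where m is the number of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw P values from four pairwise t-tests
raw = [0.001, 0.02, 0.04, 0.5]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

Before adjustment three of the four comparisons fall below 0.05; after adjustment only the first does, which illustrates how the correction guards against the accumulated type I error of repeated t-tests.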

Multiple comparison analysis was performed on the two abnormal samples above (T2 and T4); the resulting P values are shown in Fig. 3 (T2) and Fig. 5 (T4).

For sample T2, the distribution plot of the P-value index from the ANOVA in Fig. 2 shows that chromosome 16 differs markedly from the other chromosomes. The multiple-comparison P values in Fig. 3 likewise show a significant difference (P < 0.05) between chromosome 16 and every other chromosome, whereas the other chromosomes do not differ significantly from one another. The mean DNA fragment ratio of chromosome 16 across the length intervals is 5.627, while that of the other chromosomes lies between 3.7 and 3.8. The sample is therefore considered to carry an extra chromosome 16, and the karyotype of T2 is judged to be 47,XN,+16 (consistent with the karyotype analysis result).

Similarly, for sample T4, the distribution plot of the P-value index from the ANOVA in Fig. 4 shows that chromosome 9 differs markedly from the other chromosomes. The multiple-comparison P values in Fig. 5 likewise show a significant difference (P < 0.05) between chromosome 9 and every other chromosome, whereas the other chromosomes do not differ significantly from one another. The mean DNA fragment ratio of chromosome 9 across the length intervals is 5.915, while that of the other chromosomes is about 3.75. The sample is therefore considered to carry an extra chromosome 9, and the karyotype of T4 is judged to be 47,XN,+9 (consistent with the Array-CGH analysis result).
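The reasoning used for T2 and T4 (one chromosome differs significantly from every other chromosome, and its mean ratio is higher) can be sketched as below. This is an illustration under assumptions, not the patented procedure: the function name `flag_outlier_chromosomes`, the dictionary layout, and the "gain"/"loss" labels are hypothetical, and the adjusted P values are taken as given inputs.

```python
from statistics import fmean


def flag_outlier_chromosomes(mean_ratio, adj_p, alpha=0.05):
    """Flag chromosomes that differ significantly from all others.

    mean_ratio -- dict: chromosome name -> mean DNA fragment ratio
    adj_p      -- dict: frozenset({a, b}) -> Bonferroni-adjusted P value
    """
    calls = {}
    for chrom, value in mean_ratio.items():
        others = [c for c in mean_ratio if c != chrom]
        # Require a significant difference against every other chromosome.
        if all(adj_p[frozenset((chrom, o))] < alpha for o in others):
            baseline = fmean(mean_ratio[o] for o in others)
            calls[chrom] = "gain" if value > baseline else "loss"
    return calls
```

As a rough plausibility check on the reported values, 5.627 / 3.75 is approximately 1.50, which is about what three copies of a chromosome against two would give; this reading is an inference from the numbers rather than a statement in the text.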

The above describes preferred embodiments of the present invention, but the invention is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims

1. A sequence data processing device for embryo chromosomes, characterized in that the device comprises:

a sequencing data processing unit, configured to align the obtained DNA reads against the human genome reference sequence, mapping each DNA read to its corresponding chromosomal position, so as to obtain the chromosome, start site, and sequence length corresponding to each DNA read, together with the uniquely and perfectly matched sequences;

a data result analysis unit, configured to divide the reads into different read-length intervals according to the read-length distribution of the uniquely and perfectly matched sequences, calculate the DNA fragment ratio of each length interval on each chromosome, and judge whether a chromosome under test is aneuploid according to the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and the DNA fragment ratios of known autosomes in the different length intervals;

wherein the DNA fragment ratio is calculated from the number of DNA fragments in the length interval, the total number of DNA fragments on all autosomes of the sample in that length interval, and the length of the chromosome.

2. The sequence data processing device for embryo chromosomes according to claim 1, characterized in that the DNA fragment ratio of a length interval on a chromosome is calculated using the following formula:

$$\mathrm{ratio}_{ij} = \frac{\mathrm{reads\_n}_{ij}}{\mathrm{reads\_n}_{j}} \cdot \frac{1}{\mathrm{chr\_len}_{i}}$$

where i is the chromosome number; j is the length-interval number; ratio_ij is the DNA fragment ratio in the j-th length interval on chromosome i; reads_n_ij is the number of DNA fragments in the j-th length interval on chromosome i; reads_n_j is the total number of DNA fragments on all autosomes of the sample in the j-th length interval; and chr_len_i is the length of chromosome i.
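For illustration only, the formula can be evaluated as in the following sketch. The function name `fragment_ratio` and the dictionary-of-lists layout are hypothetical, and the unit of chromosome length is left to the caller because it is not specified here.

```python
def fragment_ratio(reads_n, chr_len):
    """Compute ratio_ij = reads_n_ij / reads_n_j / chr_len_i.

    reads_n -- dict: chromosome i -> list of fragment counts per interval j
    chr_len -- dict: chromosome i -> chromosome length (caller's units)
    """
    n_intervals = len(next(iter(reads_n.values())))

    # reads_n_j: fragments on all supplied autosomes in interval j.
    totals = [sum(counts[j] for counts in reads_n.values())
              for j in range(n_intervals)]

    # An interval with no fragments at all would raise ZeroDivisionError.
    return {
        chrom: [counts[j] / totals[j] / chr_len[chrom]
                for j in range(n_intervals)]
        for chrom, counts in reads_n.items()
    }
```

Only autosomes should be passed in `reads_n`, since reads_n_j is defined over the autosomes of the sample.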

3. The sequence data processing device for embryo chromosomes according to claim 1 or 2, characterized in that the step of judging whether the chromosome under test is aneuploid according to the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and the DNA fragment ratios of known autosomes in the different length intervals specifically comprises:

judging whether the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and the DNA fragment ratios of known autosomes in the different length intervals meets the standard for a statistically significant difference; if so, the chromosome under test is judged to be aneuploid; otherwise, the chromosome under test is judged not to be aneuploid.

4. The sequence data processing device for embryo chromosomes according to claim 1 or 2, characterized in that the length of the chromosome refers to the length of the chromosome after the centromere, telomere, and satellite regions have been filtered out.

5. The sequence data processing device for embryo chromosomes according to claim 1 or 2, characterized in that the division into read-length intervals is implemented using a sliding window method.

6. A sequence data processing device for embryo chromosomes, comprising a processor adapted to execute instructions, characterized in that the instructions are adapted to be loaded by the processor to perform the following steps:

obtaining the DNA reads produced by high-throughput sequencing;

aligning the obtained DNA reads against the human genome reference sequence, mapping each DNA read to its corresponding chromosomal position, so as to obtain the chromosome, start site, and sequence length corresponding to each DNA read, together with the uniquely and perfectly matched sequences;

dividing the reads into different read-length intervals according to the read-length distribution of the uniquely and perfectly matched sequences, calculating the DNA fragment ratio of each length interval on each chromosome, and judging whether a chromosome under test is aneuploid according to the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and the DNA fragment ratios of known autosomes in the different length intervals；

7. The sequence data processing device for embryo chromosomes according to claim 6, characterized in that the DNA fragment ratio of a length interval on a chromosome is calculated using the following formula:

8. The sequence data processing device for embryo chromosomes according to claim 6 or 7, characterized in that the step of judging whether the chromosome under test is aneuploid according to the difference between the DNA fragment ratios of the chromosome under test in the different length intervals and the DNA fragment ratios of known autosomes in the different length intervals specifically comprises:

9. The sequence data processing device for embryo chromosomes according to claim 6 or 7, characterized in that the length of the chromosome refers to the length of the chromosome after the centromere, telomere, and satellite regions have been filtered out.

10. The sequence data processing device for embryo chromosomes according to claim 6 or 7, characterized in that the division into read-length intervals is implemented using a sliding window method.