CN103824001A - Method and device for detecting chromosome

Info

Publication number: CN103824001A
Application number: CN201410069562.1A
Authority: CN
Inventors: 阮航; 潘凯; 王海龙; 李瑞强
Original assignee: Nuo Hezhi Source Beijing Bioinformation Science And Technology Ltd
Current assignee: Nuo Hezhi Source Beijing Bioinformation Science And Technology Ltd
Priority date: 2014-02-27
Filing date: 2014-02-27
Publication date: 2014-05-28

Abstract

The invention discloses a method and device for chromosome detection. The method comprises a receiving step, a segmentation step, a comparison step and a determining step. In the receiving step, a reference sequence and a plurality of sequencing sequences are received. In the segmentation step, each sequencing sequence is divided, with a step size of n1 bases, into a plurality of sequencing subsequences with a fixed length of n2 bases, and the reference sequence is divided, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases. In the comparison step, the sequencing subsequences of each sequencing sequence are compared with the reference subsequences of the reference sequence to determine target sequencing sequences. In the determining step, the name of the chromosome corresponding to each target sequencing sequence is determined according to the correspondence between the sequencing subsequences of that target sequencing sequence and the reference subsequences of the reference sequence. The method and device solve the problem of slow chromosome detection in the prior art, thereby reducing time overhead and increasing detection speed.

Description

Chromosomal detection method and device

Technical field

The present invention relates to the field of genetic engineering, and in particular to a chromosome detection method and device.

Background technology

For cell-free fetal DNA in maternal peripheral plasma, the most common detection method at present is based on second-generation high-throughput sequencing. A small amount of peripheral blood is first drawn from the pregnant woman, and the cell-free DNA in it is extracted. Libraries are then usually built from multiple pooled samples and sequenced as single-end reads of 50 bases. The sequencing data are separated by sample according to the sample labels (barcodes). After quality control of each sample's data, the data are aligned to the human reference genome with third-party short-sequence alignment software (such as soapAligner or bwa) to obtain the coverage depth of the sequencing data on the target chromosome, that is, the chromosome dosage. The chromosome dosage is then used to judge whether the sample is positive or negative.

Because the amount of data produced by second-generation high-throughput sequencing is very large, each sample requires about 300M bases of data for the above technique of detecting fetal chromosomal ploidy abnormalities. As the number of samples grows, the demands on the speed of the bioinformatics analysis behind this technique become increasingly evident. In the existing approach, however, the raw sequencing output must pass through sample separation, alignment, and statistical analysis of the alignment results in turn before a detection result is obtained. Each step is handled by relatively independent software, and each step must read the output of the previous step as its input. This repeated input and output consumes a large amount of extra time and places a heavy load on the system's I/O (input/output) performance and disk space. In addition, determining the chromosome from the alignment requires complicated computation, so the system processes data very slowly, which further lengthens the detection cycle and lowers efficiency.

No effective solution has yet been proposed for the problem of slow chromosome detection in the related art.

Summary of the invention

The main purpose of the present invention is to provide a chromosome detection method and device, so as to solve the problem of slow chromosome detection in the prior art.

To achieve this purpose, according to one aspect of the present invention, a chromosome detection method is provided.

The chromosome detection method according to the present invention comprises: a receiving step: receiving a reference sequence and a plurality of sequencing sequences; a segmentation step: dividing each sequencing sequence, with a step size of n1 bases, into a plurality of sequencing subsequences with a fixed length of n2 bases, and dividing the reference sequence, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1≤n3; a comparison step: comparing the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine a target sequencing sequence, wherein all sequencing subsequences of the target sequencing sequence are contained in the reference subsequences of the reference sequence; and a determining step: determining the chromosome name corresponding to the target sequencing sequence according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences of the reference sequence, wherein the reference sequence and each of its reference subsequences have a corresponding chromosome name.

Further, the plurality of sequencing sequences come from a plurality of samples to be detected, each sequencing sequence carries a sample identifier, and the detection method further comprises: performing the segmentation step, the comparison step and the determining step in a multi-process manner.

Further, there are a plurality of target sequencing sequences. After the sequencing subsequences of each sequencing sequence are compared with the reference subsequences of the reference sequence to determine the target sequencing sequences, and before the chromosome name corresponding to each target sequencing sequence is determined according to the correspondence between its sequencing subsequences and the reference subsequences, the detection method further comprises: searching for the reference subsequence having the same bases as each sequencing subsequence C_{ij} of each target sequencing sequence C_i, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i; determining the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{ij} to be the chromosome name of C_{ij}; judging whether the chromosome names of all sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the target sequencing sequences are identical, wherein i' ∈ (1, i_max); and, when the chromosome names of the sequencing subsequences C_{i'j} of the first sequencing sequence C_{i'} are judged not to be all identical, filtering the first sequencing sequence C_{i'} out of the target sequencing sequences.

Further, the chromosome name corresponding to a target sequencing sequence is determined as follows: searching for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} has been filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'; and determining the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} to be the chromosome name of the second sequencing sequence C_{i''}.

Further, after the reference sequence is divided, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, the detection method further comprises: storing the reference subsequences of the reference sequence in shared memory, wherein the sequencing subsequences of each sequencing sequence are compared with the reference subsequences in the shared memory to determine the target sequencing sequence, and the chromosome name corresponding to the target sequencing sequence is determined according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences in the shared memory.

Further, after the reference sequence is divided, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, the detection method further comprises: deleting repeated sequences and/or first subsequences from the reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base. Comparing the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine the target sequencing sequence then comprises: comparing the sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequence, wherein the target reference subsequences are the reference subsequences remaining after the repeated sequences and/or first subsequences have been deleted.

According to another aspect of the present invention, a chromosome detection device is provided. The detection device is mainly used to carry out any of the chromosome detection methods provided above.

According to another aspect of the present invention, a chromosome detection device is provided, comprising: a receiving unit, configured to receive a reference sequence and a plurality of sequencing sequences; a segmentation unit, configured to divide each sequencing sequence, with a step size of n1 bases, into a plurality of sequencing subsequences with a fixed length of n2 bases, and to divide the reference sequence, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1≤n3; a comparison unit, configured to compare the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine a target sequencing sequence, wherein all sequencing subsequences of the target sequencing sequence are contained in the reference subsequences of the reference sequence; and a first determining unit, configured to determine the chromosome name corresponding to the target sequencing sequence according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences of the reference sequence, wherein the reference sequence and each of its reference subsequences have a corresponding chromosome name.

Further, the plurality of sequencing sequences come from a plurality of samples to be detected, each sequencing sequence carries a sample identifier, and there are a plurality of segmentation units, comparison units and first determining units.

Further, there are a plurality of target sequencing sequences, and the detection device further comprises: a search unit, configured to search for the reference subsequence having the same bases as each sequencing subsequence C_{ij} of each target sequencing sequence C_i, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i; a second determining unit, configured to determine the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{ij} to be the chromosome name of C_{ij}; a judging unit, configured to judge whether the chromosome names of all sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the target sequencing sequences are identical, wherein i' ∈ (1, i_max); and a filtering unit, configured to filter the first sequencing sequence C_{i'} out of the target sequencing sequences when the chromosome names of its sequencing subsequences C_{i'j} are judged not to be all identical.

Further, the first determining unit comprises: a search module, configured to search for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} has been filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'; and a determining module, configured to determine the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} to be the chromosome name of the second sequencing sequence C_{i''}.

Further, the detection device further comprises: a storage unit, configured to store the reference subsequences of the reference sequence in shared memory, wherein the comparison unit is configured to compare the sequencing subsequences of each sequencing sequence with the reference subsequences in the shared memory to determine the target sequencing sequence, and the first determining unit is configured to determine the chromosome name corresponding to the target sequencing sequence according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences in the shared memory.

Further, the detection device further comprises: a deleting unit, configured to delete repeated sequences and/or first subsequences from the reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base, and wherein the comparison unit is configured to compare the sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequence, the target reference subsequences being the reference subsequences remaining after the repeated sequences and/or first subsequences have been deleted.

The present invention adopts a receiving step: receiving a reference sequence and a plurality of sequencing sequences; a segmentation step: dividing each sequencing sequence, with a step size of n1 bases, into a plurality of sequencing subsequences with a fixed length of n2 bases, and dividing the reference sequence, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1≤n3; a comparison step: comparing the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine a target sequencing sequence, wherein all sequencing subsequences of the target sequencing sequence are contained in the reference subsequences of the reference sequence; and a determining step: determining the chromosome name corresponding to the target sequencing sequence according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences of the reference sequence, wherein the reference sequence and each of its reference subsequences have a corresponding chromosome name. By dividing the sequencing sequences and the reference sequence and then comparing the resulting sequencing subsequences with the reference subsequences, the invention shortens the comparison time relative to the prior art, in which long sequences must be compared. This raises the sequence alignment speed and hence the chromosome detection speed, solves the problem of slow chromosome detection in the prior art, and thereby reduces time overhead and increases detection speed.

Brief description of the drawings

The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The exemplary embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of a chromosome detection method according to an embodiment of the present invention; and

Fig. 2 is a schematic diagram of a chromosome detection device according to an embodiment of the present invention.

Detailed description of the embodiments

It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

An embodiment of the present invention provides a chromosome detection method, which is described in detail below.

Fig. 1 is a flowchart of the chromosome detection method according to an embodiment of the present invention. As shown in Fig. 1, the detection method mainly comprises the following steps S102 to S108:

S102 (receiving step): receive a reference sequence and a plurality of sequencing sequences, wherein the reference sequence is the human reference genome sequence and each sequencing sequence is a genomic sequence of a sample to be detected, generally a single-end read of 50 bases.

S104 (segmentation step): divide each sequencing sequence, with a step size of n1 bases, into a plurality of sequencing subsequences with a fixed length of n2 bases, and divide the reference sequence, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1≤n3. Preferably, before segmentation, each sequencing sequence is first trimmed by cutting one base from each of its two ends, giving a sequence of 48 bases. Because the bases at the head and tail of a sequencing sequence are of lower quality, removing them helps improve the quality of the sequencing sequence.

S106 (comparison step): compare the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine a target sequencing sequence, wherein all sequencing subsequences of the target sequencing sequence are contained in the reference subsequences of the reference sequence. That is, for any sequencing sequence, if an identical reference subsequence can be found among the reference subsequences of the reference sequence for every one of its sequencing subsequences, that sequencing sequence is determined to be a target sequencing sequence.

S108 (determining step): determine the chromosome name corresponding to the target sequencing sequence according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences of the reference sequence, wherein the reference sequence and each of its reference subsequences have a corresponding chromosome name.
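Steps S106 and S108 can be sketched as a dictionary lookup. The following is a minimal illustration, not the patented implementation: the function names, the toy two-chromosome reference and the choice of a Python dictionary as the lookup structure are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def split_fixed_length(seq: str, step: int, length: int) -> List[str]:
    """Slide a window of `length` bases along `seq`, advancing `step` bases."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def build_index(reference: Dict[str, str], step: int, length: int) -> Dict[str, str]:
    """Map every reference subsequence to the name of its chromosome."""
    index: Dict[str, str] = {}
    for chrom, seq in reference.items():
        for sub in split_fixed_length(seq, step, length):
            # Toy simplification: a duplicate subsequence overwrites the earlier entry.
            index[sub] = chrom
    return index


def chromosome_of(read: str, index: Dict[str, str], step: int, length: int) -> Optional[str]:
    """Return the chromosome name of a target sequencing sequence, else None."""
    subs = split_fixed_length(read, step, length)
    # S106: the read is a target only if every subsequence is found in the reference.
    if not subs or any(sub not in index for sub in subs):
        return None
    # S108: read the chromosome name off a matching reference subsequence.
    return index[subs[0]]


# Hypothetical toy reference; real input would be the human reference genome.
toy_reference = {"chr1": "ACGTACGGTTAC", "chr2": "TTGACCAGTCAA"}
toy_index = build_index(toy_reference, step=1, length=4)
print(chromosome_of("GTACGG", toy_index, step=1, length=4))
print(chromosome_of("ACGTTT", toy_index, step=1, length=4))
```

The second read contains the subsequence `CGTT`, which does not occur in the toy reference, so it is not a target sequencing sequence and the function returns `None`.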

The chromosome detection method provided by the embodiment of the present invention divides the sequencing sequences and the reference sequence and then compares the resulting sequencing subsequences with the reference subsequences. Relative to the prior art, in which long sequences must be compared, this shortens the comparison time, raises the sequence alignment speed and hence the chromosome detection speed, solves the problem of slow chromosome detection in the prior art, and thereby reduces time overhead and increases detection speed.

Here, n1 may equal 1 and n2 may equal 16; that is, each sequencing sequence is divided with a fixed step size of 1 base, so that a sequencing sequence of 48 bases is divided into 33 sequencing subsequences of 16 bases each (48 − 16 + 1 = 33). The reference sequence may likewise be divided with a fixed step size of 1 base, or with a step size greater than 1 base.
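The trimming and segmentation with n1 = 1 and n2 = 16 can be checked with a short sketch. The helper names `trim_ends` and `split_fixed_length` and the made-up 50-base read are assumptions for illustration only.

```python
from typing import List


def trim_ends(read: str, n_trim: int = 1) -> str:
    """Cut n_trim bases from each end of a read (50 bases -> 48 bases)."""
    return read[n_trim:len(read) - n_trim]


def split_fixed_length(seq: str, step: int, length: int) -> List[str]:
    """Divide seq into fixed-length subsequences, advancing `step` bases each time."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


read = "ACGT" * 12 + "AC"                      # a made-up 50-base read
trimmed = trim_ends(read)                      # 48 bases remain
subseqs = split_fixed_length(trimmed, step=1, length=16)
print(len(read), len(trimmed), len(subseqs))   # prints: 50 48 33
```

A larger step size, as permitted for the reference sequence, simply produces fewer windows: `split_fixed_length("ACGTACGT", step=2, length=4)` yields three subsequences instead of five.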

Preferably, the plurality of sequencing sequences come from a plurality of samples to be detected, and each sequencing sequence carries a sample identifier. The detection method of the embodiment further comprises: performing the segmentation step, the comparison step and the determining step in a multi-process manner. That is, multiple sequencing sequences are divided, compared and assigned at the same time, so that the chromosome names corresponding to the sequencing sequences are determined more quickly and the detection speed is further increased.

Further, after the target sequencing sequences and their corresponding chromosome names are determined, the detection method of the embodiment further comprises the following steps S11 to S14:

S11: count the number of target sequencing sequences corresponding to each chromosome, that is, count how many target sequencing sequences have chromosome 1 as their chromosome name, how many have chromosome 2, ..., and how many have chromosome 23.

S12: calculate the dosage of each chromosome from the number of target sequencing sequences corresponding to each chromosome, mainly by normalizing those numbers. Specifically, the numbers may be normalized according to the formula

$$W_k = \frac{x_k}{\sum_{i=1}^{N} x_i}$$

to obtain the dosage of each chromosome, wherein x_k is the number of target sequencing sequences corresponding to chromosome k, N is the total number of chromosomes, and W_k is the dosage of chromosome k.

S13: calculate the Z value of the target chromosome from the dosages of the chromosomes. Specifically, the Z value may be calculated according to the formula

$$Z_{target} = \frac{W_{target} - m}{S}$$

wherein W_target is the dosage of the target chromosome, m and S are preset values, and Z_target is the Z value of the target chromosome. m may be the mean dosage of the target chromosome in negative samples, determined by experiment or experience, and S may be the standard deviation of the dosage of the target chromosome in negative samples, determined by experiment or experience.

S14: determine from the Z value of the target chromosome whether the ploidy of the target chromosome is negative or positive. Specifically, if the Z value of the target chromosome is greater than a specified threshold, the ploidy of the target chromosome is determined to be positive; otherwise it is negative. The specified threshold may be determined by experiment or experience.
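Steps S11 to S14 can be sketched as follows. This is a hedged illustration: it assumes the dosage is each chromosome's share of all target sequencing sequences and that the Z value is a standard score against negative-sample statistics, both inferred from the symbol definitions in the text. The function names, the sample counts and the values of m, S and the threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def chromosome_dosages(assigned: Iterable[str]) -> Dict[str, float]:
    """S11 + S12: count target reads per chromosome and normalize by the total."""
    counts = Counter(assigned)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def z_value(w_target: float, m: float, s: float) -> float:
    """S13: standard score of the target dosage (assumed form: (W - m) / S)."""
    return (w_target - m) / s


def call_ploidy(z: float, threshold: float) -> str:
    """S14: positive if the Z value exceeds the specified threshold."""
    return "positive" if z > threshold else "negative"


# Hypothetical chromosome names assigned to eight target sequencing sequences.
assigned = ["chr1"] * 5 + ["chr21"] * 2 + ["chr2"]
dosages = chromosome_dosages(assigned)
z = z_value(dosages["chr21"], m=0.20, s=0.025)   # m and S are made-up presets
print(dosages["chr21"], call_ploidy(z, threshold=3.0))
```

In practice m, S and the threshold would come from negative control samples, as the text states; the values here only make the example executable.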

Further, the chromosomal position of each reference subsequence is also known; that is, in the embodiment of the present invention, the chromosome name corresponding to each reference subsequence and the chromosomal position of each reference subsequence are stored in advance. After the chromosome dosage is calculated, the detection method of the embodiment further comprises: correcting the calculated chromosome dosage by using the correspondence between the chromosomal positions of the reference subsequences and the dosage of the reference subsequences at those positions, thereby further improving the accuracy of chromosome detection. The correction may use the GC-bias method and the internal reference chromosome method respectively. Any specific correction method in the prior art may be adopted, and the details are not repeated here.

Preferably, there are a plurality of target sequencing sequences. After the sequencing subsequences of each sequencing sequence are compared with the reference subsequences of the reference sequence to determine the target sequencing sequences, and before the chromosome name corresponding to each target sequencing sequence is determined according to the correspondence between its sequencing subsequences and the reference subsequences, the detection method of the embodiment further comprises the following steps S21 to S24:

S21: search for the reference subsequence having the same bases as each sequencing subsequence C_{ij} of each target sequencing sequence C_i, that is, find the reference subsequence with the same bases in the same order, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i.

S22: determine the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{ij} to be the chromosome name of C_{ij}.

S23: judge whether the chromosome names of all sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the target sequencing sequences are identical, wherein i' ∈ (1, i_max); that is, for any one sequencing sequence, judge whether all of its sequencing subsequences have the same chromosome name.

S24: when the chromosome names of the sequencing subsequences C_{i'j} of the first sequencing sequence C_{i'} are judged not to be all identical, filter the first sequencing sequence C_{i'} out of the target sequencing sequences; that is, if the sequencing subsequences of a sequencing sequence do not all share one chromosome name, that sequencing sequence is removed from the target sequencing sequences.

If some sequencing subsequences of a target sequencing sequence correspond to chromosome p while others correspond to chromosome q, the subsequently calculated chromosome dosage will be in error. Steps S21 to S24 remove the sequencing sequences whose subsequences map to conflicting chromosome names, so that the chromosome name corresponding to each target sequencing sequence is determined accurately and the chromosome dosage is calculated accurately.
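The consistency filter of steps S21 to S24 can be sketched as below. The function name `filter_consistent` and the hand-written three-entry index are assumptions for illustration; the input reads are assumed to be target sequencing sequences already, so every subsequence is present in the index.

```python
from typing import Dict, List, Tuple


def filter_consistent(reads: List[str], index: Dict[str, str],
                      step: int, length: int) -> List[Tuple[str, str]]:
    """Keep only reads whose subsequences all map to one chromosome name."""
    kept = []
    for read in reads:
        # S21 + S22: chromosome name of every sequencing subsequence of the read.
        names = {index[read[i:i + length]]
                 for i in range(0, len(read) - length + 1, step)}
        # S23 + S24: drop the read when more than one chromosome name appears.
        if len(names) == 1:
            kept.append((read, names.pop()))
    return kept


# Hypothetical index mapping reference subsequences to chromosome names.
toy_index = {"ACGT": "chr1", "CGTA": "chr1", "CGTT": "chr2"}
toy_reads = ["ACGTA", "ACGTT"]   # the second read mixes chr1 and chr2
print(filter_consistent(toy_reads, toy_index, step=1, length=4))
```

Because every subsequence of a kept read carries the same name, the name of any single subsequence can then serve as the chromosome name of the whole read.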

Further, in the embodiment of the present invention, the chromosome name corresponding to a target sequencing sequence may be determined as follows. First, search for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} has been filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'. Then, determine the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} to be the chromosome name of the second sequencing sequence C_{i''}.

Preferably, after the reference sequence is divided, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, the detection method of the embodiment further comprises: storing the reference subsequences of the reference sequence in shared memory, wherein the sequencing subsequences of each sequencing sequence are compared with the reference subsequences in the shared memory to determine the target sequencing sequence, and the chromosome name corresponding to the target sequencing sequence is determined according to the correspondence between the sequencing subsequences of the target sequencing sequence and the reference subsequences in the shared memory.

Because the reference subsequences are stored in shared memory, and the target sequencing sequences and their chromosome names are determined against that shared memory, the reference sequence needs to be read only once during chromosome detection, after which the sequencing sequences of any number of samples can be detected. There is no need to input the reference sequence repeatedly, as happens in the prior art each time third-party alignment software is called. This further reduces time overhead and greatly increases detection speed.

Preferably, after the reference sequence is divided, with a step size of n3 bases, into a plurality of reference subsequences with a fixed length of n2 bases, the detection method of the embodiment further comprises: deleting repeated sequences and/or first subsequences from the reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base. Comparing the sequencing subsequences of each sequencing sequence with the reference subsequences of the reference sequence to determine the target sequencing sequence then comprises: comparing the sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequence, wherein the target reference subsequences are the reference subsequences remaining after the repeated sequences and/or first subsequences have been deleted.

Deleting the repeated sequences and the subsequences containing an N base (an unknown base) from the reference subsequences, and then determining the target sequencing sequences and their chromosome names from the filtered reference subsequences, avoids alignment errors as far as possible and improves the accuracy of chromosome detection.
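The filtering of reference subsequences can be sketched as follows. This illustration assumes that a "repeated sequence" is any reference subsequence occurring more than once anywhere in the reference and that every copy is discarded; the text does not spell this out. The function name and the toy reference are hypothetical.

```python
from typing import Dict, Set


def build_filtered_index(reference: Dict[str, str], step: int, length: int) -> Dict[str, str]:
    """Index reference subsequences, dropping repeats and any containing 'N'."""
    first_seen: Dict[str, str] = {}   # subsequence -> chromosome of first occurrence
    repeated: Set[str] = set()        # subsequences seen more than once
    for chrom, seq in reference.items():
        for i in range(0, len(seq) - length + 1, step):
            sub = seq[i:i + length]
            if "N" in sub:            # unknown base: skip this "first subsequence"
                continue
            if sub in first_seen:
                repeated.add(sub)
            else:
                first_seen[sub] = chrom
    # Assumption: all copies of a repeated subsequence are removed.
    return {sub: chrom for sub, chrom in first_seen.items() if sub not in repeated}


# Hypothetical toy reference: chr1 repeats "ACGT" and ends in unknown bases.
toy_reference = {"chr1": "ACGTACGTNN", "chr2": "TTGACCAG"}
target_index = build_filtered_index(toy_reference, step=1, length=4)
print(sorted(target_index))
```

With this filtering, each remaining reference subsequence maps to exactly one chromosome, so a lookup can never return an ambiguous chromosome name.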

The embodiment of the present invention also provides a kind of chromosomal pick-up unit, this pick-up unit is mainly used in carrying out any chromosomal detection method that foregoing of the present invention provides, below chromosomal pick-up unit that the embodiment of the present invention is provided do concrete introduction:

Fig. 2 is a schematic diagram of the chromosome detection device according to an embodiment of the present invention. As shown in Fig. 2, the detection device mainly comprises a receiving unit 10, a dividing unit 20, a comparing unit 30 and a first determining unit 40, wherein:

The receiving unit 10 is configured to receive a reference sequence and a plurality of sequencing sequences, wherein the reference sequence is the human reference genome sequence, and each sequencing sequence is a genome sequence of a sample to be detected, generally a single-end sequence 50 bases long.

The dividing unit 20 is configured to divide each sequencing sequence, with a step length of n1 bases, into a plurality of sequencing subsequences each having a fixed length of n2 bases, and to divide the reference sequence, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1 ≤ n3. Preferably, before the division, the head and tail of each sequencing sequence are trimmed by cutting one base from each end, giving a sequence 48 bases long. Because the bases at the two ends of a sequencing sequence are of lower quality, removing them helps improve the quality of the sequencing sequence.
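The trimming and fixed-length division performed by the dividing unit can be sketched as below. The helper names `trim_ends` and `split_fixed` are hypothetical, and the values n1 = 12 and n2 = 24 are chosen only for illustration; the text above fixes neither.

```python
def trim_ends(read: str, k: int = 1) -> str:
    """Cut k bases from each end of a read (the lower-quality head and tail)."""
    return read[k:len(read) - k] if len(read) > 2 * k else ""

def split_fixed(seq: str, step: int, length: int) -> list:
    """Split seq into subsequences of `length` bases, advancing `step` bases each time."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

read = "A" + "ACGT" * 12 + "C"                    # a 50-base single-end read
trimmed = trim_ends(read)                         # 48 bases after trimming
subs = split_fixed(trimmed, step=12, length=24)   # n1 = 12, n2 = 24 (assumed values)
```

The same `split_fixed` helper would apply to the reference sequence with step n3, so that sequencing subsequences and reference subsequences share the fixed length n2 and can be compared for exact equality.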

The comparing unit 30 is configured to compare the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence to determine the target sequencing sequences, wherein the plurality of sequencing subsequences of a target sequencing sequence are all contained in the plurality of reference subsequences of the reference sequence. That is, for any sequencing sequence, if every one of its sequencing subsequences has an identical reference subsequence among the plurality of reference subsequences of the reference sequence, that sequencing sequence is determined to be a target sequencing sequence.
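The comparison rule above amounts to a membership test: a sequencing sequence is a target sequencing sequence only if all of its subsequences appear among the reference subsequences. A minimal sketch follows, assuming a hash-based lookup (the text does not specify the data structure); the names `build_reference_index` and `is_target`, the toy chromosomes and the values n1 = n3 = 1, n2 = 4 are hypothetical.

```python
def split_fixed(seq, step, length):
    """Fixed-length subsequences of seq, advancing `step` bases each time."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

def build_reference_index(chromosomes, step, length):
    """Map every reference subsequence to the chromosome name it came from."""
    index = {}
    for name, seq in chromosomes.items():
        for sub in split_fixed(seq, step, length):
            index.setdefault(sub, name)
    return index

def is_target(read, index, step, length):
    """True only if the read yields subsequences and all of them are in the index."""
    subs = split_fixed(read, step, length)
    return bool(subs) and all(sub in index for sub in subs)

toy_reference = {"chr1": "ACGTACGGTCA", "chr2": "TTGACCATGGA"}
index = build_reference_index(toy_reference, step=1, length=4)  # n3 = 1, n2 = 4
hit = is_target("GTACGG", index, step=1, length=4)    # every 4-mer occurs in chr1
miss = is_target("GTACTT", index, step=1, length=4)   # "TACT" is absent
```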

The first determining unit 40 is configured to determine the chromosome name corresponding to each target sequencing sequence according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence, wherein the reference sequence and each reference subsequence of the reference sequence have a corresponding chromosome name.

The chromosome detection device provided by the embodiment of the present invention divides the sequencing sequences and the reference sequence, and then performs detection by comparing the resulting sequencing subsequences with the reference subsequences. Compared with the prior art, in which long sequences must be compared, this shortens the comparison time and increases the sequence alignment speed, which in turn increases the chromosome detection speed. The problem of slow chromosome detection in the prior art is thereby solved, reducing time overhead and increasing detection speed.

Preferably, the plurality of sequencing sequences come from a plurality of samples to be detected, and each sequencing sequence carries a sample identifier. There are a plurality of dividing units, comparing units and first determining units, so that the functions of the dividing unit 20, the comparing unit 30 and the first determining unit 40 are performed in a multi-process manner. That is, a plurality of sequencing sequences are divided, compared and determined at the same time, so that the chromosome names corresponding to the sequencing sequences are determined more quickly, further increasing detection speed.

Preferably, there are a plurality of target sequencing sequences, and the detection device of the embodiment of the present invention further comprises a searching unit, a second determining unit, a judging unit and a filtering unit, wherein:

The searching unit is configured to search for the reference subsequence having the same bases as each sequencing subsequence C_ij of each target sequencing sequence C_i, that is, to find the reference subsequence having the same bases in the same order, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i.

The second determining unit is configured to determine that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_ij is the chromosome name of the sequencing subsequence C_ij.

The judging unit is configured to judge whether the chromosome names of the sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the plurality of target sequencing sequences are all identical, wherein i' ∈ (1, i_max). That is, it judges, for any one sequencing sequence, whether the chromosome names of all of its sequencing subsequences are identical.

The filtering unit is configured to filter the first sequencing sequence C_{i'} out of the plurality of target sequencing sequences when it is judged that the chromosome names of the sequencing subsequences C_{i'j} of the first sequencing sequence C_{i'} are not all identical. That is, when the sequencing subsequences of a sequencing sequence do not all share the same chromosome name, that sequencing sequence is removed from the target sequencing sequences.

If some sequencing subsequences of a target sequencing sequence correspond to chromosome p while other sequencing subsequences of the same sequence correspond to chromosome q, the chromosome dosage calculated subsequently will be in error. By performing the above steps with the searching unit, the second determining unit, the judging unit and the filtering unit, the sequencing sequences whose subsequences have conflicting chromosome assignments are deleted, so that the chromosome name corresponding to each target sequencing sequence is determined accurately, which in turn ensures that the chromosome dosage is calculated accurately.
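The search, judge and filter steps above can be sketched as follows, assuming a lookup table that maps each reference subsequence to a chromosome name. The names `consistent_chromosome` and `filter_conflicts` and the toy index are hypothetical.

```python
def split_fixed(seq, step, length):
    """Fixed-length subsequences of seq, advancing `step` bases each time."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

def consistent_chromosome(read, index, step, length):
    """Return the one chromosome name shared by all subsequences, else None."""
    names = {index.get(sub) for sub in split_fixed(read, step, length)}
    if len(names) == 1 and None not in names:
        return names.pop()
    return None

def filter_conflicts(target_reads, index, step, length):
    """Keep only reads whose subsequences all map to the same chromosome."""
    kept = {}
    for read in target_reads:
        name = consistent_chromosome(read, index, step, length)
        if name is not None:
            kept[read] = name
    return kept

# Toy index (hypothetical): reference subsequence -> chromosome name.
index = {"ACGT": "chr1", "CGTA": "chr1", "GTAC": "chr2"}
kept = filter_conflicts(["ACGTA", "ACGTAC"], index, step=1, length=4)
```

The read `ACGTAC` is dropped because its subsequences map to both `chr1` and `chr2`. For every read that is kept, the chromosome name of any single subsequence is also the chromosome name of the whole read, which is why one lookup suffices in the subsequent determination.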

Further, the first determining unit 40 mainly comprises a searching module and a determining module. The searching module is configured to search for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} is filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'. The determining module is configured to determine that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} is the chromosome name of the second sequencing sequence C_{i''}.

Preferably, the detection device further comprises a storage unit configured to store the plurality of reference subsequences of the reference sequence in shared memory. In this case, the comparing unit 30 is configured to compare the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence in the shared memory to determine the target sequencing sequences, and the first determining unit 40 is configured to determine the chromosome name corresponding to each target sequencing sequence according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence in the shared memory.

By providing the storage unit, the plurality of reference subsequences of the reference sequence are stored in shared memory, and the target sequencing sequences and their corresponding chromosome names are then determined within that shared memory. During chromosome detection the reference sequence therefore needs to be read only once, after which sequencing sequences from any number of samples can be detected repeatedly. Unlike the prior art, which calls third-party alignment software and must input the reference sequence on every call, this further reduces time overhead and greatly increases detection speed.
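One way to picture the shared-memory arrangement is the sketch below, which uses Python's standard `multiprocessing.shared_memory` module. The packed sorted layout, the binary-search lookup and the names `publish_reference` and `contains` are assumptions made for illustration; the text does not specify how the reference subsequences are laid out in shared memory, and the chromosome names are omitted here for brevity.

```python
from multiprocessing import shared_memory

def publish_reference(subsequences, length):
    """Pack sorted, de-duplicated fixed-length subsequences into shared memory."""
    ordered = sorted(set(subsequences))
    assert all(len(sub) == length for sub in ordered)
    data = "".join(ordered).encode("ascii")
    shm = shared_memory.SharedMemory(create=True, size=max(len(data), 1))
    shm.buf[:len(data)] = data
    return shm, len(ordered)

def contains(buf, count, length, query):
    """Binary search for `query` among `count` packed records in `buf`."""
    target = query.encode("ascii")
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        record = bytes(buf[mid * length:(mid + 1) * length])
        if record == target:
            return True
        if record < target:
            lo = mid + 1
        else:
            hi = mid
    return False

# The reference subsequences are loaded and published once.
shm, count = publish_reference(["ACGT", "CGTA", "GTAC", "ACGT"], length=4)

# A worker process would attach by name instead of re-reading the reference;
# the attachment is done in the same process here only to keep the sketch short.
worker_view = shared_memory.SharedMemory(name=shm.name)
found = contains(worker_view.buf, count, 4, "GTAC")
missing = contains(worker_view.buf, count, 4, "TTTT")

worker_view.close()
shm.close()
shm.unlink()   # release the block once all samples have been processed
```

The point of the design is that the cost of reading and dividing the reference is paid once, while each additional sample only pays for lookups against the already published block.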

Preferably, the detection device further comprises a deleting unit. After the reference sequence is divided, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, the deleting unit is configured to delete repeat sequences and/or first subsequences from the plurality of reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base. The comparing unit 30 is then configured to compare the plurality of sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequences, wherein the target reference subsequences are the reference subsequences remaining after the repeat sequences and/or the first subsequences are deleted.

Further, the detection device of the embodiment of the present invention also comprises a counting unit, a first computing unit, a second computing unit and a third determining unit, wherein:

The counting unit is configured to perform step S11 above, that is, to count the number of target sequencing sequences corresponding to each chromosome: how many target sequencing sequences have chromosome 1 as their chromosome name, how many have chromosome 2, and so on up to chromosome 23.

The first computing unit is configured to perform step S12 above, that is, to calculate the dosage of each chromosome from the number of target sequencing sequences corresponding to each chromosome, mainly by normalizing those numbers to obtain the dosage of each chromosome. Specifically, the dosage can be calculated according to a formula [formula not reproduced in the source text].

The second computing unit is configured to perform step S13 above, that is, to calculate the Z value of the target chromosome from the dosage of each chromosome. Specifically, the Z value of the target chromosome can be calculated according to the formula

Z_target = (W_target - m) / S

wherein W_target is the dosage of the target chromosome, m and S are preset values, and Z_target is the Z value of the target chromosome. m may be the mean dosage of the target chromosome in negative samples, determined by experiment or experience, and S may be the standard deviation of the dosage of the target chromosome in negative samples, determined by experiment or experience.

The third determining unit is configured to perform step S14 above, that is, to determine from the Z value of the target chromosome whether the ploidy of the target chromosome is negative or positive. Specifically, if the Z value of the target chromosome is greater than a specified threshold, the ploidy of the target chromosome is determined to be positive; otherwise it is determined to be negative. The specified threshold can be determined by experiment or experience.
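Steps S11 to S14 can be sketched together as below. Several points are assumptions made for illustration only: the dosage is taken to be each chromosome's share of all target sequencing sequences (the actual normalization formula is not reproduced in this text), the Z value is computed as (W - m) / S following the definitions of W, m and S given above, and the read counts, the values of m and S, and the threshold of 3.0 are hypothetical.

```python
from collections import Counter

def chromosome_dosages(read_chromosomes):
    """S11 + S12: count reads per chromosome, then normalize the counts.

    Assumption: dosage = fraction of all target reads assigned to the chromosome.
    """
    counts = Counter(read_chromosomes)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def z_value(dosage, m, s):
    """S13: Z = (W - m) / S, with m and S preset from negative samples."""
    return (dosage - m) / s

def classify(z, threshold=3.0):
    """S14: positive if Z exceeds the specified threshold (3.0 is hypothetical)."""
    return "positive" if z > threshold else "negative"

# Hypothetical input: the chromosome name of each retained target read.
assigned = ["chr21"] * 15 + ["chr1"] * 985
dosages = chromosome_dosages(assigned)
z = z_value(dosages["chr21"], m=0.013, s=0.0005)   # m and S are made-up presets
result = classify(z)
```

In practice m, S and the threshold would come from a set of known negative samples processed with the same pipeline, as the text indicates.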

Further, the chromosome position of each reference subsequence is also known. That is, in the embodiment of the present invention, the chromosome name corresponding to each reference subsequence and the chromosome position of each reference subsequence are stored in advance. The detection device of the embodiment of the present invention further comprises a correcting unit. After the chromosome dosage is calculated, the correcting unit uses the correspondence between the chromosome position of a reference subsequence and the dosage of the reference subsequences at that position to correct the chromosome dosage calculated in the detection method, further improving the accuracy of chromosome detection. The correction uses the GC bias method and the internal reference chromosome method respectively; the specific correction may use any method of the prior art and is not described further here.

As can be seen from the above description, the present invention increases the sequence alignment speed and thereby the chromosome detection speed, reducing time overhead.

It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by a plurality of computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not restricted to any specific combination of hardware and software.

The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A chromosome detection method, characterized by comprising:

Receiving step: receiving a reference sequence and a plurality of sequencing sequences;

Dividing step: dividing each sequencing sequence, with a step length of n1 bases, into a plurality of sequencing subsequences each having a fixed length of n2 bases, and dividing the reference sequence, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1 ≤ n3;

Comparing step: comparing the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence to determine a target sequencing sequence, wherein the plurality of sequencing subsequences of the target sequencing sequence are all contained in the plurality of reference subsequences of the reference sequence; and

Determining step: determining the chromosome name corresponding to the target sequencing sequence according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence, wherein the reference sequence and each reference subsequence of the reference sequence have a corresponding chromosome name.

2. The detection method according to claim 1, characterized in that the plurality of sequencing sequences come from a plurality of samples to be detected, each sequencing sequence carries a sample identifier, and the detection method further comprises: performing the dividing step, the comparing step and the determining step in a multi-process manner.

3. The detection method according to claim 1, characterized in that there are a plurality of target sequencing sequences, and, after the plurality of sequencing subsequences of each sequencing sequence are compared with the plurality of reference subsequences of the reference sequence to determine the target sequencing sequences, and before the chromosome name corresponding to each target sequencing sequence is determined according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence, the detection method further comprises:

searching for the reference subsequence having the same bases as each sequencing subsequence C_ij of each target sequencing sequence C_i, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i;

determining that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_ij is the chromosome name of the sequencing subsequence C_ij;

judging whether the chromosome names of the sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the plurality of target sequencing sequences are all identical, wherein i' ∈ (1, i_max); and

when it is judged that the chromosome names of the sequencing subsequences C_{i'j} of the first sequencing sequence C_{i'} among the plurality of target sequencing sequences are not all identical, filtering the first sequencing sequence C_{i'} out of the plurality of target sequencing sequences.

4. The detection method according to claim 3, characterized in that the chromosome name corresponding to the target sequencing sequence is determined in the following manner:

searching for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} is filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'; and

determining that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} is the chromosome name of the second sequencing sequence C_{i''}.

5. The detection method according to claim 1, characterized in that, after the reference sequence is divided, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, the detection method further comprises:

storing the plurality of reference subsequences of the reference sequence in shared memory,

wherein the plurality of sequencing subsequences of each sequencing sequence are compared with the plurality of reference subsequences of the reference sequence in the shared memory to determine the target sequencing sequence, and the chromosome name corresponding to the target sequencing sequence is determined according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence in the shared memory.

6. The detection method according to claim 1, characterized in that, after the reference sequence is divided, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, the detection method further comprises:

deleting repeat sequences and/or first subsequences from the plurality of reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base,

wherein comparing the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence to determine the target sequencing sequence comprises: comparing the plurality of sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequence, wherein the target reference subsequences are the reference subsequences remaining after the repeat sequences and/or the first subsequences are deleted.

7. A chromosome detection device, characterized by comprising:

a receiving unit, configured to receive a reference sequence and a plurality of sequencing sequences;

a dividing unit, configured to divide each sequencing sequence, with a step length of n1 bases, into a plurality of sequencing subsequences each having a fixed length of n2 bases, and to divide the reference sequence, with a step length of n3 bases, into a plurality of reference subsequences each having a fixed length of n2 bases, wherein n1, n2 and n3 are positive integers and n1 ≤ n3;

a comparing unit, configured to compare the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence to determine a target sequencing sequence, wherein the plurality of sequencing subsequences of the target sequencing sequence are all contained in the plurality of reference subsequences of the reference sequence; and

a first determining unit, configured to determine the chromosome name corresponding to the target sequencing sequence according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence, wherein the reference sequence and each reference subsequence of the reference sequence have a corresponding chromosome name.

8. The detection device according to claim 7, characterized in that the plurality of sequencing sequences come from a plurality of samples to be detected, each sequencing sequence carries a sample identifier, and there are a plurality of the dividing units, the comparing units and the first determining units.

9. The detection device according to claim 7, characterized in that there are a plurality of target sequencing sequences, and the detection device further comprises:

a searching unit, configured to search for the reference subsequence having the same bases as each sequencing subsequence C_ij of each target sequencing sequence C_i, wherein i takes the values 1 to i_max in turn, j takes the values 1 to j_max in turn, i_max is the number of target sequencing sequences, and j_max is the number of sequencing subsequences of the target sequencing sequence C_i;

a second determining unit, configured to determine that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_ij is the chromosome name of the sequencing subsequence C_ij;

a judging unit, configured to judge whether the chromosome names of the sequencing subsequences C_{i'j} of a first sequencing sequence C_{i'} among the plurality of target sequencing sequences are all identical, wherein i' ∈ (1, i_max); and

a filtering unit, configured to filter the first sequencing sequence C_{i'} out of the plurality of target sequencing sequences when it is judged that the chromosome names of the sequencing subsequences C_{i'j} of the first sequencing sequence C_{i'} are not all identical.

10. The detection device according to claim 9, characterized in that the first determining unit comprises:

a searching module, configured to search for the reference subsequence having the same bases as any sequencing subsequence C_{i''j'} of a second sequencing sequence C_{i''}, wherein the second sequencing sequence C_{i''} is any target sequencing sequence remaining after the first sequencing sequence C_{i'} is filtered out, i'' ∈ (1, i_max), j' ∈ (1, j'_max), j'_max is the number of sequencing subsequences of the second sequencing sequence C_{i''}, and i'' ≠ i'; and

a determining module, configured to determine that the chromosome name of the reference subsequence having the same bases as the sequencing subsequence C_{i''j'} is the chromosome name of the second sequencing sequence C_{i''}.

11. The detection device according to claim 7, characterized in that the detection device further comprises:

a storage unit, configured to store the plurality of reference subsequences of the reference sequence in shared memory,

wherein the comparing unit is configured to compare the plurality of sequencing subsequences of each sequencing sequence with the plurality of reference subsequences of the reference sequence in the shared memory to determine the target sequencing sequence, and the first determining unit is configured to determine the chromosome name corresponding to the target sequencing sequence according to the correspondence between the plurality of sequencing subsequences of the target sequencing sequence and the plurality of reference subsequences of the reference sequence in the shared memory.

12. The detection device according to claim 7, characterized in that the detection device further comprises:

a deleting unit, configured to delete repeat sequences and/or first subsequences from the plurality of reference subsequences of the reference sequence, wherein a first subsequence is a subsequence containing an N base,

wherein the comparing unit is configured to compare the plurality of sequencing subsequences of each sequencing sequence with the target reference subsequences of the reference sequence to determine the target sequencing sequence, wherein the target reference subsequences are the reference subsequences remaining after the repeat sequences and/or the first subsequences are deleted.