CN204440396U - Kit for determining fetal fraction

Publication number
CN204440396U
Authority
CN
China
Prior art keywords
chromosome
sequence
interested
sample
normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN201220583608.8U
Other languages
Chinese (zh)
Inventor
Richard P. Rava
Anupama Srinivasan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verinata Health Inc
Original Assignee
Verinata Health Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/445,778 (US9447453B2)
Priority claimed from US13/482,964 (US20120270739A1)
Priority claimed from US13/555,037 (US9260745B2)
Application filed by Verinata Health Inc
Application granted
Publication of CN204440396U
Anticipated expiration
Expired - Lifetime

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096: Processes for the isolation, preparation or purification of DNA or RNA; cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms

Abstract

The utility model discloses a kit for determining fetal fraction. The kit comprises a box body (1); a plurality of retaining slots arranged in the box body for holding a plurality of vials; a vial (2) containing an internal positive control; a vial (3) containing a marker nucleic acid suitable for tracking and verifying sample integrity; and a vial (4) containing a buffer solution. The kit further comprises a plurality of vials, each of which contains a different internal positive control and/or a different marker nucleic acid. The kit offers an advantage in performing non-invasive prenatal diagnosis and in diagnosing and monitoring cancer patients.

Description

Kit for determining fetal fraction
Technical field
The utility model relates to a kit, and in particular to a kit for determining fetal fraction.
Background art
One of the critical endeavors in human medical research is the discovery of genetic abnormalities that produce adverse health outcomes. In many cases, specific genes and/or critical diagnostic markers have been identified in portions of the genome that are present at abnormal copy numbers. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosome segments, and higher-level amplification of specific regions of the genome, are common occurrences.
Most information about copy number variation has been provided by cytogenetic resolution that permits the recognition of structural abnormalities. Conventional procedures for genetic screening and biological dosimetry have used invasive procedures, such as amniocentesis, to obtain cells for karyotype analysis. In recognition of the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescence PCR (QF-PCR) and array comparative genomic hybridization (array-CGH) have been developed as molecular cytogenetic methods for the analysis of copy number variations.
The advent of technologies that allow an entire genome to be sequenced in a relatively short time, together with the discovery of circulating cell-free DNA (cfDNA), has provided the opportunity to compare genetic material originating from one chromosome with that of another chromosome without the risks associated with invasive sampling. However, the limitations of existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and sequencing bias stemming from the inherent nature of genomic information, underlie the continuing need for non-invasive methods that provide any or all of the specificity, sensitivity and applicability needed to reliably determine copy number changes in a variety of clinical settings.
The embodiments disclosed herein fulfill some of the above needs and, in particular, offer the advantage of providing a reliable method that is applicable at least to the practice of non-invasive prenatal diagnosis and to the diagnosis and monitoring of metastatic progression in cancer patients.
Summary of the utility model
The maternal DNA background in a maternal sample limits the sensitivity of any assay that attempts to distinguish fetal chromosomes from the maternal DNA in the sample. Fetal fraction is therefore an important parameter to consider in diagnostic and routine assays that rely on quantitative and/or qualitative differences between fetal and maternal DNA. The invention provides a method for determining the fetal fraction in a maternal sample. The method obtains the fetal fraction as a function of a normalized chromosome value or a normalized segment value. The method can be combined with other methods, for example with a method that obtains the fetal fraction from polymorphic allelic imbalance information, to classify copy number variations of fetal chromosomes or chromosome segments in a maternal sample. The invention also provides apparatus and kits for carrying out the method.
Methods are provided for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids known or suspected to differ in the amount of one or more sequences of interest. The method comprises a statistical approach that accounts for the accrued variability stemming from process-related, interchromosomal and inter-sequencing variability. The method is applicable to determining the CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from the sequencing information obtained by sequencing the nucleic acids of a test sample only once.
In one embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing chromosome sequence.
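The two ways of computing a chromosome dose described above can be sketched as follows. This is a minimal illustration only: the function names, the tag counts and the chromosome lengths are hypothetical and do not come from the patent text.

```python
def chromosome_dose(tags_interest: int, tags_normalizing: int) -> float:
    """Dose as the plain ratio of sequence tag counts (first variant of step (c))."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing sequence must have at least one tag")
    return tags_interest / tags_normalizing


def tag_density(tags: int, length_bp: int) -> float:
    """Sequence tag density: tag count related to sequence length (steps (i) and (ii))."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tags / length_bp


def chromosome_dose_by_density(tags_interest: int, len_interest: int,
                               tags_normalizing: int, len_normalizing: int) -> float:
    """Dose as the ratio of the two tag densities (step (iii))."""
    density_norm = tag_density(tags_normalizing, len_normalizing)
    if density_norm == 0:
        raise ValueError("normalizing sequence must have at least one tag")
    return tag_density(tags_interest, len_interest) / density_norm


# Hypothetical counts and lengths, chosen only to make the arithmetic easy to follow.
dose_counts = chromosome_dose(1000, 4000)
dose_density = chromosome_dose_by_density(1000, 50_000_000, 4000, 100_000_000)
```

The density variant differs from the count variant only by the fixed length ratio of the two sequences, so either form can be compared against a threshold as long as the threshold was derived in the same way.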
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample, wherein the four or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing chromosome sequence.
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the sample, wherein the four or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing chromosome sequence.
In any of the above embodiments, the normalizing chromosome sequence can be a single chromosome selected from chromosomes 1-22, X and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes selected from chromosomes 1-22, X and Y.
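The choice between a single normalizing chromosome and a group of normalizing chromosomes can be illustrated with a short sketch. It assumes that the tag count of a group is the sum of the tag counts of its members, which the text above does not state; the chromosome names and counts are likewise hypothetical.

```python
from typing import Iterable, Mapping


def dose_with_normalizers(tag_counts: Mapping[str, int],
                          chromosome: str,
                          normalizers: Iterable[str]) -> float:
    """Chromosome dose using one or several normalizing chromosomes.

    Assumption: a group of normalizing chromosomes contributes the sum of
    its members' sequence tag counts as the denominator.
    """
    denominator = sum(tag_counts[name] for name in normalizers)
    if denominator <= 0:
        raise ValueError("normalizing chromosomes must have at least one tag")
    return tag_counts[chromosome] / denominator


# Hypothetical tag counts for three chromosomes.
counts = {"chr21": 100, "chr9": 300, "chr11": 500}

single = dose_with_normalizers(counts, "chr21", ["chr9"])           # one normalizer
grouped = dose_with_normalizers(counts, "chr21", ["chr9", "chr11"])  # group of two
```

Passing a one-element list reproduces the single-chromosome case, so the same function covers both alternatives.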
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of the corresponding normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing segment sequence.
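Step (d), the comparison of each dose with a per-chromosome threshold, might look like the sketch below. The text does not say how thresholds are derived or in which direction the comparison runs, so the one-sided "greater than" test, the function name and all numeric values are assumptions made purely for illustration.

```python
from typing import Dict, Mapping


def flag_doses_above_threshold(doses: Mapping[str, float],
                               thresholds: Mapping[str, float]) -> Dict[str, bool]:
    """Compare each chromosome dose with the threshold for that chromosome.

    Assumption: a dose strictly above its threshold is flagged. A real assay
    could equally use a lower bound (for example, for monosomies) or both.
    """
    missing = set(doses) - set(thresholds)
    if missing:
        raise KeyError(f"no threshold for: {sorted(missing)}")
    return {name: doses[name] > thresholds[name] for name in doses}


# Hypothetical doses and thresholds for two chromosomes of interest.
example_doses = {"chr21": 0.26, "chr18": 0.30}
example_thresholds = {"chr21": 0.25, "chr18": 0.35}
flags = flag_doses_above_threshold(example_doses, example_thresholds)
```

Keeping the thresholds in a mapping keyed by chromosome mirrors the wording of step (d), in which every chromosome of interest has its own threshold.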
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of the corresponding normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing segment sequence.
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for its normalizing chromosome sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of the corresponding normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for its normalizing segment sequence.
In any of the above embodiments, the different complete chromosomal aneuploidies are selected from complete chromosomal trisomies, complete chromosomal monosomies and complete chromosomal polysomies. The different complete chromosomal aneuploidies are selected from complete aneuploidies of any one of chromosomes 1-22, X and Y. For example, the different complete fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXX, 47,XYY and monosomy X.
In any of the above embodiments, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in each of the test samples.
In any of the above embodiments, the method can further comprise calculating a normalized chromosome value (NCV), wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
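The NCV is a standard score of the test dose against the doses observed in the qualified samples, which the following sketch computes. The use of the sample standard deviation (n - 1 denominator) as the estimate is an assumption, as are the function name and the example doses.

```python
import statistics
from typing import Sequence


def normalized_chromosome_value(test_dose: float,
                                qualified_doses: Sequence[float]) -> float:
    """NCV = (x_ij - mean_j) / sd_j for one chromosome j and one test sample i.

    Assumption: sd_j is estimated with the sample standard deviation,
    which needs at least two qualified samples.
    """
    mean_j = statistics.mean(qualified_doses)
    sd_j = statistics.stdev(qualified_doses)
    if sd_j == 0:
        raise ValueError("qualified doses have zero spread; NCV is undefined")
    return (test_dose - mean_j) / sd_j


# Hypothetical chromosome doses from three qualified samples and one test sample.
qualified = [0.24, 0.25, 0.26]
ncv = normalized_chromosome_value(0.28, qualified)
```

With these illustrative numbers the qualified mean is 0.25 and the standard deviation is 0.01, so the test dose of 0.28 lies about three standard deviations above the mean.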
In another embodiment, a method is provided for determining the presence or absence of different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the segments of interest; (c) using the number of sequence tags identified for each segment of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each segment of interest; and (d) comparing each single segment dose to a threshold value for the corresponding segment of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single segment dose for each segment of interest as the ratio of the number of sequence tags identified for that segment of interest to the number of sequence tags identified for its normalizing segment sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each segment of interest by relating the number of sequence tags identified for that segment in step (b) to the length of that segment; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of that normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single segment dose for each segment of interest, wherein the segment dose is calculated as the ratio of the sequence tag density ratio for the segment of interest to the sequence tag density ratio for its normalizing segment sequence. The method can further comprise calculating a normalized segment value (NSV), wherein the NSV relates the segment dose to the mean of the corresponding segment dose in a set of qualified samples as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dosage in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dosage for test sample i.
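The tag-density variant of the segment dosage and the NSV can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, "sequence tags" are the mapped reads that parts of this text call sequence labels, and the use of the sample standard deviation for the estimated standard deviation is an assumption.

```python
from statistics import mean, stdev


def tag_density(tag_count, length_bp):
    """Sequence tag density ratio: tags per base pair of the sequence."""
    return tag_count / length_bp


def segment_dosage(seg_tags, seg_len, norm_tags, norm_len):
    """Ratio of the segment's tag density to its normalization segment's tag density."""
    return tag_density(seg_tags, seg_len) / tag_density(norm_tags, norm_len)


def normalized_segment_value(test_dosage, qualified_dosages):
    """NSV = (x - mean) / sd over the dosages seen in qualified samples.

    Assumption: the estimated sd is the sample standard deviation.
    """
    mu = mean(qualified_dosages)
    sigma = stdev(qualified_dosages)
    return (test_dosage - mu) / sigma
```

For example, a segment with 1,000 tags over 1 Mb normalized against 4,000 tags over 2 Mb has a dosage of 0.5; whether that is unusual depends on the spread of the same dosage in the qualified samples, which is what the NSV expresses.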
In the various embodiments of the methods described, where a normalization segment sequence is used to determine the chromosome dosage or segment dosage, the normalization segment sequence can be a single segment of any one or more of chromosomes 1-22, X, and Y. Alternatively, the normalization segment sequence can be a group of segments of any one or more of chromosomes 1-22, X, and Y.
Steps (a)-(d) of the method for determining the presence or absence of partial fetal chromosomal aneuploidies are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of different partial fetal chromosomal aneuploidies in each of said samples. The partial fetal chromosomal aneuploidies that can be determined according to the method include partial aneuploidies of any segment of any chromosome. These partial aneuploidies can be selected from partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial aneuploidies that can be determined according to the method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.
In any of the above embodiments, the test sample can be a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In any of these embodiments, the test sample can be a plasma sample. The nucleic acid molecules of the maternal sample are fetal and maternal cell-free DNA molecules. The nucleic acids can be sequenced using next-generation sequencing (NGS). In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In still other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In another embodiment, a method is provided for determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The steps of the method comprise: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalization chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of said twenty or more chromosomes of interest and the number of sequence tags identified for each normalization chromosome to calculate a single chromosome dosage for each of said twenty or more chromosomes of interest; and (d) comparing each single chromosome dosage to a threshold value for the corresponding chromosome of interest, and thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the sample.
In another embodiment, the invention provides a method for identifying copy number variation (CNV) of a sequence of interest (for example, a clinically relevant sequence) in a test sample, the method comprising the steps of: (a) obtaining a test sample and a plurality of qualified samples, the test sample comprising test nucleic acid molecules and the plurality of qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information for the fetal and maternal nucleic acids in the samples; (c) calculating, based on the sequencing of the qualified nucleic acid molecules, a qualified sequence dosage for the qualified sequence of interest in each of the plurality of qualified samples, wherein calculating the qualified sequence dosage comprises determining a parameter for the qualified sequence of interest and for at least one qualified normalization sequence; (d) identifying at least one qualified normalization sequence based on the qualified sequence dosage, wherein the at least one qualified normalization sequence has the smallest variability and/or the greatest differentiability in the plurality of qualified samples; (e) calculating, based on the sequencing of the nucleic acid molecules in the test sample, a test sequence dosage for the test sequence of interest, wherein calculating the test sequence dosage comprises determining a parameter for the test sequence of interest and for at least one normalization test sequence, and wherein the at least one normalization test sequence corresponds to the at least one qualified normalization sequence; (f) comparing the test sequence dosage to at least one threshold value; and (g) assessing the copy number variation of the sequence of interest in the test sample based on the result of step (f).
In one embodiment, the parameter for the qualified sequence of interest and the at least one qualified normalization sequence relates the number of sequence tags mapped to the qualified sequence of interest to the number of tags mapped to the qualified normalization sequence, and the parameter for the test sequence of interest and the at least one normalization test sequence relates the number of sequence tags mapped to the test sequence of interest to the number of tags mapped to the normalization test sequence. In some embodiments, step (b) comprises sequencing at least a portion of the qualified and test nucleic acid molecules, wherein the sequencing comprises providing a plurality of mapped sequence tags for the test and qualified sequences of interest and for the at least one test and at least one qualified normalization sequence; sequencing at least a portion of the nucleic acid molecules of the test sample yields the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, the sequencing step is performed using a next-generation sequencing method. In some embodiments, the sequencing method can be a massively parallel sequencing method that uses sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing-by-ligation. In some embodiments, the sequencing comprises an amplification. In other embodiments, the sequencing is single-molecule sequencing.
The CNV of the sequence of interest is an aneuploidy, which can be a chromosomal or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, Klinefelter's syndrome, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosomal deletion or a partial chromosomal insertion. In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer. In some embodiments, the test and qualified samples are biological fluid samples, for example plasma samples obtained from a pregnant subject (such as a pregnant human subject). In other embodiments, the test and qualified biological fluid samples (for example, plasma samples) are obtained from a subject known or suspected to have cancer.
Some methods for determining the presence or absence of a fetal chromosomal aneuploidy in a maternal test sample can comprise the following operations: (a) providing sequence reads from fetal and maternal nucleic acids in the maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying the number of sequence tags from one or more chromosomes of interest or chromosome segments of interest, and computationally identifying the number of sequence tags for at least one normalization chromosome sequence or normalization chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest; (d) computationally calculating a single chromosome or segment dosage for each of the one or more chromosomes of interest or chromosome segments of interest, using the number of sequence tags identified for each of them and the number of sequence tags identified for each normalization chromosome sequence or normalization chromosome segment sequence; and (e) comparing, using the computing apparatus, each single chromosome dosage for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value, and thereby determining the presence or absence of at least one fetal aneuploidy in the test sample. In some implementations, the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000, or at least about 100,000. The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium that provides program instructions for performing the described operations and the other computational operations described herein.
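Operations (c) through (e) can be sketched in a few lines, starting from per-chromosome tag counts that an aligner has already produced. The names below are hypothetical, and representing each threshold as a (low, high) pair is an assumption made for illustration; the text only states that each dosage is compared to a corresponding threshold.

```python
def chromosome_dosages(tag_counts, normalizers):
    """Dosage of each chromosome of interest = its tag count / its normalizer's tag count.

    tag_counts:  dict mapping chromosome name -> number of mapped sequence tags
    normalizers: dict mapping chromosome of interest -> normalization chromosome
    """
    return {chrom: tag_counts[chrom] / tag_counts[norm]
            for chrom, norm in normalizers.items()}


def call_aneuploidy(dosages, thresholds):
    """Compare each dosage with a (low, high) threshold pair (an assumed representation)."""
    calls = {}
    for chrom, dosage in dosages.items():
        low, high = thresholds[chrom]
        if dosage > high:
            calls[chrom] = "gain"
        elif dosage < low:
            calls[chrom] = "loss"
        else:
            calls[chrom] = "unaffected"
    return calls
```

In practice the thresholds would be derived from qualified samples; here they are simply inputs.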
In certain embodiments, the chromosome reference sequence has a plurality of excluded regions, which are naturally present in the chromosome but do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, a method additionally comprises: (i) determining whether a read under consideration aligns to a site on a chromosome reference sequence to which another read from the test sample was previously aligned; and (ii) determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest. The chromosome reference sequence can be stored on a computer-readable medium.
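A minimal sketch of tag counting with excluded regions and a previously-aligned-site check follows. The data layout (reads as `(chromosome, position)` pairs, excluded regions as half-open intervals) and the `keep_duplicates` switch are assumptions for illustration; the text leaves open how the decision in (ii) is made, so the default of dropping repeat alignments is only one possible policy.

```python
def count_tags(alignments, excluded, keep_duplicates=False):
    """Count sequence tags per chromosome.

    alignments: iterable of (chrom, pos) pairs, one per aligned read
    excluded:   dict mapping chrom -> list of (start, end) half-open intervals
                that never contribute to the tag count
    keep_duplicates: if False, a read aligning to a site already hit by an
                earlier read is not counted (one possible policy, assumed here)
    """
    seen_sites = set()
    counts = {}
    for chrom, pos in alignments:
        # Reads falling in an excluded region are ignored entirely.
        if any(start <= pos < end for start, end in excluded.get(chrom, [])):
            continue
        # Check whether another read was previously aligned to this site.
        if (chrom, pos) in seen_sites and not keep_duplicates:
            continue
        seen_sites.add((chrom, pos))
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts
```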
In certain embodiments, a method additionally comprises sequencing at least a portion of the nucleic acid molecules of the maternal test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. The sequencing can comprise massively parallel sequencing of the maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads.
In certain embodiments, a method further comprises automatically recording, using a processor, the presence or absence of a fetal chromosomal aneuploidy as determined in (d) in a patient medical record for the human subject who provided the maternal test sample. The recording can comprise recording the chromosome dosages and/or a diagnosis based on the chromosome dosages in a computer-readable medium. In some cases, the patient medical record is maintained by a laboratory, a physician's office, a hospital, a health maintenance organization (HMO), an insurance company, or a personal medical record website. A method can further comprise prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was obtained. Additionally or alternatively, the method can comprise ordering and/or performing one or more additional tests.
Some of the methods disclosed herein identify normalization chromosome sequences or normalization chromosome segment sequences for chromosomes or chromosome segments of interest. Some such methods comprise the following operations: (a) providing a plurality of qualified samples for a chromosome or chromosome segment of interest; (b) repeatedly calculating chromosome dosages for the chromosome or chromosome segment of interest using multiple potential normalization chromosome sequences or normalization chromosome segment sequences, wherein the repeated calculation is performed with a computing apparatus; and (c) selecting a normalization chromosome sequence or normalization chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or a large differentiability in the calculated dosages for the chromosome or chromosome segment of interest.
The selected normalization chromosome sequence or normalization chromosome segment sequence can be part of a combination of normalization chromosome sequences or normalization chromosome segment sequences, or it can be provided alone, rather than in combination with other normalization chromosome sequences or normalization chromosome segment sequences.
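The selection in operations (b) and (c) can be illustrated by trying each candidate normalizer in turn and keeping the one whose dosages vary least across the qualified samples. The sketch below uses the coefficient of variation as the variability measure and considers single candidates only; both choices, and all names, are assumptions rather than details given in the text (which also allows combinations of normalizers and a differentiability criterion).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizer(qualified_counts, chrom_of_interest, candidates):
    """Return (best_candidate, cv) with the lowest dosage variability.

    qualified_counts: list of dicts, one per qualified sample,
                      mapping chromosome name -> sequence tag count
    candidates:       iterable of candidate normalization chromosomes
    """
    best = None
    for cand in candidates:
        # Recompute the dosage of the chromosome of interest with this candidate.
        dosages = [sample[chrom_of_interest] / sample[cand]
                   for sample in qualified_counts]
        cv = coefficient_of_variation(dosages)
        if best is None or cv < best[1]:
            best = (cand, cv)
    return best
```

A candidate that tracks the chromosome of interest closely from run to run yields a nearly constant dosage and is therefore preferred.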
The disclosed embodiments provide a method for classifying a copy number variation in a fetal genome. The operations of the method comprise: (a) receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying, using the computing apparatus, the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and (f) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest. In certain embodiments, the method further comprises sequencing cell-free DNA from the maternal test sample to provide the sequence reads. In certain embodiments, the method further comprises obtaining the maternal test sample from a pregnant organism. In certain embodiments, operation (b) comprises using the computing apparatus to align at least about 1 million reads. In certain embodiments, operation (f) can comprise determining whether the two fetal fraction values are approximately equal.
In certain embodiments, operation (f) can further comprise determining that the two fetal fraction values are approximately equal, and thereby determining that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In some of these embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, operation (f) can comprise determining that the two fetal fraction values are not approximately equal, and can further comprise analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
In certain embodiments, the operation can further comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy. In one embodiment, the operation can further comprise determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In one embodiment, operation (f) can further comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the fetus is a mosaic.
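The binning analysis can be sketched as below. The text does not define "significantly more or significantly less", so the test used here (a bin deviating from the median bin count by more than a relative tolerance `rel_tol`) is purely an illustrative assumption, as are the function names and the 0-based tag positions.

```python
from statistics import median


def bin_counts(positions, chrom_length, n_bins):
    """Sort 0-based tag positions on one chromosome into n_bins equal-width bins."""
    bin_size = -(-chrom_length // n_bins)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[min(pos // bin_size, n_bins - 1)] += 1
    return counts


def outlier_bins(counts, rel_tol=0.05):
    """Indices of bins whose count differs from the median by more than rel_tol.

    rel_tol is a hypothetical stand-in for a real significance test.
    """
    med = median(counts)
    return [i for i, c in enumerate(counts) if abs(c - med) > rel_tol * med]


def classify_discordant(counts, rel_tol=0.05):
    """Partial aneuploidy if some bin stands out; otherwise mosaic."""
    return "partial aneuploidy" if outlier_bins(counts, rel_tol) else "mosaic"
```

The indices returned by `outlier_bins` also indicate which portion of the chromosome carries the imbalance, which corresponds to locating the partial aneuploidy.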
Operation (e) can comprise: (a) calculating the number of sequence tags from the first chromosome of interest and from at least one normalization chromosome sequence to determine a chromosome dosage; and (b) calculating the fetal fraction value from the chromosome dosage using the second method. In certain embodiments, the operation further comprises calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein the NCV relates the chromosome dosage to the mean of the corresponding chromosome dosages in a set of qualified samples, as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dosage in the set of qualified samples, and $R_{iA}$ is the chromosome dosage calculated for the chromosome of interest. In another embodiment, operation (d) further comprises the first method calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
In various embodiments, if the first fetal fraction value and the second fetal fraction value are not approximately equal, the method further comprises (i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) if the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest. In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises sorting the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids within the first chromosome of interest, and counting the mapped tags in each bin.
Operation (e) can further comprise calculating the fetal fraction value by evaluating the following expression:
$$ff = 2 \times \left|\mathrm{NCV}_{iA}\,\mathrm{CV}_{iU}\right|$$
where $ff$ is the second fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dosage of the chromosome of interest determined in the qualified samples.
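One way to see where the factor of 2 comes from is the following short derivation. It is offered as an illustration under stated assumptions, not as a derivation taken from the text: the fetus is assumed to carry a complete trisomy (+) or monosomy (−) of chromosome i, the mother is assumed to be disomic, and the normalization chromosome is assumed to be unaffected. With maternal DNA contributing two copies at fraction $1 - ff$ and fetal DNA contributing three (or one) copies at fraction $ff$, the expected dosage in the affected sample is

$$R_{iA} \approx \bar{R}_{iU}\left(1 \pm \frac{ff}{2}\right),$$

so that, writing $\mathrm{CV}_{iU} = \sigma_{iU} / \bar{R}_{iU}$,

$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} \approx \pm\frac{ff}{2}\cdot\frac{\bar{R}_{iU}}{\sigma_{iU}} = \pm\frac{ff}{2\,\mathrm{CV}_{iU}}
\quad\Longrightarrow\quad
ff \approx 2 \times \left|\mathrm{NCV}_{iA}\,\mathrm{CV}_{iU}\right|.$$

This is why agreement between the two fetal fraction values supports the complete-aneuploidy assumption: if only part of the chromosome is affected, or only some fetal cells are, the dosage shifts by less than $ff/2$ and the second value underestimates the first.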
In any of the above embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In any of the above embodiments, operation (f) can classify the copy number variation into a classification selected from the group consisting of: complete chromosome insertion, complete chromosome deletion, partial chromosome duplication, partial chromosome deletion, and mosaic.
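Operations (e) and (f) can be sketched together: compute the second fetal fraction value from the chromosome dosage, then compare it with a first value obtained independently (for example from polymorphisms). The function names, the sample standard deviation, and the relative tolerance `rel_tol` used to decide "approximately equal" are assumptions for illustration; the text does not specify a tolerance.

```python
from statistics import mean, stdev


def fetal_fraction_from_dosage(test_dosage, qualified_dosages):
    """Second fetal fraction value: ff = 2 * |NCV * CV|."""
    mu = mean(qualified_dosages)
    sigma = stdev(qualified_dosages)
    ncv = (test_dosage - mu) / sigma   # normalized chromosome value
    cv = sigma / mu                    # coefficient of variation in qualified samples
    return 2 * abs(ncv * cv)


def classify_cnv(ff_first, ff_second, rel_tol=0.2):
    """Classify by agreement of the two values (rel_tol is a hypothetical tolerance)."""
    approx_equal = abs(ff_first - ff_second) <= rel_tol * max(ff_first, ff_second)
    if approx_equal:
        return "complete chromosomal aneuploidy"
    return "partial aneuploidy or mosaic"
```

Note that the standard deviation cancels out of `2 * |NCV * CV|`, leaving twice the relative shift of the test dosage from the qualified mean; a discordant result would then be resolved by the bin-level analysis described in the text.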
The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium that provides program instructions for classifying a copy number variation in a fetal genome. The computer program product can comprise: (a) code for receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) code for aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying, using the computing apparatus, the number of sequence tags from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) code for calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) code for calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and (f) code for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest. In certain embodiments, the computer program product comprises code for the various operations and methods in any of the above embodiments of the disclosed methods.
The disclosed embodiments also provide a system for classifying a copy number variation in a fetal genome. The system comprises: (a) an interface for receiving at least about 10,000 sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) memory for storing, at least temporarily, a plurality of the sequence reads; and (c) a processor designed or configured with program instructions for: (i) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (ii) identifying the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (iii) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (iv) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and (v) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest. According to various embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, the program instructions for (c)(v) comprise program instructions for classifying the copy number variation into a classification selected from the group consisting of: complete chromosome insertion, complete chromosome deletion, partial chromosome duplication, partial chromosome deletion, and mosaic. According to various embodiments, the system can comprise program instructions for sequencing cell-free DNA from the maternal test sample to provide the sequence reads. According to some embodiments, the program instructions for operation (c)(i) comprise program instructions for using the computing apparatus to align at least about 1 million reads.
In certain embodiments, the system further comprises a sequencer configured to sequence the fetal and maternal nucleic acids in a maternal test sample and to provide the sequence reads in an electronic format. In various embodiments, the sequencer and the processor are located in separate facilities and are connected by a network.
In various embodiments, the system further comprises an apparatus for obtaining the maternal test sample from a pregnant mother. According to some embodiments, the apparatus for obtaining the maternal test sample and the processor are located in separate facilities. In various embodiments, the system also comprises an apparatus for extracting cell-free DNA from the maternal test sample. In certain embodiments, the apparatus for extracting cell-free DNA is located in the same facility as the sequencer, and the apparatus for obtaining the maternal test sample is located in a remote facility.
According to some embodiments, the program instructions for comparing the first fetal fraction value to the second fetal fraction value also comprise program instructions for determining whether the two fetal fraction values are approximately equal.
In certain embodiments, the system also comprises program instructions for determining that a ploidy assumption implicit in the second method is true when the two fetal fraction values are approximately equal. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In certain embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the system also comprises program instructions for analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the program instructions for analyzing are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value indicate that the two fetal fraction values are not approximately equal. In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the first chromosome of interest harbors a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions. In certain embodiments, the system further comprises program instructions for determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions.
According to various embodiments, the system can comprise program instructions for the second method of calculating the fetal fraction value, the program instructions comprising: (a) program instructions for calculating the number of sequence tags from the first chromosome of interest and from at least one normalization chromosome sequence to determine a chromosome dosage; and (b) program instructions for calculating the fetal fraction value from the chromosome dosage using the second method.
In certain embodiments, the system further comprises program instructions for calculating a normalized chromosome value (NCV), wherein the program instructions for the second method comprise program instructions for using the normalized chromosome value, and wherein the program instructions for the NCV relate the chromosome dosage to the mean of the corresponding chromosome dosages in a set of qualified samples, as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dosage in the set of qualified samples, and $R_{iA}$ is the chromosome dosage calculated for the chromosome of interest. In various embodiments, the program instructions for the first method comprise program instructions for calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
According to various embodiments, the program instructions for the second method of calculating the fetal fraction value comprise program instructions for evaluating the following expression:
$$ff = 2 \times \left|\mathrm{NCV}_{iA}\,\mathrm{CV}_{iU}\right|$$
where $ff$ is the second fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dosage of the chromosome of interest determined in the qualified samples.
According to various embodiments, the system further comprises: (i) program instructions for determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) program instructions for determining, if the copy number variation results from a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest, wherein the program instructions in (i) and (ii) are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value determine that the two values are not approximately equal.
In certain embodiments, the program instructions for determining the locus of the partial aneuploidy on the first chromosome of interest comprise program instructions for sorting the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids within the first chromosome of interest, and program instructions for counting the mapped tags in each bin.
In certain embodiments, methods are provided for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal (e.g., a human), wherein the methods comprise: (a) providing sequence reads of nucleic acids in a test sample from the mammal, wherein the test sample can comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutive (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying the number of sequence tags of the fetal and maternal nucleic acids from one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or from chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein the chromosomes or chromosome segments are selected from chromosomes 1 to 22, X, and Y and segments thereof, and computationally identifying the number of sequence tags for at least one normalization chromosome sequence or normalization chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 2,000, or at least about 5,000, or at least about 10,000; (d) computationally calculating a single chromosome or segment dosage for each of the one or more chromosomes of interest or chromosome segments of interest, using the number of sequence tags identified for each of them and the number of sequence tags identified for each normalization chromosome sequence or normalization chromosome segment sequence; and (e) comparing, using the computing apparatus, each single chromosome dosage for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value, and thereby determining the presence or absence of an aneuploidy in the sample, wherein the presence of the aneuploidy and/or an increase in the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of a cancer and/or an increased risk of a cancer.
In certain embodiments, the increased risk is relative to the same subject at a different time (e.g., earlier), relative to a reference population (e.g., optionally adjusted for sex and/or ethnicity and/or age, etc.), relative to a similar subject lacking a particular risk factor, and so on. In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise whole chromosomes whose amplification and/or deletion is known to be associated with a cancer (e.g., as described herein). In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise chromosome segments whose amplification or deletion is known to be associated with one or more cancers. In certain embodiments, the chromosome segments comprise substantially whole chromosome arms (e.g., as described herein). In certain embodiments, the chromosome segment comprises a whole chromosome aneuploidy. In certain embodiments, the whole chromosome aneuploidy comprises a loss, and in certain other embodiments, the whole chromosome aneuploidy comprises a gain (e.g., a gain or loss as shown in Table 1). In certain embodiments, the chromosome segment of interest is a substantially arm-level segment comprising the p arm or the q arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the aneuploidy comprises an amplification of a substantially arm-level segment of a chromosome or a deletion of a substantially arm-level segment of a chromosome.
In certain embodiments, the chromosome segment of interest substantially comprises one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q. In certain embodiments, the aneuploidy comprises an amplification of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, 22q. In certain embodiments, the aneuploidy comprises a deletion of one or more arms selected from the group consisting of: 1p, 3p, 4p, 4q, 5q, 6q, 8p, 8q, 9p, 9q, 10p, 10q, 11p, 11q, 13q, 14q, 15q, 16q, 17p, 17q, 18p, 18q, 19p, 19q, 22q. In certain embodiments, the chromosome segment of interest is a segment comprising a region and/or a gene shown in Table 3 and/or Table 5 and/or Table 4 and/or Table 6. In certain embodiments, the aneuploidy comprises an amplification of a region and/or a gene shown in Table 3 and/or Table 5. In certain embodiments, the aneuploidy comprises a deletion of a region and/or a gene shown in Table 4 and/or Table 6. In certain embodiments, the chromosome segment of interest is a segment known to contain one or more oncogenes and/or one or more tumor suppressor genes. In certain embodiments, the aneuploidy comprises an amplification of one or more regions selected from the group consisting of: 20q13, 19q12, 1q21-1q23, 8p11-p12, and ErbB2. In certain embodiments, the aneuploidy comprises an amplification of one or more regions comprising a gene selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, 
KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4 etc.In certain embodiments, cancer is selected from the cancer of lower group, and this group is made up of the following: leukaemia, ALL, the cancer of the brain, breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, GIST, glioma, HCC, hepatocellular cancer, lung cancer, lung NSC, lung SC, medulloblastoma, melanoma, MPD, myeloproliferative disorders, cervix cancer, oophoroma, prostate cancer and kidney.In certain embodiments, biological sample comprises the sample being selected from lower group, and this group is made up of the following: whole blood, clot, saliva/saliva, urine, biopsy, liquor pleurae, pericardial fluid, brains liquid and peritoneal fluid.In certain embodiments, chromosome reference sequences has multiple region be excluded, and these regions be excluded are present in chromosome natively but they do not affect the number of its sequence label for any chromosome or chromosome segment.In certain embodiments, the method comprises the reading determining whether to pay attention to further and compares with a site on a chromosome reference sequences, and has previously carried out comparison at another reading of this site; And determine whether that among the number that the reading of this being paid attention to is included in for the sequence label of an interested chromosome or an interested chromosome segment, wherein two determination operations all perform with this calculation element.In different embodiments, the method comprises at least temporary transient sequence information stored in a kind of computer-readable media (such as non-transitory media) for nucleic acid described in described sample further.In certain embodiments, step (d) comprise in interested section select one ratio calculating the number of the sequence label that section dosage identifies as the number of the sequence label identified for 
this selected interested section and at least one normalization chromosome sequence corresponding or the normalization chromosome segment sequence of the interested section selected for this in the mode calculated.In certain embodiments, described one or more interested chromosome segment comprises at least 5 or at least 10 or at least 15 or at least 20 or at least 50 or at least 100 different interested sections.In certain embodiments, at least 5 or at least 10 or at least 15 or at least 20 or at least 50 or at least 100 different aneuploidy are detected.In certain embodiments, at least one normalization chromosome sequence comprises one or more chromosomes being selected from lower group, and this group is made up of chromosome 1 to 22, X and Y.In certain embodiments, for each section, at least one normalization chromosome sequence described comprises the chromosome corresponding with the chromosome that described section is positioned at.In certain embodiments, for each section, at least one normalization chromosome sequence described comprises the chromosome segment corresponding with the chromosome segment be just normalized.In certain embodiments, at least one normalization chromosome sequence or normalization chromosome segment sequence be for a kind of interested chromosome of being associated or section a chromosome selecting or section, this carries out in the following manner, that is: (i) identifies the multiple qualified samples for this interested section; (ii) use multiple potential normalization chromosome sequence or normalization chromosome segment sequence come for this chromosome double counting chromosome dosage selected; And (iii) individually or in one combination this normalization chromosome segment sequence is selected, thus provide minimum variability and/or maximum resolvability in calculated chromosome dosage.In certain embodiments, the method comprises the normalized section value (NSV) of calculating further, and wherein as described 
in this, described section dosage is associated with the average of respective section dosage in one group of qualified samples by described NSV.In certain embodiments, normalization sector sequence is a single section any one or more in chromosome 1 to 22, X and Y.In certain embodiments, normalization sector sequence is one group of section any one or more in chromosome 1 to 22, X and Y.In certain embodiments, normalization sector sequence comprises arms any one or more in chromosome 1 to 22, X and Y in fact.In certain embodiments, the method comprises checking order at least partially in the described nucleic acid molecules of described test sample further, to obtain described sequence information.In certain embodiments, order-checking comprises and checks order to provide sequence information to the Cell-free DNA carrying out test sample.In certain embodiments, order-checking comprises and checks order to provide sequence information to the cell DNA carrying out test sample.In certain embodiments, order-checking comprises extensive parallel order-checking.In certain embodiments, (these) method of being somebody's turn to do is included in further to provide in the patient medical record card of the human experimenter of test sample and automatically records as presence or absence one aneuploidy determined in (d), and wherein this record makes purpose processor to perform.In certain embodiments, record is included in a kind of computer-readable media and records chromosome dosage and/or the diagnosis based on described chromosome dosage.In different embodiments, patient medical record card is preserved by laboratory, doctor's office, hospital, HMO, insurance company or IMR's card website.In certain embodiments, determine that aneuploidy described in presence or absence and/or number comprise a kind of for the factor of in the antidiastole of cancer.In certain embodiments, the detection of aneuploidy instruction positive findings, and described method comprises 
further and prescribes to the human experimenter getting test sample, start treatment and/or change treatment.In certain embodiments, prescribe to the human experimenter getting test sample, start treatment and/or change to treat to comprise and prescribe and/or perform diagnosis further with existence and/or the order of severity of determining cancer.In certain embodiments, diagnosis comprises for cancer biomarkers thing further, screens the sample from described experimenter, and/or for cancer, carries out imaging to described experimenter.In certain embodiments, when described method indicate there is neoplastic cell in described mammal time, treat described mammal or described mammal is treated, to remove described neoplastic cell and/or to suppress growth or the propagation of described neoplastic cell.In certain embodiments, treat mammal to comprise by superfluous natural disposition (such as tumour) cell of operation removing.In certain embodiments, treatment mammal comprises described mammal execution radiotherapy or makes described mammal perform radiotherapy, to kill neoplastic cell.In certain embodiments, treat mammal to comprise and give or make described mammal to be given anticarcinogen (such as horse trastuzumab (matuzumab), Erbitux (erbitux), Wei Ke replaces than (vectibix), Buddhist nun's trastuzumab (nimotuzumab), horse trastuzumab, Victibix (panitumumab), fluorouracil (flourouracil), capecitabine (capecitabine), 5-trifluoromethyl-2 '-BrdU (5-trifluoromethy1-2 '-deoxyuridine), methotrexate (MTX) (methotrexate), Raltitrexed (raltitrexed), pemetrexed (pemetrexed), cytarabine (cytosine arabinoside), Ismipur (6-mercaptopurine), imuran (azathioprine), 6-thioguanine (6-thioguanine), Pentostatin (pentostatin), fludarabine (fludarabine), Cladribine (cladribine), floxuridine (FUDR) (floxuridine), endoxan (cyclophosphamide), knob sand (neosar), ifosfamide (ifosfamide), thiotepa (thiotepa), two (2-the chloroethyl)-1-nitroso ureas of 1,3-, 
1-(2-chloroethyl)-3-cyclohexyl-1-nitroso ureas, hemel (hexamethylmelamine), busulfan (busulfan), procarbazine (procarbazine), dacarbazine (dacarbazine), Chlorambucil (chlorambucil), melphalan (melphalan), cis-platinum (cisplatin), NSC-241240 (carboplatin), oxaliplatin (oxaliplatin), bendamustine (bendamustine), BCNU (carmustine), mustargen (chloromethine), dacarbazine, Fotemustine (fotemustine), lomustine (lomustine), mannosulfan (mannosulfan), Nedaplatin (nedaplatin), Nimustine (nimustine), prednimustine (prednimustine), Ranimustine (ranimustine), Satraplatin (satraplatin), Semustine (semustine), streptozotocin (streptozocin), Temozolomide (temozolomide), Treosulfan (treosulfan), triethyleneiminobenzoquinone (triaziquone), triethylenemelamine (triethylene melamine), thiotepa (thiotepa), four nitric acid three platinum (triplatin tetranitrate), trofosfamide (trofosfamide), uracil mastard (uramustine), little red mould (doxorubicin), daunomycin (daunorubicin), mitoxantrone (mitoxantrone), Etoposide (etoposide), Hycamtin (topotecan), Teniposide (teniposide), Irinotecan (irinotecan), Ka Motuosha (camptosar), camptothecine (camptothecin), Belotecan (belotecan), rubitecan (rubitecan), vincristine (vincristine), vinblastine (vinblastine), vinorelbine (vinorelbine), eldisine (vindesine), taxol (paclitaxel), Docetaxel (docetaxel), Ah cloth Kern (abraxane), Ipsapirone (ixabepilone), La Ruotaxi (larotaxel), Ao Tataxi (ortataxel), Te Saitaxi (tesetaxel), vinflunine (vinflunine), imatinib mesylate (imatinib mesylate), Sunitinib malate (sunitinib malate), Sorafenib Tosylate (sorafenib tosylate), AMN107 hydrochloride monohydrate/, Ta Sina (tasigna), Sai Makeni (semaxanib), ZD6474 (vandetanib), PTK787 (vatalanib), retinoic acid (retinoic acid), retinoic acid derivatives etc.).
In another embodiment, a kind of computer program for determining that in mammal cancer exists and/or risk of cancer increases is provided.This computer program typically comprises: (a) is for providing the code of the sequence reads from the nucleic acid in a described mammiferous test sample, wherein said test sample can comprise from cancer cell or precancerous cell genomic nucleic acids with from the genomic nucleic acids forming (germline) cell, wherein these sequence reads provide in electronic format, b () uses a calculation element to be used for being compared with one or more chromosome reference sequences by these sequence reads and providing the code of the multiple sequence labels corresponding with these sequence reads thus, (c) in the mode calculated for from one or more known amplifications or disappearance and the interested chromosome that joins of related to cancer or known amplification or to lack and interested chromosome segment that related to cancer joins identifies the number of the sequence label from fetus and maternal nucleic acids, wherein said chromosome or chromosome segment are selected from chromosome 1 to 22, X and Y and its section, and several destination codes of the sequence label of at least one normalization chromosome sequence of each in this or these interested chromosome or interested chromosome segment or normalization chromosome segment sequence are identified in the mode calculated, number wherein for each sequence label identified in this or these interested chromosome or interested chromosome segment is at least about 10, 000, d () uses for the number of each the described sequence label identified in described one or more interested chromosome or interested chromosome segment and the number for each the described sequence label identified in described normalization chromosome sequence or normalization chromosome segment sequence, calculate the code for each monosome in described one or more interested chromosome or 
interested chromosome segment or section dosage in the mode calculated, and (e) uses described calculation element each respective threshold in each and described one or more interested chromosome of the described monosome dosage for each in one or more interested chromosome or interested chromosome segment or interested chromosome segment compared and determine the code of presence or absence aneuploidy thus in described sample, wherein said aneuploidy exists and/or describedly increases instruction cancer for each the sequence label number identified in this or these interested chromosome or interested chromosome segment and to exist and/or risk of cancer increases.In different embodiments, code is provided for the instruction of the diagnostic method performed as above as described in (with hereafter).
The method of Therapeutic cancer experimenter is also provided.In certain embodiments, these methods comprise execution a kind of method for identifying that in mammal cancer exists and/or risk of cancer increases as described herein, the method use from experimenter a sample or receive result to these class methods that this sample performs; And when the method is individually or with combined from other indexs one or more of a kind of antidiastole for cancer and when showing to there is neoplastic cell in described experimenter, treatment experimenter, or experimenter is treated, to remove growth or the propagation of neoplastic cell and/or suppression neoplastic cell.In certain embodiments, treat described experimenter to comprise by operation removing cell.In certain embodiments, treat experimenter to comprise and perform radiotherapy to experimenter or make execution radiotherapy, to kill described neoplastic cell.In certain embodiments, treat experimenter to comprise and give or make experimenter to be given anticarcinogen (such as horse trastuzumab, Erbitux, Wei Ke for than, Buddhist nun's trastuzumab, horse trastuzumab, Victibix, fluorouracil, capecitabine, 5-trifluoromethyl-2 '-BrdU, methotrexate (MTX), Raltitrexed, pemetrexed, cytarabine, Ismipur, imuran, 6-thioguanine, Pentostatin, fludarabine, Cladribine, floxuridine (FUDR), endoxan, knob is husky, ifosfamide, thiotepa, two (2-the chloroethyl)-1-nitroso ureas of 1,3-, 1-(2-chloroethyl)-3-cyclohexyl-1-nitroso ureas, hemel, busulfan, procarbazine, dacarbazine, Chlorambucil, melphalan, cis-platinum, NSC-241240, oxaliplatin, bendamustine, BCNU, mustargen, dacarbazine, Fotemustine, lomustine, mannosulfan, Nedaplatin, Nimustine, prednimustine, Ranimustine, Satraplatin, Semustine, streptozotocin, Temozolomide, Treosulfan, triethyleneiminobenzoquinone, triethylenemelamine, thiotepa, four nitric acid three platinum, trofosfamide, uracil mastard, little red mould, daunomycin, mitoxantrone, Etoposide, Hycamtin, 
Teniposide, Irinotecan, Ka Motuosha, camptothecine, Belotecan, rubitecan, vincristine, vinblastine, vinorelbine, eldisine, taxol, Docetaxel, Ah cloth Kern, Ipsapirone, La Ruotaxi, Ao Tataxi, Te Saitaxi, vinflunine, imatinib mesylate, Sunitinib malate, Sorafenib Tosylate, AMN107 hydrochloride monohydrate/, Ta Sina, Sai Makeni, ZD6474, PTK787, retinoic acid, retinoic acid derivatives etc.).
Methods of monitoring the treatment of a cancer patient are also provided. In various embodiments, these methods comprise performing, on a sample from the subject before or during treatment, a method as described herein for identifying the presence of cancer and/or an increased cancer risk in a mammal, or receiving the results of such a method performed on the sample; and, at a later time during treatment or after treatment, performing the method again on a second sample from the subject, or receiving the results of such a method performed on the second sample. A reduced number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement), such as a reduced aneuploidy frequency and/or the reduction or absence of certain aneuploidies, indicates a positive course of treatment, whereas an unchanged or increased number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement) indicates a negative course of treatment. When the indication is negative, the treatment regimen is adjusted to a more aggressive treatment regimen and/or a palliative treatment regimen.
Also provided are methods for determining the fetal fraction, i.e., the fraction of fetal nucleic acid, in a maternal sample comprising a mixture of fetal and maternal nucleic acids. In one embodiment, the method for determining the fetal fraction in a maternal sample comprises: (a) receiving sequence reads from the fetal and maternal nucleic acids in the maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) identifying a number of the sequence tags that come from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1 to 22, X and Y and segments thereof, and identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of the sequence tags that come from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose, wherein the one or more chromosomes of interest or chromosome segments of interest have a copy number variation; and (d) determining the fetal fraction using the chromosome dose or chromosome segment dose corresponding to the copy number variation identified in step (c). In some embodiments, the copy number variation is determined by comparing the dose of each chromosome or chromosome segment of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold for that chromosome or chromosome segment. The copy number variation can be selected from the group consisting of: complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that selected chromosome or segment of interest. In some embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that selected chromosome or segment of interest.
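The two dose calculations described above can be sketched as follows. The function names are hypothetical, and the definition of tag density as tag count divided by sequence length is an assumption made for this example; the passage does not define the density ratio explicitly.

```python
def chromosome_dose(tags_interest, tags_normalizing):
    """Dose as the ratio of tag counts: chromosome (or segment) of interest
    over its normalizing chromosome (or segment) sequence."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing tag count must be positive")
    return tags_interest / tags_normalizing


def tag_density(tag_count, sequence_length):
    """Assumed definition: tags per base of the sequence they map to."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return tag_count / sequence_length


def chromosome_dose_from_density(tags_interest, len_interest,
                                 tags_normalizing, len_normalizing):
    """Dose as the ratio of the two tag densities."""
    return (tag_density(tags_interest, len_interest)
            / tag_density(tags_normalizing, len_normalizing))


print(chromosome_dose(500, 1000))
print(chromosome_dose_from_density(500, 50, 1000, 200))
```

Dividing by sequence length makes doses comparable between sequences of different size, which is one plausible reason to prefer the density form.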
In certain embodiments, the method further comprises calculating a normalized chromosome value (NCV), wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest. The fetal fraction is then determined according to the following formula:
$$ff = 2 \times \left| \mathrm{NCV}_{iA} \times \mathrm{CV}_{iU} \right|$$
where $ff$ is the fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, the i-th chromosome being the chromosome of interest.
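The NCV and fetal fraction (ff) formulas above can be evaluated as in the following sketch. It assumes that the coefficient of variation is the standard deviation divided by the mean and that the sample (n - 1) standard deviation is used; the function names and the example doses are hypothetical.

```python
from statistics import mean, stdev


def ncv(test_dose, qualified_doses):
    """Normalized chromosome value: (R_iA - mean(R_iU)) / sigma_iU."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumed choice)
    return (test_dose - mu) / sigma


def fetal_fraction(test_dose, qualified_doses):
    """ff = 2 * |NCV_iA * CV_iU|, with CV_iU assumed to be sigma_iU / mean(R_iU)."""
    mu = mean(qualified_doses)
    cv = stdev(qualified_doses) / mu
    return 2 * abs(ncv(test_dose, qualified_doses) * cv)


qualified = [0.98, 1.00, 1.02]  # illustrative doses from unaffected samples
print(ncv(1.05, qualified))
print(fetal_fraction(1.05, qualified))
```

Because the absolute value is taken, the same function covers both a gained chromosome (dose above the qualified mean) and a lost one (dose below it).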
In certain embodiments, the fetal fraction is determined using a normalized segment value (NSV), wherein the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$\mathrm{NSV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, the i-th chromosome segment being the chromosome segment of interest. The fetal fraction is then determined according to the following formula:
$$ff = 2 \times \left| \mathrm{NSV}_{iA} \times \mathrm{CV}_{iU} \right|$$
where $ff$ is the fetal fraction value, $\mathrm{NSV}_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, the i-th chromosome segment being the chromosome segment of interest.
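The following worked step is a sketch of where the factor of 2 in the fetal fraction formula comes from. It rests on two assumptions that this passage does not state explicitly: that the coefficient of variation is $\mathrm{CV}_{iU} = \sigma_{iU} / \bar{R}_{iU}$, and that one extra or one missing fetal copy of segment $i$ scales the dose by $(1 \pm ff/2)$ relative to the unaffected mean while leaving the normalizing sequence unchanged.

$$\mathrm{NSV}_{iA} \times \mathrm{CV}_{iU} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} \times \frac{\sigma_{iU}}{\bar{R}_{iU}} = \frac{R_{iA} - \bar{R}_{iU}}{\bar{R}_{iU}}$$

$$R_{iA} = \bar{R}_{iU}\left(1 \pm \frac{ff}{2}\right) \;\Rightarrow\; \frac{R_{iA} - \bar{R}_{iU}}{\bar{R}_{iU}} = \pm\frac{ff}{2} \;\Rightarrow\; ff = 2 \times \left| \mathrm{NSV}_{iA} \times \mathrm{CV}_{iU} \right|$$

The same argument applies to whole chromosomes with NCV in place of NSV.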
In certain embodiments, the chromosome of interest is any one of chromosomes 1-22 or the X chromosome of a male fetus, and the chromosome segment of interest is selected from chromosomes 1-22 or the X chromosome of a male fetus.
In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence used in the embodiments of the method for determining the fetal fraction is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. The normalizing chromosome sequence can be a single chromosome of any one or more of chromosomes 1 to 22, X and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes of any of chromosomes 1 to 22, X and Y. Likewise, the normalizing segment sequence can be a single segment of any one or more of chromosomes 1 to 22, X and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1 to 22, X and Y.
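Steps (i) to (iii) can be sketched as a search over candidate normalizing chromosomes. The sketch below makes several assumptions: only single-chromosome candidates are tried (not combinations), "smallest variability" is measured as the coefficient of variation of the dose across the qualified samples, and the function name, data layout, and tag counts are hypothetical.

```python
from statistics import mean, stdev


def select_normalizing_chromosome(qualified_samples, interest, candidates):
    """Return (candidate, cv) for the candidate giving the lowest dose variability.

    qualified_samples: list of dicts mapping chromosome name to tag count,
                       one dict per qualified sample (assumed layout).
    interest: name of the chromosome of interest.
    candidates: names of potential normalizing chromosomes.
    """
    best, best_cv = None, float("inf")
    for cand in candidates:
        # Step (ii): recompute the dose in every qualified sample for this candidate.
        doses = [s[interest] / s[cand] for s in qualified_samples]
        cv = stdev(doses) / mean(doses)
        # Step (iii): keep the candidate with the smallest variability.
        if cv < best_cv:
            best, best_cv = cand, cv
    return best, best_cv


samples = [
    {"chr21": 100, "chr9": 1000, "chr1": 2000},
    {"chr21": 110, "chr9": 1100, "chr1": 2000},
    {"chr21": 90, "chr9": 900, "chr1": 2100},
]
print(select_normalizing_chromosome(samples, "chr21", ["chr1", "chr9"]))
```

In this made-up data the chr9 counts track the chr21 counts exactly, so chr9 yields a constant dose and is selected. Extending the search to groups of chromosomes would mean summing the counts of each group before taking the ratio.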
In certain embodiments, the method for determining the fetal fraction can further comprise comparing the fetal fraction so obtained with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. Methods for determining allelic imbalance are described elsewhere in this application, and include determining the fetal fraction from polymorphic differences between the fetal and maternal genomes (including, but not limited to, differences detected in SNP or STR sequences).
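As an illustration of a polymorphism-based estimate, the sketch below uses one commonly used estimator that is not given in this passage: at sites where the mother is homozygous and the fetus is heterozygous, the fetus-specific (minor) allele makes up about half of the fetal fraction, so ff is approximately 2 x minor / (major + minor). The function name, the input layout, and the read counts are hypothetical.

```python
def fetal_fraction_from_alleles(allele_counts):
    """Estimate ff from read counts at informative polymorphic sites.

    allele_counts: list of (major_count, minor_count) pairs, one per site where
                   the mother is homozygous and the fetus heterozygous (assumed).
    """
    total_major = sum(major for major, _ in allele_counts)
    total_minor = sum(minor for _, minor in allele_counts)
    total = total_major + total_minor
    if total == 0:
        raise ValueError("no reads at informative sites")
    # The fetal-specific allele contributes ff / 2 of all reads at such sites.
    return 2 * total_minor / total


sites = [(95, 5), (190, 10)]  # illustrative read counts only
print(fetal_fraction_from_alleles(sites))
```

A value obtained this way does not depend on the tag count of the chromosome of interest, which is what makes it useful as an independent comparison.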
In certain embodiments, the method further comprises at least temporarily storing the sequence reads.
An additional method for classifying a copy number variation in a fetal genome is also provided. This additional method comprises: (a) obtaining sequence reads from the fetal and maternal nucleic acids in a maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) identifying a number of the sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus has a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and (f) comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest.
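Step (f) can be sketched as a simple comparison of the two fetal fraction values. The relative tolerance of 0.2 and the outcome labels below are assumptions made for illustration; this passage does not state how a given outcome maps to a particular class of copy number variation.

```python
def compare_fetal_fractions(ff_first, ff_second, rel_tolerance=0.2):
    """Compare the independent estimate (ff_first) with the estimate derived
    from the chromosome of interest (ff_second).

    Returns 'concordant' when the two agree within rel_tolerance (an assumed
    cutoff), otherwise 'second_lower' or 'second_higher'.
    """
    if ff_first <= 0:
        raise ValueError("ff_first must be positive")
    relative_diff = (ff_second - ff_first) / ff_first
    if abs(relative_diff) <= rel_tolerance:
        return "concordant"
    return "second_lower" if relative_diff < 0 else "second_higher"


print(compare_fetal_fractions(0.10, 0.11))
print(compare_fetal_fractions(0.10, 0.04))
```

A downstream classification rule would take this outcome as its input; in practice the tolerance would need to reflect the measurement error of both estimates.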
In certain embodiments, the first method of calculating the fetal fraction value in step (d) of this additional method comprises calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. The second method of calculating the fetal fraction value in step (e) of this additional method comprises: (a) counting the sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method.
In certain embodiments, the information used by the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises one or more of the polymorphic sites. In certain embodiments, the information used by the first method is obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the first method comprises calculating the first fetal fraction value using tags from a chromosome or chromosome segment that does not have a copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known not to be present in an aneuploid state, or that is determined not to be aneuploid by any method described herein (e.g., by calculating its NCV or NSV), can be used to determine the first fetal fraction.
In certain embodiments, the chromosome or segment dose determined by the second method in step (e) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that selected chromosome or segment of interest. In certain embodiments, the chromosome or segment dose determined in step (e) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that selected chromosome or segment of interest.
Some embodiments of this additional method further comprise calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest.
In certain embodiments, the second method of calculating the fetal fraction value comprises evaluating the following expression:
$$ff = 2 \times \left|\mathrm{NCV}_{iA} \times \mathrm{CV}_{iU}\right|$$
where $ff$ is the fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample or test sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest.
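The expression can be checked numerically with a short Python sketch. It assumes the coefficient of variation is the standard deviation of the qualified doses divided by their mean (the usual definition); under that assumption the product of the NCV and the coefficient of variation reduces to the relative deviation of the test dose from the qualified mean. Function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(qualified_doses):
    """CV = standard deviation / mean of the qualified doses (assumed definition)."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_ncv(ncv, cv):
    """ff = 2 * |NCV * CV|, as in the expression above."""
    return 2.0 * abs(ncv * cv)


# Made-up doses for illustration only.
qualified = [0.10, 0.12, 0.08]
test_dose = 0.11
ncv = (test_dose - mean(qualified)) / stdev(qualified)
ff = fetal_fraction_from_ncv(ncv, coefficient_of_variation(qualified))

# Equivalent form when CV = sigma / mean: twice the relative excess (or deficit) in dose.
ff_direct = 2.0 * abs(test_dose - mean(qualified)) / mean(qualified)
```

Both routes give a fetal fraction of about 0.2 for these made-up numbers, which is what one would expect if a single extra fetal copy raises the dose by half the fetal fraction.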
In certain embodiments, the first method of calculating the fetal fraction comprises (a) counting the sequence tags from the chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence, to determine a chromosome dose for that other chromosome; and (b) calculating the first fetal fraction value from that chromosome dose. The second method comprises (a) counting the sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the second fetal fraction value from that chromosome dose.
Preferably, the chromosome dose or segment dose is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that chromosome or segment of interest; or the chromosome dose or segment dose is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence.
Preferably, the additional method for classifying a copy number variation also comprises calculating a corresponding normalized chromosome value (NCV), and the first method and the second method each use the corresponding NCV. Calculating the NCV relates the determined chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the calculated dose of the i-th chromosome in the test sample. The first method and the second method can use the NCV to calculate the fetal fraction by evaluating the following expression:
$$ff = 2 \times \left|\mathrm{NCV}_{iA} \times \mathrm{CV}_{iU}\right|$$
where $ff$ is the fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples. In this expression, for the first method the i-th chromosome is not the first chromosome of interest; for the second method, the i-th chromosome is the first chromosome of interest.
The first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. The chromosome other than the first chromosome of interest can be any one of chromosomes 1 to 22, or the X chromosome when the fetus is male.
In certain embodiments, step (f) comprises determining whether the two fetal fraction values are approximately equal. In certain embodiments, step (f) further comprises determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy, for example a monosomy or a trisomy.
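The comparison in step (f) can be illustrated with a small Python sketch. The text does not say how close "approximately equal" must be, so the relative tolerance of 20% used below is purely an assumed placeholder, as are the function name and the example values.

```python
import math


def fetal_fractions_agree(ff_first, ff_second, rel_tol=0.2):
    """Return True when the two estimates are approximately equal.

    rel_tol is a hypothetical tolerance; the specification gives no numeric value.
    """
    return math.isclose(ff_first, ff_second, rel_tol=rel_tol)


# Made-up estimates for illustration only.
agree = fetal_fractions_agree(0.10, 0.11)     # estimates agree: ploidy assumption supported
disagree = fetal_fractions_agree(0.10, 0.05)  # estimates differ: further analysis needed
```

When the estimates agree, the ploidy assumption built into the second method is taken to hold; when they do not, the chromosome of interest needs further analysis.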
In certain embodiments, the additional method for classifying a copy number variation further comprises a step (g): when the two fetal fraction values are not approximately equal, analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
In certain embodiments, the first method comprises calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and the second method comprises calculating the second fetal fraction value using information from one or more such polymorphisms located on the first chromosome of interest. The comparison in step (f) can comprise: determining that the first chromosome of interest is disomic when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; determining that it is trisomic when the ratio is approximately 1.5; and determining that it is monosomic when the ratio is approximately 0.5. The additional method for classifying a copy number variation can further comprise a step (g) of analyzing the tag information for the first chromosome of interest when the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5, to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
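The ratio test described above can be sketched as follows. The three target ratios (1, 1.5, and 0.5) come from the text; the tolerance of 0.1 around each target, the function name, and the returned labels are assumptions made for illustration.

```python
def classify_by_ff_ratio(ff_first, ff_second, tol=0.1):
    """Classify the chromosome of interest from the ratio ff_second / ff_first.

    tol is a hypothetical half-width for "approximately"; it is not from the source.
    """
    if ff_first <= 0:
        raise ValueError("first fetal fraction must be positive")
    ratio = ff_second / ff_first
    targets = (
        (1.0, "disomy"),     # two fetal copies of the chromosome of interest
        (1.5, "trisomy"),    # three fetal copies
        (0.5, "monosomy"),   # one fetal copy
    )
    for expected, label in targets:
        if abs(ratio - expected) <= tol:
            return label
    # None of the expected ratios matched: proceed to the further analysis of step (g).
    return "indeterminate"


# Made-up fetal fraction pairs for illustration only.
calls = [
    classify_by_ff_ratio(0.10, 0.10),
    classify_by_ff_ratio(0.10, 0.15),
    classify_by_ff_ratio(0.10, 0.05),
    classify_by_ff_ratio(0.10, 0.125),
]
```

The last call has a ratio of 1.25, which falls between the disomic and trisomic targets, so it is passed on as indeterminate rather than forced into a class.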
In certain embodiments, the information used by the polymorphism-based first and second methods comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. Alternatively, the information used by the polymorphism-based first and second methods is obtained without sequencing, for example by qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, step (g) of analyzing the tag information for the first chromosome of interest comprises: (a) binning the sequence of the first chromosome of interest into a plurality of portions; (b) determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy, or, if none of the portions does, determining that the fetus is a mosaic. The additional method may therefore further comprise determining that a portion of the first chromosome of interest containing significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
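Steps (a) to (c) can be sketched in Python as below. The text does not specify how "significantly more or less" is tested, so this sketch substitutes a simple stand-in: a bin is flagged when its count deviates from the median bin count by more than an assumed relative threshold. The function name, threshold, and example counts are all hypothetical.

```python
from statistics import median


def classify_discordant_chromosome(bin_counts, rel_threshold=0.2):
    """Return ("partial aneuploidy", flagged_bin_indices) or ("mosaic", []).

    A bin is flagged when |count - median| exceeds rel_threshold * median.
    This rule is an illustrative stand-in for an unspecified significance test.
    """
    reference = median(bin_counts)
    if reference <= 0:
        raise ValueError("median bin count must be positive")
    flagged = [
        index
        for index, count in enumerate(bin_counts)
        if abs(count - reference) > rel_threshold * reference
    ]
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []


# Made-up bin counts for illustration only.
with_outlier = classify_discordant_chromosome([100, 102, 98, 101, 150, 99])
uniform = classify_discordant_chromosome([100, 102, 98, 101, 99, 100])
```

In the first example only the fifth bin (index 4) stands out, which points to a localized gain; in the second, coverage is even along the chromosome, so the discordant fetal fraction is attributed to mosaicism.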
Step (f) of the method for classifying a copy number variation comprises classifying the copy number variation into a class selected from the group consisting of: complete chromosomal duplication or multiplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic.
In embodiments in which step (f), which compares the first fetal fraction value with the second fetal fraction value, determines that the two values are not approximately equal, the method further comprises:
(i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and
(ii) when the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest.
In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises sorting the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids within the first chromosome of interest, and counting the mapped tags in each bin.
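A minimal Python sketch of the binning and counting step follows. It assumes fixed-width bins and tags represented by their mapped start coordinate on the chromosome of interest; neither detail is specified in the text, and the function name and example positions are hypothetical.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, bin_size):
    """Count mapped tags in consecutive fixed-width bins along one chromosome.

    tag_positions: iterable of 0-based mapped start coordinates (assumed representation).
    bin_size: bin width in bases (assumed to be uniform).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter(position // bin_size for position in tag_positions)
    n_bins = max(counts) + 1 if counts else 0
    # Bins with no tags are reported as zero so that positions stay aligned.
    return [counts.get(index, 0) for index in range(n_bins)]


# Made-up tag positions for illustration only.
example_bins = count_tags_per_bin([5, 15, 17, 42], bin_size=10)
```

For the made-up positions above the result is `[1, 2, 0, 0, 1]`; in practice the bins would span kilobases to megabases and the per-bin counts would feed a comparison between bins.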
In certain embodiments, the aligning step in (b) comprises aligning at least about 1 million reads.
Any method described here can further comprise sequencing the fetal and maternal nucleic acids (for example cell-free DNA) in the maternal test sample to obtain sequence reads. Sequencing the maternal and fetal nucleic acids from the maternal test sample to produce sequence reads comprises massively parallel sequencing. In certain embodiments, the massively parallel sequencing is sequencing-by-synthesis. Sequencing-by-synthesis can be performed using reversible dye terminators. In other embodiments, the massively parallel sequencing is sequencing-by-ligation. In yet other embodiments, the massively parallel sequencing is single-molecule sequencing.
The maternal sample used to determine the fetal fraction according to the methods described here can be a blood, plasma, serum, or urine sample. In certain embodiments, the maternal sample is a plasma sample. In other embodiments, the maternal sample is a whole blood sample.
Also provided are various apparatus, including apparatus for medical analysis of a sample (for example a maternal sample), that perform steps of the methods described above, for example individually for determining a copy number variation, for determining a fetal fraction, or for classifying a copy number variation.
Also provided are kits comprising reagents for determining copy number variations, used alone or in combination with methods for determining the contribution of one of two genomes to a mixture of nucleic acids derived from the two genomes (for example the fetal fraction in a maternal sample). These kits can be used in combination with the apparatus described here.
Although the examples here concern humans and the language is directed primarily to human concerns, the concepts described here are also applicable to genomes from any plant or animal.
Brief description of the drawings
Fig. 1 is the process flow diagram of method 100, and the method is used for determining that presence or absence copy number makes a variation in the test sample of potpourri comprising nucleic acid.
Fig. 2 depicts workflows for preparing a sequencing library according to Illumina's full-length protocol, the abbreviated protocol (ABB), and the two-step and one-step methods described here. "P" represents a purification step, and "X" indicates that the purification step and/or the DNA repair is not included.
Fig. 3 describes the technological process of the embodiment being used for the method preparing sequencing library on a solid surface.
Fig. 4 shows the process flow diagram of an embodiment 400 of the method for the integrality for verifying the sample carrying out multistep single channel order-checking biological test.
Fig. 5 shows the process flow diagram of an embodiment 500 of the method for the integrality for verifying the multiple samples carrying out multistep multiple order-checking biological test.
Fig. 6 is the process flow diagram for testing the method 600 determining presence or absence aneuploidy and fetus mark in sample at the parent of the potpourri comprising fetus and maternal nucleic acids simultaneously.
Fig. 7 uses the size of extensive parallel sequencing or polymorphic nucleotide sequence to be separated, and tests the process flow diagram of the method 700 determining fetus mark in sample at the parent of the potpourri comprising fetus and maternal nucleic acids.
Fig. 8 is the process flow diagram for determining the method 800 of presence or absence fetus aneuploidy and fetus mark in the Maternal plasma test sample of the polymorphic nucleic acid of enrichment simultaneously.
Fig. 9 tests for the parent purifying cfDNA at the polymorphic nucleic acid of enrichment the process flow diagram simultaneously determining the method 900 of presence or absence fetus aneuploidy and fetus mark in sample.
Figure 10 is for from determining the process flow diagram of the method 1000 of presence or absence fetus aneuploidy and fetus mark in the sequencing library that constructs of the fetus of the polymorphic nucleic acid of enrichment and maternal nucleic acids derived from parent test sample simultaneously.
Figure 11 summarizes by the extensive parallel order-checking shown in Fig. 7, determines the process flow diagram of the alternate embodiment of the method for fetus mark.
Figure 12 is the column diagram of the identification shown in order to the fetus and parent polymorphic sequence (SNP) determining fetus mark in the test sample.Show the sum (Y-axis) being mapped to the sequence reads of the SNP sequence identified by rs number (X-axis), and the relative content of fetal nucleic acid (*).
Figure 13 is the block diagram describing the fetus of set genomic locations and the classification of parent distribution type state.
Figure 14 shows the comparison of the result using mixture model and known fetal mark and estimation fetus mark to produce.
Figure 15 illustrates error estimates by sequenced base position over 30 lanes of Illumina GA2 data aligned to human genome HG18 using Eland with default parameters.
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias by a point.
Figure 17 shows that, for simulated data, using the machine error rate as a known parameter and enhancing the case 1 and case 2 error models considerably reduces the upward bias to less than a point for fetal fractions below 0.2.
Figure 18 is the process flow diagram described by comparing the method for being classified by CNV with the fetus fractional value that two kinds of different technologies calculate.
Figure 19 tests sample and the block diagram of finally making the discrete system of diagnosis for processing.
Figure 20 schematically illustrates how the different operations in processing test samples may be grouped to be handled by different elements of a system.
Figure 21 A and 21B shows the electrophoretogram of the cfDNA sequencing library prepared according to the scheme (Figure 21 B) described in the simple scheme described in example 2a (Figure 21 A) and example 2b.
Figure 22 A to 22C provides displaying to work as according to simple scheme (ABB; When ◇) preparing sequencing library and when according to without repairing two-step approach (INSOL; Mean value (the n=16) (%ChrN of the total number percent of everyone chromosomal sequence label is mapped to when) preparing sequencing library; Figure 22 A) and sequence label number percent as the figure of the function (Figure 22 B) of chromosome size.The label that Figure 22 C maps when showing and use two-step approach to prepare library and the function of ratio percentage as chromosomal GC content using the label obtained when simply (ABB) legal system makes library.
Figure 23 A and 23B shows provides the average of label number percent and the column diagram of standard deviation, chromosome x (Figure 23 A that these label mappings check order obtained to 10 samples from the cfDNA to the plasma purification from 10 pregnant woman; %ChrX) with Y (Figure 23 B; %ChrY).Figure 23 A shows when using the number of tags without being mapped to X chromosome time restorative procedure (two steps) larger than the number of tags using simple method (ABB) to obtain.Figure 23 B shows to use not to be had different without being mapped to the label number percent of Y chromosome when repairing two-step approach from label number percent when using simple method (ABB).
Figure 24 shows number with reference to the upper non-excluded site (NE site) of genome (hg18) and the ratio of sum of label in non-excluded site being mapped to 5 samples each, cfDNA prepare from these samples and according in simple scheme (ABB) (solid post) of description in example 2, solution without recovery scenario (two steps; Open column) and solid surface without a recovery scenario (step; Grey post) in order to construct sequencing library.
Figure 25 A and 25B shows to work as according to simple scheme (ABB; When ◇) preparing sequencing library on a solid surface, when according to without to repair when two-step approach () prepares sequencing library and when according to mean value (the n=5) (%ChrN being mapped to the total number percent of everyone chromosomal sequence label when preparing library without reparation single stage method (Δ); Figure 25 A) and sequence label number percent as the figure of the function (Figure 25 B) of chromosome size.From according to simple scheme (ABB; ◇) and solid surface without recovery scenario (two steps; The regression coefficient of the map tags that sequencing library) prepared obtains.Figure 25 C shows from according to the sequence label of each the chromosomal mapping obtained without the sequencing library prepared of the two step schemes of reparation and the function (◇) of the ratio percentage of each the chromosomal label obtained from the sequencing library prepared according to simple scheme (ABB) as each chromosomal GC percentage composition, and from function () as each chromosomal GC percentage composition of each the chromosomal sequence of mapping label obtained according to the sequencing library prepared without the step scheme of reparation and the ratio percentage of each chromosomal label that obtains from the sequencing library prepared according to simple scheme (ABB).
Figure 26 A and 26B shows the average of label number percent and the comparison of standard deviation, and these label mappings check order chromosome x (Figure 26 A) and Y (Figure 26 B) to according to ABB method, two-step approach and single stage method obtained from 5 samples of the cfDNA to the plasma purification from 5 pregnant woman.Figure 26 A shows when using the number of tags without being mapped to X chromosome time restorative procedure (two steps and a step) larger than the number of tags using simple method (ABB) to obtain.Figure 26 B shows to use not to be had different without being mapped to the label number percent of Y chromosome when repairing two-step approach and single stage method from label number percent when using simple method.
Figure 27 A and 27B shows for 61 clinical samples (Figure 27 A) using ABB method to prepare in the solution and uses 35 study samples (Figure 27 B) prepared without reparation solid surface (SS) single stage method, the amount of the purifying cfDNA in order to prepare sequencing library is associated with the amount of gained library production.
Figure 28 shows the amount of the cfDNA in order to manufacture library and the correlativity of the amount of the library production using two steps (), ABB (◇) and a step (Δ) method to obtain.
Figure 29 shows the acquisition when use one step (open column) and two steps (solid post) prepare index library and the number percent of the index sequence reading checked order as 6 clumps (6 index sample/flow cell paths).
Figure 30 A and 30B is mapped to average (the n=42) (%ChrN of the total number percent of everyone chromosomal sequence label when being and showing and to prepare on a solid surface according to single stage method when index sequencing library and to check order as 6 clumps; Figure 30 A) and gained sequence label number percent as the figure of the function (Figure 30 B) of chromosome size.
Figure 31 shows that the sequence label number percent (ChrY) being mapped to Y chromosome is relative to the label number percent (ChrX) being mapped to X chromosome.
Figures 32A and 32B illustrate the distribution of the chromosome dose for chromosome 21 determined from sequencing cfDNA extracted from a set of 48 blood samples, each obtained from a human subject pregnant with a male or a female fetus. Chromosome 21 doses for qualified samples, i.e. normal for chromosome 21 (O), and for trisomy 21 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 32A) and for chromosomes 1-22 and X (Figure 32B).
Figures 33A and 33B illustrate the distribution of the chromosome dose for chromosome 18 determined from sequencing cfDNA extracted from a set of 48 blood samples, each obtained from a human subject pregnant with a male or a female fetus. Chromosome 18 doses for qualified samples, i.e. normal for chromosome 18 (O), and for trisomy 18 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 33A) and for chromosomes 1-22 and X (Figure 33B).
Figures 34A and 34B illustrate the distribution of the chromosome dose for chromosome 13 determined from sequencing cfDNA extracted from a set of 48 blood samples, each obtained from a human subject pregnant with a male or a female fetus. Chromosome 13 doses for qualified samples, i.e. normal for chromosome 13 (O), and for trisomy 13 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 34A) and for chromosomes 1-22 and X (Figure 34B).
Figures 35A and 35B illustrate the distribution of the chromosome dose for chromosome X determined from sequencing cfDNA extracted from a set of 48 test blood samples, each obtained from a human subject pregnant with a male or a female fetus. Chromosome X doses for male (46,XY; (O)), female (46,XX; (Δ)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 and X (Figure 35A) and for chromosomes 1-22 and X (Figure 35B).
Figures 36A and 36B illustrate the distribution of the chromosome dose for chromosome Y determined from sequencing cfDNA extracted from a set of 48 test blood samples, each obtained from a human subject pregnant with a male or a female fetus. Chromosome Y doses for male (46,XY; (Δ)), female (46,XX; (O)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 (Figure 36A) and for chromosomes 1-22 (Figure 36B).
Figure 37 shows for from Figure 32 A and 32B, 33A and 33B, and the dosage that illustrates respectively of 34A and 34B determine chromosome 21 (■), 18 (●) and 13 (▲) the coefficient of variation (CV).
Figure 38 shows the coefficient of variation (CV) of chromosome x (■) and the Y (●) determined for the dosage illustrated respectively from Figure 35 A and 35B and 36A and 36B.
Figure 39 shows the cumulative bad distribution of the GC part of human chromosomal.Longitudinal axis representative has the chromosomal frequency of the GC content lower than the value that transverse axis illustrates.
Figure 40 illustrates the sequence dosage (Y-axis) for the section from determined chromosome 11 (81000082-103000103bp) of checking order to cfDNA, and cfDNA extracts from one group 7 qualified samples (O) obtained and 1 test sample (◆) from pregnant human experimenter.Identify the sample from an experimenter, this experimenter nourishes a fetus with a kind of part aneuploidy of chromosome 11 (◆).
Figure 41 A-41E illustrates, relative to the standard deviation of the mean value (Y-axle) of the homologue in unaffected sample, for the distribution of the normalized chromosome dosage of chromosome 21 (41A), chromosome 18 (41B), chromosome 13 (41C), chromosome x (41D) and chromosome Y (41E).
Figure 42 shows the normalization chromosome used as described in example 12, for the normalized chromosome value at the chromosome 21 (O) coming to determine in the sample in self-training group 1,18 (Δs) and 13 ().
Figure 43 shows the normalization chromosome used as described in example 12, for the normalized chromosome value at the chromosome 21 (O) coming to determine in the sample in self-test group 1,18 (Δs) and 13 ().
Figure 44 shows and uses the method for normalizing of the people such as Chiu (Zhao) (to be normalized with the number remaining the sequence label that chromosome obtains in the sample to which the number of interested chromosome institute recognition sequence label, example 13 see in other places of the application), for the normalized chromosome value of the chromosome 21 (O) coming to determine in the sample of self-test group 1 and 18 (Δs).
Figure 45 shows the normalization chromosome (as described in example 13) that use is systematically determined, for the normalized chromosome value of the chromosome 21 (O) coming to determine in the sample of self-training group 1,18 (Δs) and 13 ().
Figure 46 shows the normalized chromosome value of chromosome x (X-axis) and Y (Y-axis).Arrow points as described in example 13,5 (Figure 46 A) identifying in training set and test set respectively and 3 (Figure 46 B) X monosomic sample.
Figure 47 shows the normalization chromosome (as described in example 13) that use is systematically determined, for the normalized chromosome value of the chromosome 21 (O) coming to determine in the sample of self-test group 1,18 (Δs) and 13 ().
Figure 48 shows the normalization chromosome (as described in example 13) that use is systematically determined, for the normalized chromosome value of the chromosome 9 (O) coming to determine in the sample of self-test group 1.
Figure 49 shows the normalization chromosome (as described in example 13) that use is systematically determined, for the normalized chromosome value of the chromosome 1-22 coming to determine in the sample of self-test group 1.
Figure 50 shows the design (A) of the research described in example 16 and the process flow diagram of random sampling scheme (B).
Figure 51 A to 51F shows the process flow diagram of gender analysis (being Figure 51 D to 51F respectively) of the analysis (being Figure 51 A to 51C respectively) of chromosome 21,18 and 13 and women, the male sex and X monosomy.Ellipse comprises the result from the order-checking information acquisition from laboratory, and rectangle comprises results of karyotype, and the rectangle with fillet shows the comparative result in order to determine test performance (sensitivity and selectivity).Dotted line in Figure 51 A and 51B represents the relation between the mosaic sample of T21 (n=3) and T18 (n=1), and these samples are inspected by the analysis of chromosome 21 and 18 respectively, but correctly determine as described in example 16.
Figure 52 show needle is to the test sample of the research described in example 16, and the normalized chromosome value (NCV) of chromosome 21 (●), 18 (■) and 13 (▲) contrasts caryogram classification relation.Circular sample represents the unfiled sample with trisomy caryogram.
The normalized chromosome value (NCV) that Figure 53 shows the chromosome x of the test sample of the research described in example 16 contrasts the caryogram classification relation of Gender Classification.Show the sample (zero) with female karyotype, the sample (●) with male sex's caryogram, there is the sample () of 45, X and there is the sample (■) of other caryogram (i.e. XXX, XXY and XYY).
Figure 54 shows the test sample for the clinical research described in example 16, the figure of the normalized chromosome value relation of the normalized chromosome value counterstain body X of chromosome Y.Show euploid masculinity and femininity sample (zero), XXX sample (●), 45, X samples (X), XYY sample (■) and XXY sample (▲).Dash lines show as described in example 16 for the threshold value by sample classification.
An embodiment of Figure 55 schematic presentation CNV defining method described here.
Figure 56 shows from example 17, is comprising in the synthesis maternal sample (1) from the DNA of the child with trisomy 21 " ff " number percent (ff using the dosage of chromosome 21 to determine 21) as " ff " number percent (ff using the dosage of chromosome x to determine x) the figure of function.
Figure 57 shows from example 17, is comprising middle " ff " number percent (ff using the dosage of chromosome 7 to determine of synthesis maternal sample (2) carrying the DNA of the child of chromosome 7 excalation from euploid mother and its 7) as " ff " number percent (ff using the dosage of chromosome x to determine x) the figure of function.
Figure 58 shows from example 17, is comprising middle " ff " number percent (ff using the dosage of chromosome 15 to determine of synthesis maternal sample (3) from euploid mother and its with the DNA of 25% mosaic child of chromosome 15 partial replication 15) as " ff " number percent (ff using the dosage of chromosome x to determine x) the figure of function.
Figure 59 shows from example 17, uses " ff " number percent (ff that the dosage of chromosome 22 is determined in Artificial sample (4) 22) and from the figure of its NCV obtained, this Artificial sample comprises 0% child DNA (i), with from the known 10%DNA (ii) without the uninfluenced twin son of chromosome 22 chromosome dyad aneuploidy, and from the known 10%DNA (iii) with the influenced twin son of chromosome 22 chromosome dyad aneuploidy.
Figure 60 shows from example 18, is comprising the figure of the CNffx contrast CNff21 relation determined in the sample of fetus T21 trisomy.
Figure 61 shows from example 18, is comprising the figure of the CNffx contrast CNff18 relation determined in the sample of fetus T18 trisomy.
Figure 62 shows from example 18, is comprising the figure of the CNffx contrast CNff13 relation determined in the sample of fetus T13 trisomy.
Figure 63 shows from example 19, the figure of the NCV value of chromosome 1 to 22 and X in the test sample.
Figure 64 shows in example 18 for the fetus mark that the sample with the female child suffering from T21 obtains.
Figure 65 shows an a kind of embodiment of medical analysis equipment, and this medical analysis equipment is for determining the fetus mark of the function as copy number existing in Fetal genome variation.
Figure 66 shows for determining that fetus mark is to carry out an embodiment of a kind of medical analysis equipment of classifying by the copy number variation in Fetal genome.
Figure 67 shows a kind of kit, this kit comprise inspection contrast agents with for following the trail of the reagent with the integrality verifying the parent cfDNA sample carrying out extensive parallel order-checking.
Figure 68 shows a kind of kit, and this kit comprises blood collection device, DNA extracts reagent and for checking the contrast agents of maternal DNA sample.
Figure 69 (A, B, C) shows and to scheme for the make a variation NCV of the inherent positive control [ ] checked and maternal sample [◇] of the copy number of chromosome 13,18 and 21.
Detailed description of the embodiments
The disclosed embodiments relate to methods, apparatus, and systems for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids known or suspected to differ in the amount of one or more sequences of interest. Sequences of interest include genomic segment sequences, ranging from kilobases (kb) to megabases (Mb) to entire chromosomes, that are known or suspected to be associated with a genetic or disease condition. Examples of sequences of interest include chromosomes associated with well-known aneuploidies (e.g. trisomy 21) and segments of chromosomes that are multiplied in diseases such as cancer, e.g. partial trisomy 8 in acute myeloid leukemia. CNV that can be determined according to the present method include monosomies and trisomies of any one or more of autosomes 1-22 and of sex chromosomes X and Y (e.g. 45,X, 47,XXX, 47,XXY and 47,XYY); other chromosomal polysomies, i.e. tetrasomy and pentasomy (including but not limited to XXXX, XXXXX, XXXXY and XYYYY); and deletions and/or duplications of segments of any one or more of the chromosomes.
The method is a statistical approach that is implemented on one or more processors and that accounts for the accrued variability stemming from process-related, interchromosomal (intra-run), and inter-sequencing (inter-run) variability. The methods are applicable to determining CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions.
Unless otherwise indicated, the practice of the present invention involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Such techniques and apparatus are known to those of ordinary skill in the art and are described in numerous texts and reference works (see e.g. Sambrook et al., "Molecular Cloning: A Laboratory Manual", Third Edition (Cold Spring Harbor), [2001]; and Ausubel et al., "Current Protocols in Molecular Biology" [1987]).
Numeric ranges are inclusive of the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
The headings provided herein are not intended to limit the disclosure.
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Various scientific dictionaries that include the terms used herein are well known and available to those in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice or testing of the embodiments disclosed herein, only some preferred methods and materials are described.
The terms defined immediately below are more fully described by reference to the specification as a whole. It is to be understood that this disclosure is not limited to the particular methodology, protocols, and reagents described, as these may vary depending upon the context in which they are used by those of skill in the art.
Definitions
As used herein, the singular terms "a", "an" and "the" include the plural reference unless the context clearly indicates otherwise. Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' orientation and amino acid sequences are written left to right in the amino to carboxy orientation, respectively.
The term "assessing", when used herein in the context of analyzing a nucleic acid sample for CNV, refers to characterizing the status of a chromosomal or segment aneuploidy by one of three types of calls: "normal" or "unaffected", "affected", and "no-call". Thresholds for calling normal and affected are typically set. A parameter related to aneuploidy is measured in a sample and the measured value is compared to the thresholds. For duplication-type aneuploidies, a call of affected is made if a chromosome or segment dose (or other measured value of sequence content) is above a defined threshold set for affected samples. For such aneuploidies, a call of normal is made if the chromosome or segment dose is below a threshold set for normal samples. By contrast, for deletion-type aneuploidies, a call of affected is made if a chromosome or segment dose is below a defined threshold for affected samples, and a call of normal is made if the chromosome or segment dose is above a threshold set for normal samples. For example, in the presence of trisomy, the "normal" call is determined by the value of a parameter, e.g. a test chromosome dose, that is below a user-defined threshold of reliability, and the "affected" call is determined by a parameter, e.g. a test chromosome dose, that is above a user-defined threshold of reliability. A "no-call" result is determined by a parameter, e.g. a test chromosome dose, that lies between the thresholds for making a "normal" or an "affected" call. The term "no-call" is used interchangeably with "unclassified".
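By way of illustration only, the three-way call described above can be sketched as follows. The function name, argument names and numeric thresholds are hypothetical and are not taken from this disclosure; a dose that falls exactly on a threshold is treated here as a no-call, which is an assumption.

```python
def assess(dose, normal_threshold, affected_threshold, kind="duplication"):
    """Classify a chromosome or segment dose as 'normal', 'affected' or 'no-call'.

    Illustrative sketch only: names and threshold handling are assumptions.
    """
    if kind == "duplication":
        # Duplication-type aneuploidy: affected doses lie above the affected
        # threshold, normal doses lie below the normal threshold.
        if normal_threshold > affected_threshold:
            raise ValueError("duplication: normal_threshold must not exceed affected_threshold")
        if dose > affected_threshold:
            return "affected"
        if dose < normal_threshold:
            return "normal"
    elif kind == "deletion":
        # Deletion-type aneuploidy: the directions are reversed.
        if affected_threshold > normal_threshold:
            raise ValueError("deletion: affected_threshold must not exceed normal_threshold")
        if dose < affected_threshold:
            return "affected"
        if dose > normal_threshold:
            return "normal"
    else:
        raise ValueError("kind must be 'duplication' or 'deletion'")
    # The dose lies between (or on) the two thresholds.
    return "no-call"


if __name__ == "__main__":
    print(assess(0.45, normal_threshold=0.30, affected_threshold=0.40))
    print(assess(0.35, normal_threshold=0.30, affected_threshold=0.40))
```

Keeping two separate thresholds, rather than a single cutoff, is what produces the intermediate "no-call" band.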
The term "copy number variation" herein refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a qualified sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or a significant portion thereof. A "copy number variant" refers to a nucleic acid sequence in which copy number differences are found by comparing a sequence of interest in a test sample with an expected level of the sequence of interest. For example, the level of the sequence of interest in the test sample is compared to that present in a qualified sample. Copy number variants/variations include deletions (including microdeletions), insertions (including microinsertions), duplications, multiplications, inversions, translocations and complex multi-site variants. CNVs encompass chromosomal aneuploidies and partial aneuploidies.
The term "aneuploidy" herein refers to an imbalance of genetic material caused by a loss or gain of a whole chromosome or of part of a chromosome.
The terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by a loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by a loss or gain of part of a chromosome (e.g. partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions and insertions.
The term "aneuploid sample" herein refers to a sample indicative of a subject whose chromosomal content is not euploid, i.e. the sample is indicative of a subject with an abnormal copy number of chromosomes or portions of chromosomes.
The term "aneuploid chromosome" herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term "plurality" herein refers to more than one element. For example, the term is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variations (e.g. chromosome doses) in test samples and qualified samples using the methods disclosed herein. In some embodiments, at least about 3 x 10^6 sequence tags, at least about 5 x 10^6 sequence tags, at least about 8 x 10^6 sequence tags, at least about 10 x 10^6 sequence tags, at least about 15 x 10^6 sequence tags, at least about 20 x 10^6 sequence tags, at least about 30 x 10^6 sequence tags, at least about 40 x 10^6 sequence tags, or at least about 50 x 10^6 sequence tags comprising reads of between about 20 and 40 bp are obtained for each test sample.
The terms "polynucleotide", "nucleic acid" and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e. ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to the 5' position of the pentose of the next nucleotide. The terms include sequences of any form of nucleic acid, including, but not limited to, RNA and DNA molecules such as cfDNA molecules. The term "polynucleotide" includes, without limitation, single-stranded and double-stranded polynucleotides.
The term "portion" is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amounts to less than the sequence information of one human genome.
The term "test sample" herein refers to a sample, typically derived from a biological fluid, cell, tissue, organ, or organism, that comprises a nucleic acid or a mixture of nucleic acids comprising at least one nucleic acid sequence that is to be screened for copy number variation. In certain embodiments, the sample comprises at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Such samples include, but are not limited to, sputum/oral fluid, amniotic fluid, blood, a blood fraction, or biopsy samples (e.g. surgical biopsy, fine needle biopsy, etc.), urine, peritoneal fluid, pleural fluid, and the like. Although the sample is often taken from a human subject (e.g. a patient), the assays can be used for copy number variations (CNVs) in samples from any mammal, including, but not limited to, dogs, cats, horses, goats, sheep, cattle, pigs, etc. The sample may be used directly as obtained from the biological source or following a pretreatment to modify the character of the sample. For example, such pretreatment may include preparing plasma from blood, diluting viscous fluids, and so forth. Methods of pretreatment may also involve, but are not limited to, filtration, precipitation, dilution, distillation, mixing, centrifugation, freezing, lyophilization, concentration, amplification, nucleic acid fragmentation, inactivation of interfering components, addition of reagents, lysing, etc. If such methods of pretreatment are employed with respect to the sample, they are typically such that the nucleic acid(s) of interest remain in the test sample, preferably at a concentration proportional to that in an untreated test sample (i.e. a sample that is not subjected to any such pretreatment method). Such "treated" or "processed" samples are still considered to be biological "test" samples with respect to the methods described herein.
The term "qualified sample" herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number, to which the nucleic acids in a test sample are compared; it is a sample that is normal, i.e. not aneuploid, for the sequence of interest. In certain embodiments, qualified samples are used for identifying one or more normalization chromosomes or segments for a chromosome under consideration. For example, qualified samples may be used for identifying a normalization chromosome for chromosome 21. In such a case, the qualified sample is a sample that is not a trisomy 21 sample. Qualified samples may also be used in determining thresholds for calling affected samples.
The term "training set" herein refers to a set of samples that can comprise affected and unaffected samples and that is used to develop a model for analyzing test samples. The unaffected samples in a training set may be used as the qualified samples to identify normalization sequences, e.g. normalization chromosomes, and the chromosome doses of unaffected samples are used to set the thresholds for each of the sequences of interest, e.g. chromosomes. The affected samples in a training set can be used to verify that affected test samples can be easily differentiated from unaffected samples.
The term "qualified nucleic acid" is used interchangeably with "qualified sequence", which is a sequence against which a test sequence or test nucleic acid is compared. A qualified sequence is one present in a biological sample, preferably at a known representation, i.e. the amount of the qualified sequence is known. Generally, a qualified sequence is the sequence present in a "qualified sample". A "qualified sequence of interest" is a qualified sequence for which the amount is known in a qualified sample, and is a sequence that is associated with a difference in sequence representation in an individual with a medical condition.
The term "sequence of interest" herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest can be a sequence on a chromosome that is misrepresented, i.e. over-represented or under-represented, in a disease or genetic condition. A sequence of interest may be a portion of a chromosome, i.e. a chromosome segment, or a whole chromosome. For example, a sequence of interest can be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over-represented or under-represented in the total population, or in a subpopulation, of the cells of a subject. A "qualified sequence of interest" is a sequence of interest in a qualified sample. A "test sequence of interest" is a sequence of interest in a test sample.
The term "normalization sequence" herein refers to a sequence that is used to normalize the number of sequence tags mapped to a sequence of interest associated with the normalization sequence. In certain embodiments, the normalization sequence displays a variability in the number of sequence tags mapped to it, among samples and sequencing runs, that approximates the variability of the sequence of interest for which it is used as a normalization parameter, and it can differentiate an affected sample from one or more unaffected samples. In some implementations, the normalization sequence best or effectively differentiates an affected sample from one or more unaffected samples when compared to other potential normalization sequences, such as other chromosomes. A "normalization chromosome" or "normalization chromosome sequence" is an example of a "normalization sequence". A "normalization chromosome sequence" can be composed of a single chromosome or of a group of chromosomes. A "normalization segment" is another example of a "normalization sequence". A "normalization segment sequence" can be composed of a single segment of a chromosome, or it can be composed of two or more segments of the same or of different chromosomes. In certain embodiments, a normalization sequence is intended to normalize for variability such as process-related variability, interchromosomal (intra-run) variability, and inter-sequencing (inter-run) variability.
The term "differentiability" herein refers to a characteristic of a normalization chromosome that enables it to distinguish one or more unaffected (i.e. normal) samples from one or more affected (i.e. aneuploid) samples.
The term "sequence dose" herein refers to a parameter that relates the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalization sequence. In some cases, the sequence dose is the ratio of the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalization sequence. In some cases, the sequence dose refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalization sequence. A "test sequence dose" is a parameter that relates the sequence tag density of a sequence of interest, e.g. chromosome 21, to that of a normalization sequence, e.g. chromosome 9, determined in a test sample. Similarly, a "qualified sequence dose" is a parameter that relates the sequence tag density of a sequence of interest to that of a normalization sequence determined in a qualified sample.
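As a minimal sketch of the ratio form of the dose described above, the following computes the dose of a chromosome of interest against one normalization chromosome or a group of them. The tag counts are invented for illustration, and summing the counts over a group of normalization chromosomes is an assumption made here, not a statement of the disclosed method.

```python
def sequence_dose(tag_counts, interest, normalizing):
    """Ratio of tags mapped to the sequence of interest to tags mapped to
    the normalization sequence(s).

    tag_counts  -- dict mapping a sequence name to its number of mapped tags
    interest    -- name of the sequence of interest
    normalizing -- list of names making up the normalization sequence
                   (summing over the list is an assumption of this sketch)
    """
    tags_interest = tag_counts[interest]
    tags_normalizing = sum(tag_counts[name] for name in normalizing)
    if tags_normalizing <= 0:
        raise ValueError("normalization sequence has no mapped tags")
    return tags_interest / tags_normalizing


if __name__ == "__main__":
    # Hypothetical counts, chosen only to keep the arithmetic easy to follow.
    counts = {"chr21": 30_000, "chr9": 120_000, "chr1": 180_000}
    print(sequence_dose(counts, "chr21", ["chr9"]))
    print(sequence_dose(counts, "chr21", ["chr9", "chr1"]))
```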
The term "sequence tag density" herein refers to the number of sequence reads that are mapped to a reference genome sequence; for example, the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term "sequence tag density ratio" herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome, e.g. chromosome 21, to the length of that reference genome chromosome.
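The two quantities defined above can be illustrated with a short sketch that counts mapped tags per chromosome and divides each count by the chromosome length. The tag tuples and the toy chromosome lengths are hypothetical; real lengths would come from the reference genome assembly in use.

```python
from collections import Counter


def tag_densities(mapped_tags):
    """Number of mapped tags per chromosome.

    mapped_tags -- iterable of (chromosome, position) pairs, one per tag
    """
    return Counter(chrom for chrom, _position in mapped_tags)


def tag_density_ratios(mapped_tags, chrom_lengths):
    """Tags mapped to each chromosome divided by that chromosome's length (bp)."""
    densities = tag_densities(mapped_tags)
    return {
        chrom: densities[chrom] / length
        for chrom, length in chrom_lengths.items()
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    tags = [("chr21", 100), ("chr21", 250), ("chr9", 40)]
    lengths = {"chr21": 1_000, "chr9": 2_000}
    print(tag_densities(tags))
    print(tag_density_ratios(tags, lengths))
```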
The term "next generation sequencing (NGS)" herein refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
The term "parameter" herein refers to a numerical value that characterizes a physical property. Frequently, a parameter numerically characterizes a quantitative data set and/or a numerical relationship between quantitative data sets. For example, the ratio (or a function of the ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome to which the tags are mapped is a parameter.
The terms "threshold value" and "qualified threshold value" herein refer to any number that is used as a cutoff to characterize a sample, such as a test sample containing a nucleic acid from an organism suspected of having a medical condition. The threshold may be compared to a parameter value to determine whether a sample giving rise to that parameter value suggests that the organism has the medical condition. In certain embodiments, a qualified threshold value is calculated using a qualifying data set and serves as a limit for diagnosing a copy number variation, e.g. an aneuploidy, in an organism. If a threshold is exceeded by results obtained from the methods disclosed herein, a subject can be diagnosed with a copy number variation, e.g. trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalized values (e.g. chromosome doses, NCVs or NSVs) calculated for a training set of samples. Threshold values can be identified using the qualified (i.e. unaffected) samples in a training set that comprises both qualified (i.e. unaffected) samples and affected samples. The samples in the training set that are known to have chromosomal aneuploidies (i.e. the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold depends on the level of confidence that the user wishes to have in making the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
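One way such thresholds might be derived from a training set is sketched below. This section does not give a formula, so the sketch assumes that a normalized value is a z-score of the dose relative to the mean and sample standard deviation of the doses of the qualified (unaffected) samples, and that thresholds are placed a chosen number of standard deviations from that mean. The function names and the cutoffs `z_normal` and `z_affected` are hypothetical.

```python
from statistics import mean, stdev


def normalized_value(dose, qualified_doses):
    """Z-score of a dose relative to the qualified (unaffected) samples.

    Assumption of this sketch: the normalized value is (dose - mean) / sd.
    """
    mu = mean(qualified_doses)
    sd = stdev(qualified_doses)  # sample standard deviation, needs >= 2 values
    if sd == 0:
        raise ValueError("qualified doses have zero variance")
    return (dose - mu) / sd


def dose_thresholds(qualified_doses, z_normal, z_affected):
    """Dose cutoffs placed z_normal and z_affected standard deviations above
    the mean of the qualified doses (duplication-type case)."""
    mu = mean(qualified_doses)
    sd = stdev(qualified_doses)
    return mu + z_normal * sd, mu + z_affected * sd


if __name__ == "__main__":
    # Toy qualified doses, chosen so that mean = 2.0 and sd = 1.0.
    qualified = [1.0, 2.0, 3.0]
    print(normalized_value(4.0, qualified))
    print(dose_thresholds(qualified, z_normal=1.0, z_affected=2.0))
```

A larger set of qualified samples gives a more stable estimate of the mean and standard deviation, which is consistent with the remark above that larger sets may improve the diagnostic utility of the thresholds.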
The term "normalized value" herein refers to a numerical value that relates the number of sequence tags identified for the sequence of interest (e.g. a chromosome or chromosome segment) to the number of sequence tags identified for the normalization sequence (e.g. a normalization chromosome or normalization chromosome segment). For example, a "normalized value" can be a chromosome dose as described elsewhere herein, or it can be an NCV (normalized chromosome value) as described elsewhere herein, or it can be an NSV (normalized segment value) as described elsewhere herein.
The term "read" refers to a sequence read from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a DNA sequence of sufficient length (e.g. at least about 30 bp) that can be used to identify a larger sequence or region, e.g. that can be aligned and specifically assigned to a chromosome, a genomic region or a gene.
The term "sequence tag" is used herein interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned, i.e. mapped, to a larger sequence, e.g. a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e. they are assigned to a single location in the reference genome. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read, such as the location of the sequence in the genome, e.g. the position on a chromosome. In certain embodiments, the location is specified for the positive strand orientation. A tag may be defined to allow a limited amount of mismatch in aligning to a reference genome. Tags that can be mapped to more than one location in a reference genome, i.e. tags that do not map uniquely, may be excluded from the analysis.
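The exclusion of reads that map to more than one location can be sketched as a simple filter. The record layout `(read_id, chromosome, position)` is a hypothetical stand-in for whatever an aligner reports; it is not a format defined in this disclosure.

```python
from collections import Counter


def uniquely_mapped_tags(alignments):
    """Keep only alignments whose read maps to exactly one location.

    alignments -- list of (read_id, chromosome, position) tuples; a read that
                  aligns to several locations appears once per location
    """
    hits_per_read = Counter(read_id for read_id, _chrom, _pos in alignments)
    return [
        (read_id, chrom, pos)
        for read_id, chrom, pos in alignments
        if hits_per_read[read_id] == 1
    ]


if __name__ == "__main__":
    alignments = [
        ("read1", "chr21", 1_500),
        ("read2", "chr13", 9_000),
        ("read2", "chr18", 4_200),  # read2 maps twice, so it is dropped
        ("read3", "chr9", 77),
    ]
    print(uniquely_mapped_tags(alignments))
```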
As used herein, the terms "aligned", "alignment" or "aligning" refer to the process of comparing a read or tag to a reference sequence and thereby determining whether the reference sequence contains the read sequence. If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e. whether the read is present or absent in the reference sequence). For example, the alignment of a read to the reference sequence for human chromosome 13 will tell whether the read is present in the reference sequence for chromosome 13. A tool that provides this information may be called a set membership tester. In some cases, an alignment additionally indicates a location in the reference sequence to which the read or tag maps. For example, if the reference sequence is the whole human genome sequence, an alignment may indicate that a read is present on chromosome 13, and may further indicate that the read is on a particular strand and/or site of chromosome 13.
Aligned reads or tags are one or more sequences that are identified as a match, in terms of the order of their nucleic acid molecules, to a known sequence from a reference genome. Alignment can be done manually, although it is typically implemented by a computer algorithm, as it would be impossible to align the reads within a reasonable time period for implementing the methods disclosed herein. One example of an algorithm for aligning sequences is the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. Alternatively, a Bloom filter or similar set membership tester may be employed to align reads to reference genomes. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (a non-perfect match).
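For illustration, a Bloom filter used as a set membership tester might look like the following minimal sketch. It is not the implementation of the referenced application: the class, the choice of SHA-256 as the hash, the filter size, and the approach of indexing fixed-length substrings (k-mers) of a toy reference are all assumptions. A Bloom filter never reports a stored item as absent, but it can occasionally report an item that was never stored as present, so it answers "possibly present" or "definitely absent".

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("ascii")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def index_reference(reference, k, bloom):
    """Store every length-k substring of the reference in the filter."""
    for start in range(len(reference) - k + 1):
        bloom.add(reference[start:start + k])


if __name__ == "__main__":
    bloom = BloomFilter()
    index_reference("ACGTACGTTAGC", k=4, bloom=bloom)  # toy reference
    print("GTTA" in bloom)  # a substring of the reference is always found
```

A filter of this kind only tests membership; it does not report where in the reference a read maps.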
As used herein, the term "reference genome" or "reference sequence" refers to any particular known genome sequence, whether partial or complete, of any organism or virus that may be used to reference identified sequences from a subject. For example, a reference genome used for human subjects, as well as for many other organisms, is found at the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
In various embodiments, the reference sequence is significantly larger than the reads that are aligned to it. For example, it may be at least about 100 times larger, or at least about 1000 times larger, or at least about 10,000 times larger, or at least about 10^5 times larger, or at least about 10^6 times larger, or at least about 10^7 times larger.
In one example, the reference sequence is that of a full-length human genome. Such sequences may be referred to as genomic reference sequences. In another example, the reference sequence is limited to a specific human chromosome, such as chromosome 13. Such sequences may be referred to as chromosome reference sequences. Other examples of reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions (such as strands), etc. of any species.
In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
The term "artificial target sequences genome" herein refers to a grouping of known sequences that encompass alleles of known polymorphic sites. For example, a "SNP reference genome" is an artificial target sequences genome comprising a grouping of sequences that encompass alleles of known SNPs.
The term "clinically relevant sequence" herein refers to a nucleic acid sequence that is known or suspected to be associated with, or implicated in, a genetic or disease condition. Determining the absence or presence of a clinically relevant sequence can be useful in determining or confirming a diagnosis of a medical condition, or in providing a prognosis for the development of a disease.
The term "derived", when used in the context of a nucleic acid or a mixture of nucleic acids, herein refers to the means by which the nucleic acid(s) are obtained from the source from which they originate. For example, in one embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids, e.g. cfDNA, were naturally released by cells through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids were extracted from two different types of cells from a subject.
The term "patient sample" herein refers to a biological sample obtained from a patient, i.e. a recipient of medical attention, care or treatment. The patient sample can be any of the samples described herein. In certain embodiments, the patient sample is obtained by non-invasive procedures, e.g. a peripheral blood sample or a stool sample. The methods described herein need not be limited to humans. Thus, various veterinary applications are contemplated, in which case the patient sample may be a sample from a non-human mammal (e.g. a cat, a pig, a horse, a cow, and the like).
The term "mixed sample" herein refers to a sample containing a mixture of nucleic acids that are derived from different genomes.
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g. a woman.
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms "blood", "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
The terms "maternal nucleic acids" and "fetal nucleic acids" herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus being carried by the pregnant female, respectively.
As used herein, the term "corresponding to" sometimes refers to a nucleic acid sequence, e.g. a gene or a chromosome, that is present in the genome of different subjects and that does not necessarily have the same sequence in all genomes, but serves to provide the identity rather than the genetic information of a sequence of interest, e.g. a gene or chromosome.
As used herein, the term "substantially cell free" encompasses preparations of the desired sample from which the cell components normally associated with it have been removed. For example, a plasma sample is rendered substantially cell free by removing the blood cells, e.g. red cells, that are normally associated with it. In certain embodiments, substantially cell free samples are processed to remove cells that would otherwise contribute to the desired genetic material that is to be tested for a CNV.
As used herein, the term "fetal fraction" (the "fetus mark") refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acids. Fetal fraction is often used to characterize the cfDNA in a mother's blood.
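As a simple illustration of this definition, the fetal fraction of a sample is the fetal amount divided by the total (fetal plus maternal) amount. How the two amounts are measured is not specified in this definition; the function below is a hypothetical sketch that takes them as given, in any common unit.

```python
def fetal_fraction(fetal_amount, maternal_amount):
    """Fraction of fetal nucleic acid in a mixed fetal/maternal sample.

    Both amounts must be in the same unit (for example, a count of molecules
    or sequence tags attributed to each source); the measurement itself is
    outside the scope of this sketch.
    """
    if fetal_amount < 0 or maternal_amount < 0:
        raise ValueError("amounts must be non-negative")
    total = fetal_amount + maternal_amount
    if total == 0:
        raise ValueError("sample contains no nucleic acid")
    return fetal_amount / total


if __name__ == "__main__":
    print(fetal_fraction(fetal_amount=10, maternal_amount=90))
```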
As used herein, the term "chromosome" refers to the heredity-bearing gene carrier of a living cell, which is derived from chromatin and comprises DNA and protein components (especially histones). The conventional, internationally recognized human genome chromosome numbering system is employed herein.
As used herein, the term "polynucleotide length" refers to the absolute number of nucleic acid molecules (nucleotides) in a sequence or in a region of a reference genome. The term "chromosome length" refers to the known length of the chromosome given in base pairs, e.g. as provided in the NCBI36/hg18 assembly of the human chromosomes found on the World Wide Web at genome.ucsc.edu/cgi-bin/hgTracks?hgsid=167155613&chromInfoPage=.
The term "subject" herein refers to a human subject as well as a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium or a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts disclosed herein are applicable to genomes from any plant or animal, and are useful in the fields of veterinary medicine, animal sciences, research laboratories and the like.
The term "condition" herein refers to "medical condition" as a broad term that includes all diseases and disorders, but can also include [injuries] and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatments.
The term "complete", when used herein in reference to a chromosomal aneuploidy, refers to a gain or loss of an entire chromosome.
The term "partial", when used herein in reference to a chromosomal aneuploidy, refers to a gain or loss of a portion, i.e. a segment, of a chromosome.
The term "mosaic" herein denotes the presence of two populations of cells with different karyotypes in one individual who has developed from a single fertilized egg. Mosaicism may result from a mutation during development that is propagated to only a subset of the adult cells.
Term " non-chimera " refers at this biosome comprising the cell with a kind of caryogram, such as human foetus.
When term " use chromosome " uses when mentioning and determining chromosome dosage, refer at this sequence information using and obtain for chromosome, namely for the number of the sequence label of chromosome acquisition.
The term "sensitivity" as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
The term "specificity" as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
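The two ratios defined above can be computed directly from classification counts. The following is a minimal sketch in Python; the function names and the example counts are illustrative assumptions and are not taken from this disclosure. The true-negative ratio is the quantity commonly called specificity.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positives divided by the sum of true positives and false negatives."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negatives divided by the sum of true negatives and false positives."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens = sensitivity(true_pos=48, false_neg=2)    # 48 / 50
spec = specificity(true_neg=297, false_pos=3)   # 297 / 300
```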
The term "hypodiploid" herein refers to a chromosome number that is one or more lower than the normal haploid number of chromosomes characteristic of the species.
A "polymorphic site" is a locus at which nucleotide sequence divergence occurs. The locus may be as small as one base pair. Illustrative markers have at least two alleles, each occurring at a frequency of greater than 1%, and more typically greater than 10% or 20%, of a selected population. A polymorphic site may be the site of a single nucleotide polymorphism (SNP), a small-scale multi-base deletion or insertion, a multi-nucleotide polymorphism (MNP), or a short tandem repeat (STR). The terms "polymorphic locus" and "polymorphic site" are used interchangeably herein.
A "polymorphic sequence" herein refers to a nucleic acid sequence, e.g., a DNA sequence, that comprises one or more polymorphic sites, e.g., one SNP or a tandem SNP. Polymorphic sequences according to the present technology can be used to specifically differentiate between maternal and non-maternal alleles in a maternal sample comprising a mixture of fetal and maternal nucleic acids.
As used herein, a "single nucleotide polymorphism" (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the population). A SNP usually arises through the substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or of one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. SNPs can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele. Single nucleotide polymorphisms (SNPs) are positions at which two alternative bases occur at appreciable frequency (> 1%) in the human population, and are the most common type of human genetic variation.
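As an illustration of the two substitution classes, a substitution that stays within one base class (purine to purine, or pyrimidine to pyrimidine) is a transition, and a substitution that crosses classes is a transversion. The sketch below encodes that rule for DNA bases; the function name `classify_substitution` is a hypothetical helper, not part of this disclosure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Same base class on both sides means a transition; otherwise a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```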
The term "tandem SNPs" herein refers to two or more SNPs that are present within a polymorphic target nucleic acid sequence.
As used herein, the term "short tandem repeat" or "STR" refers to a class of polymorphisms that occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. The pattern can range in length from 2 to 10 base pairs (bp) (for example, (CATG)n in a genomic region) and is typically located in a non-coding intron region. By examining several STR loci and counting how many repeats of a specific STR sequence there are at a given locus, it is possible to create a unique genetic profile of an individual.
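Counting how many times a repeat unit occurs in direct succession can be sketched with a regular expression. The example below is only an illustration under simplifying assumptions (an exact, uninterrupted motif in a plain sequence string); the function name and the example sequence are hypothetical.

```python
import re


def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of directly adjacent copies of motif in sequence."""
    # One or more back-to-back copies of the motif form a single run.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = (match.group(0) for match in pattern.finditer(sequence))
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical sequence containing three adjacent copies of CATG.
repeat_count = longest_tandem_repeat("TTCATGCATGCATGAAC", "CATG")
```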
As used herein, the term "miniSTR" refers to a tandem repeat of four or more base pairs that spans less than about 300 base pairs, less than about 250 base pairs, less than about 200 base pairs, less than about 150 base pairs, less than about 100 base pairs, less than about 50 base pairs, or less than about 25 base pairs. "miniSTRs" are STRs that are amplifiable from cfDNA templates.
The terms "polymorphic target nucleic acid", "polymorphic sequence", "polymorphic target nucleic acid sequence", and "polymorphic nucleic acid" are used interchangeably herein to refer to a nucleic acid sequence (e.g., a DNA sequence) that comprises one or more polymorphic sites.
The term "plurality of polymorphic target nucleic acids" herein refers to a number of nucleic acid sequences each comprising at least one polymorphic site (e.g., one SNP), such that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 or more different polymorphic sites are amplified from the polymorphic target nucleic acids to identify and/or quantify fetal alleles present in a maternal sample comprising fetal and maternal nucleic acids.
The term "enrich" herein refers to the process of amplifying polymorphic target nucleic acids contained in a portion of a maternal sample and combining the amplified product with the remainder of the maternal sample from which the portion was removed. For example, the remainder of the maternal sample can be the original maternal sample.
The term "original maternal sample" herein refers to a non-enriched biological sample obtained from a pregnant subject (e.g., a woman) who serves as the source from which a portion is removed to amplify polymorphic target nucleic acids. The "original sample" can be any sample obtained from a pregnant subject, and the processed fractions thereof, e.g., a purified cfDNA sample extracted from a maternal plasma sample.
As used herein, the term "primer" refers to an isolated oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product that is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer will depend on many factors, including temperature, source of primer, use of the method, and the parameters used for primer design.
The phrase "cause to be administered" refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing medical care of a subject, that control and/or permit the administration of the agent(s)/compound(s) at issue to the subject. Causing to be administered can involve diagnosis and/or determination of an appropriate therapeutic or prophylactic regimen, and/or prescribing particular agent(s)/compound(s) for a subject. Such prescribing can include, for example, drafting a prescription form, annotating a medical record, and the like. Similarly, "cause to be performed", e.g., for a diagnostic procedure, refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing medical care of a subject, that control and/or permit the performance of one or more diagnostic protocols on the subject.
introduction
Disclosed herein are methods, apparatus, systems, and kits for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes, and which are known or suspected to differ in the amount of one or more sequences of interest. Also provided are methods, apparatus, systems, and kits for determining the fraction contributed by each of the two genomes in the mixture of nucleic acids. Copy number variations determined by the methods and apparatus disclosed herein include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variations of DNA segments ranging from kilobases (kb) to megabases (Mb) in size. In various embodiments, the methods comprise a machine-implemented statistical approach that accounts for accrued variability stemming from process-related variability, interchromosomal variability, and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNV that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information that is obtained by sequencing the nucleic acids of a test sample only once.
CNVs in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006]; Shaikh et al., Genome Res 19:1682-1690 [2009]). CNVs are known to contribute to genetic disease through different mechanisms, resulting in most cases in either an imbalance of gene dosage or gene disruption. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD (attention deficit hyperactivity disorder), and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNVs arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
The methods, apparatus, or devices described herein may employ next generation sequencing (NGS) technology, which performs massively parallel sequencing. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in Volkerding et al., Clin Chem 55:641-658 [2009]; Metzker M, Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing), or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) in a single sequencing run, to generate up to several hundred million reads of DNA sequences. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are described below.
In some embodiments, the methods and apparatus disclosed herein may employ some or all of the operations in the following sequence: obtain a nucleic acid test sample from a patient (typically by a non-invasive procedure); process the test sample in preparation for sequencing; sequence nucleic acids from the test sample to produce numerous reads (e.g., at least 10,000); align the reads to portions of a reference sequence/genome and determine the amount of DNA (e.g., the number of reads) that maps to defined portions of the reference sequence (e.g., to defined chromosomes or chromosome segments); calculate a dose for one or more of the defined portions by normalizing the amount of DNA mapping to the defined portion with the amount of DNA mapping to one or more normalization chromosomes or chromosome segments selected for the defined portion; determine whether the dose indicates that the defined portion is "affected" (e.g., aneuploid or mosaic); report the determination and optionally convert it to a diagnosis; use the diagnosis or determination to develop a plan of treatment, monitoring, or further testing for the patient.
determining normalization sequences in qualified samples: normalization chromosome sequences and normalization segment sequences
Normalization sequences are identified using sequence information from a set of qualified samples obtained from subjects known to comprise cells having a normal copy number for any one sequence of interest (e.g., a chromosome or a segment thereof). Determination of normalization sequences is outlined in steps 110, 120, 130, 140, and 145 of the embodiment of the method depicted in Fig. 1. The sequence information obtained from the qualified samples is used for determining statistically meaningful identification of chromosomal aneuploidies in test samples (step 165 of Fig. 1 and the Examples).
Fig. 1 provides a flow diagram 100 of an embodiment for determining a CNV of a sequence of interest, e.g., a chromosome or a segment thereof, in a biological sample. In some embodiments, a biological sample is obtained from a subject and comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, e.g., by a fetus and the mother carrying the fetus. Alternatively, the genomes are contributed to the sample by aneuploid cancerous cells and normal euploid cells from the same subject, e.g., a plasma sample from a cancer patient.
Apart from analyzing a patient's test sample, one or more normalization chromosomes or one or more normalization chromosome segments are selected for each possible chromosome of interest. The normalization chromosomes or segments are identified asynchronously from the routine testing of patient samples, which may take place in a clinical setting. In other words, the normalization chromosomes or segments are identified before patient samples are tested. The associations between normalization chromosomes or segments and chromosomes or segments of interest are stored for use during testing. As explained below, such an association is typically maintained over periods of time that span the testing of many samples. The following discussion concerns embodiments for selecting normalization chromosomes or chromosome segments for individual chromosomes or segments of interest.
A set of qualified samples is obtained to identify qualified normalization sequences and to provide variance values for use in determining statistically meaningful identification of CNV in test samples. In step 110, a plurality of biological qualified samples is obtained from a plurality of subjects known to comprise cells having a normal copy number for any one sequence of interest. In one embodiment, the qualified samples are obtained from mothers pregnant with a fetus that has been confirmed using cytogenetic means to have a normal copy number of chromosomes. The biological qualified samples may be a biological fluid, e.g., plasma, or any suitable sample as described below. In some embodiments, a qualified sample contains a mixture of nucleic acid molecules, e.g., cfDNA molecules. In some embodiments, the qualified sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules. Sequence information for normalization chromosomes and/or segments thereof is obtained by sequencing at least a portion of the nucleic acids, e.g., fetal and maternal nucleic acids, using any known sequencing method. Preferably, any one of the next generation sequencing (NGS) methods described elsewhere herein is used to sequence the fetal and maternal nucleic acids as single or clonally amplified molecules. In various embodiments, the qualified samples are processed as disclosed below before and during sequencing. They may be processed using the apparatus, systems, and kits disclosed herein.
In step 120, at least a portion of each of all the qualified nucleic acids contained in the qualified samples is sequenced to generate millions of sequence reads, e.g., 36 bp reads, which are aligned to a reference genome, e.g., hg18. In some embodiments, the sequence reads comprise about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, enabling reads of greater than about 1000 bp when paired-end reads are generated. In one embodiment, the mapped sequence reads comprise 36 bp. In another embodiment, the mapped sequence reads comprise 25 bp. Sequence reads are aligned to a reference genome, and the reads that map uniquely to the reference genome are known as sequence tags. In one embodiment, at least about 3 x 10^6 qualified sequence tags, at least about 5 x 10^6 qualified sequence tags, at least about 8 x 10^6 qualified sequence tags, at least about 10 x 10^6 qualified sequence tags, at least about 15 x 10^6 qualified sequence tags, at least about 20 x 10^6 qualified sequence tags, at least about 30 x 10^6 qualified sequence tags, at least about 40 x 10^6 qualified sequence tags, or at least about 50 x 10^6 qualified sequence tags comprising reads of between 20 and 40 bp are obtained from reads that map uniquely to a reference genome.
In step 130, all the tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest on the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to a sequence of interest, normalized to the length of the qualified sequence of interest to which they are mapped. Sequence tag densities that are determined as a ratio of the tag density relative to the length of the sequence of interest are herein referred to as tag density ratios. Normalization to the length of the sequence of interest is not required, and may be included as a step to reduce the number of digits in a number and so simplify it for human interpretation. As all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density for a sequence of interest, e.g., a clinically relevant sequence, is determined in the qualified samples, as are the sequence tag densities for the additional sequences from which normalization sequences are subsequently identified.
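The length-normalized tag density described in step 130 amounts to a single division. The following is a minimal sketch; the function name, the tag count, and the sequence length are hypothetical round numbers chosen for illustration and do not represent measured data or the actual length of any chromosome.

```python
def tag_density_ratio(tag_count: int, sequence_length_bp: int) -> float:
    """Number of mapped sequence tags divided by the length (bp) of the sequence."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be a positive number of base pairs")
    return tag_count / sequence_length_bp


# Hypothetical values: 141,000 tags mapped to a 47,000,000 bp sequence of interest.
density_ratio = tag_density_ratio(tag_count=141_000, sequence_length_bp=47_000_000)
```

Because the same length would be applied to every sample, this normalization rescales the numbers without changing how samples compare with one another, which is consistent with the statement that the step is optional.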
In some embodiments, the sequence of interest is a chromosome that is associated with a complete chromosomal aneuploidy, e.g., chromosome 21, and the qualified normalization sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density approximates that of the sequence (i.e., chromosome) of interest, e.g., chromosome 21. The selected normalization chromosome(s) may be the one chromosome or group of chromosomes that best approximates the variation in sequence tag density of the sequence of interest. Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalization sequence for each of chromosomes 1-22, X, and Y in the qualified samples. The normalization chromosome can be an individual chromosome, or it can be a group of chromosomes as described elsewhere herein.
In another embodiment, the sequence of interest is a chromosome segment associated with a partial aneuploidy, e.g., a chromosomal deletion or insertion, or an unbalanced chromosomal translocation, and the normalization sequence is a chromosome segment (or group of segments) that is not associated with the partial aneuploidy and whose variation in sequence tag density approximates that of the chromosome segment associated with the partial aneuploidy. The selected normalization chromosome segment(s) may be the one or more chromosome segments that best approximate the variation in sequence tag density of the sequence of interest. Any one or more segments of any one or more of chromosomes 1-22, X, and Y can be a sequence of interest.
In other embodiments, the sequence of interest is a chromosome segment associated with a partial aneuploidy and the normalization sequence is a whole chromosome or a group of whole chromosomes. In still other embodiments, the sequence of interest is a whole chromosome associated with an aneuploidy and the normalization sequence is a chromosome segment or group of chromosome segments that is not associated with the aneuploidy.
Whether a single sequence or a group of sequences is identified in the qualified samples as the normalization sequence for any one or more sequences of interest, the qualified normalization sequence may be chosen so that its variation in sequence tag density best or effectively approximates that of the sequence of interest as determined in the qualified samples. For example, a qualified normalization sequence is a sequence that produces the smallest variability across the qualified samples when used to normalize the sequence of interest, i.e., the variability of the normalization sequence is closest to that of the sequence of interest determined in the qualified samples. Stated another way, the qualified normalization sequence is the sequence selected to produce the least variation in sequence dose (for the sequence of interest) across the qualified samples. Thus, the process selects a sequence that, when used as a normalization chromosome, is expected to produce the smallest variability in run-to-run chromosome dose for the sequence of interest.
The normalization sequence identified for any one or more sequences of interest in qualified samples remains the normalization sequence of choice for determining the presence or absence of aneuploidy in test samples over days, weeks, months, and possibly years, provided that the procedures needed to generate sequencing libraries and to sequence the samples are essentially unaltered over time. As described above, normalization sequences for determining the presence of aneuploidies are chosen for (possibly among other reasons) the variability in the number of sequence tags mapped to them among samples (e.g., different samples) and sequencing runs (e.g., sequencing runs that occur on the same day and/or on different days) that best approximates the variability of the sequence of interest for which they are used as a normalizing parameter. Substantial alterations in these procedures will affect the number of tags mapped to all sequences, which in turn will determine which sequence or group of sequences has a variability across samples, in the same and/or in different sequencing runs, on the same day or on different days, that most closely approximates that of the sequence(s) of interest; this would require that the set of normalization sequences be re-determined. Substantial alterations in procedures include changes in the laboratory protocol used for preparing the sequencing library, including changes related to preparing samples for multiplex sequencing instead of singleplex sequencing, and changes in sequencing platforms, including changes in the chemistry used for sequencing.
In some embodiments, the normalization sequence is the sequence that best distinguishes one or more qualified samples from one or more affected samples. That is, the normalization sequence is the sequence with the greatest differentiability: it provides optimal differentiation of a sequence of interest in an affected test sample, so that the affected test sample is easily distinguished from other unaffected samples. In other embodiments, the normalization sequence is a sequence that has a combination of the smallest variability and the greatest differentiability.
The level of differentiability can be determined as a statistical difference between the sequence doses, e.g., chromosome doses or segment doses, in a population of qualified samples and the chromosome dose(s) in one or more test samples, as described below and shown in the Examples. For example, differentiability can be represented numerically as a t-test value, which represents the statistical difference between the chromosome doses in a population of qualified samples and the chromosome dose(s) in one or more test samples. Alternatively, differentiability can be represented numerically as a normalized chromosome value (NCV), which is a z-score for chromosome doses as long as the distribution of the NCV is normal. Similarly, differentiability can be represented numerically as a t-test value that represents the statistical difference between the segment doses in a population of qualified samples and the segment dose(s) in one or more test samples. Where a chromosome segment is the sequence of interest, differentiability of segment doses can be represented numerically as a normalized segment value (NSV), which is a z-score for chromosome segment doses as long as the distribution of the NSV is normal. In determining the z-score, the mean and standard deviation of the chromosome or segment doses in a set of qualified samples can be used. Alternatively, the mean and standard deviation of the chromosome or segment doses in a training set comprising qualified samples and affected samples can be used. In other embodiments, the normalization sequence is a sequence that has the best combination of the smallest variability and the greatest differentiability, or of small variability and large differentiability.
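The z-score form of the normalized chromosome value can be sketched as follows: the dose of a test sample is compared with the mean and standard deviation of the doses in the qualified samples. This is a minimal illustration, not a definitive implementation. The function name and the dose values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator is used.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float,
                                qualified_doses: Sequence[float]) -> float:
    """z-score of a test dose relative to the doses of the qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("qualified doses show no variation; z-score is undefined")
    return (test_dose - mu) / sigma


# Hypothetical chromosome doses from four qualified samples and one test sample.
qualified = [0.98, 1.00, 1.02, 1.00]
ncv = normalized_chromosome_value(test_dose=1.05, qualified_doses=qualified)
```

The same function applies unchanged to segment doses, in which case the result corresponds to the normalized segment value.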
The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.
determining sequence doses (i.e., chromosome doses or segment doses) in qualified samples
In step 140, based on the calculated qualified tag densities, a qualified sequence dose, i.e., a chromosome dose or a segment dose, for a sequence of interest is determined as the ratio of the sequence tag density for the sequence of interest to the qualified sequence tag density for the additional sequences from which normalization sequences are subsequently identified in step 145. The identified normalization sequences are subsequently used to determine sequence doses in test samples.
In one embodiment, the sequence dose in the qualified samples is a chromosome dose, calculated as the ratio of the number of sequence tags for a chromosome of interest to the number of sequence tags for a normalization chromosome sequence in a qualified sample. The normalization chromosome sequence can be a single chromosome, a group of chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. Accordingly, the chromosome dose for a chromosome of interest is determined in a qualified sample as (i) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalization chromosome sequence composed of a single chromosome; (ii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalization chromosome sequence composed of two or more chromosomes; (iii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalization segment sequence composed of a single segment of a chromosome; (iv) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalization segment sequence composed of two or more segments from one chromosome; or (v) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalization segment sequence composed of two or more segments of two or more chromosomes. Examples of determining the chromosome dose for a chromosome of interest according to (i)-(v) are as follows: (i) chromosome doses for a chromosome of interest, e.g., chromosome 21, are determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of each of the remaining chromosomes, i.e., chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y; (ii) chromosome doses for a chromosome of interest, e.g., chromosome 21, are determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of all possible combinations of two or more remaining chromosomes; (iii) chromosome doses for a chromosome of interest, e.g., chromosome 21, are determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of a segment of another chromosome, e.g., chromosome 9; (iv) chromosome doses for a chromosome of interest, e.g., chromosome 21, are determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of one other chromosome, e.g., two segments of chromosome 9; and (v) chromosome doses for a chromosome of interest, e.g., chromosome 21, are determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of two different chromosomes, e.g., a segment of chromosome 9 and a segment of chromosome 14.
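The ratio that defines a chromosome dose can be sketched in a few lines. In this illustration the tag counts are hypothetical, the function name `chromosome_dose` is not taken from this disclosure, and summing the tags of a multi-member normalization sequence to form the denominator is an assumption about how a group is combined.

```python
from typing import Mapping, Sequence


def chromosome_dose(tag_counts: Mapping[str, int],
                    interest: str,
                    normalizers: Sequence[str]) -> float:
    """Tags for the sequence of interest divided by tags for the normalization sequence."""
    # Assumption: a group of normalizers contributes the sum of its tag counts.
    denominator = sum(tag_counts[name] for name in normalizers)
    if denominator == 0:
        raise ValueError("normalization sequence has no mapped tags")
    return tag_counts[interest] / denominator


# Hypothetical tag counts for one qualified sample.
counts = {"chr21": 30_000, "chr9": 100_000, "chr14": 50_000}
dose_single = chromosome_dose(counts, "chr21", ["chr9"])          # single chromosome
dose_group = chromosome_dose(counts, "chr21", ["chr9", "chr14"])  # group of chromosomes
```

The keys of `tag_counts` may equally name chromosome segments, so the same helper covers normalization segment sequences.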
In another embodiment, the sequence dose in the qualified samples is a segment dose, calculated as the ratio of the number of sequence tags for a segment of interest that is not a whole chromosome to the number of sequence tags for a normalization segment sequence in a qualified sample. The normalization segment sequence can be, for example, a whole chromosome, a group of whole chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. For example, the segment dose for a segment of interest is determined in a qualified sample as (i) the ratio of the number of tags for the segment of interest to the number of tags for a normalization segment sequence composed of a single segment of a chromosome; (ii) the ratio of the number of tags for the segment of interest to the number of tags for a normalization segment sequence composed of two or more segments of one chromosome; or (iii) the ratio of the number of tags for the segment of interest to the number of tags for a normalization segment sequence composed of two or more segments of two or more chromosomes.
Chromosome doses for one or more chromosomes of interest are determined in all qualified samples, and a normalization chromosome sequence is identified in step 145. Similarly, segment doses for one or more segments of interest are determined in all qualified samples, and a normalization segment sequence is identified in step 145.
identifying normalization sequences from qualified sequence doses
In step 145, a normalization sequence is identified for a sequence of interest, based on the calculated sequence doses, as the sequence that results, for example, in the smallest variability in sequence dose for the sequence of interest across all qualified samples. The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.
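The selection criterion of step 145 can be sketched as a search for the candidate that gives the smallest coefficient of variation (CV) of the dose across qualified samples. This is a minimal illustration under stated assumptions: the sample dictionaries and tag counts are hypothetical, only single-chromosome candidates are considered, and the CV is used as the measure of variability.

```python
from statistics import mean, stdev
from typing import Mapping, Sequence


def select_normalizing_chromosome(samples: Sequence[Mapping[str, int]],
                                  interest: str) -> str:
    """Return the candidate whose use gives the lowest dose CV across samples."""
    candidates = [name for name in samples[0] if name != interest]

    def dose_cv(candidate: str) -> float:
        # Dose of the chromosome of interest in each sample for this candidate.
        doses = [sample[interest] / sample[candidate] for sample in samples]
        return stdev(doses) / mean(doses)

    return min(candidates, key=dose_cv)


# Hypothetical tag counts from three qualified samples.
qualified_samples = [
    {"chr21": 100, "chr9": 300, "chr14": 200},
    {"chr21": 110, "chr9": 300, "chr14": 220},
    {"chr21": 90, "chr9": 310, "chr14": 180},
]
best = select_normalizing_chromosome(qualified_samples, "chr21")
```

In this made-up data the chr14 counts rise and fall with the chr21 counts, so the chr21/chr14 dose is constant and chr14 is selected.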
In one group of qualified samples, the normalization sequence of interested one or more sequence can be identified, and the sequence identified in qualified samples can subsequently for calculating the sequence dosage (step 150) of interested one or more sequence in each test sample, to determine presence or absence aneuploidy in each test sample.When using different order-checking platform, and/or when there are differences in the preparation of the purifying and/or sequencing library of wanting sequencing nucleic acid, to interested chromosome or section, the normalization sequence of identification can be different.Use normalization sequence to provide single-minded and sensitive measurement for the copy number variation of chromosome or its section according to method described here, the order-checking platform of no matter sample preparation and/or use how.
In some embodiments, more than one normalization sequence is identified, i.e. different normalization sequences can be determined for one sequence of interest, and multiple sequence dosages can be determined for one sequence of interest. For example, the variation (e.g. the coefficient of variation, CV) in the chromosome dosage of chromosome 21 is smallest when the sequence tag density of chromosome 14 is used. However, two, three, four, five, six, seven, eight or more normalization sequences can be identified for use in determining the sequence dosage of a sequence of interest in a test sample. As an example, a second dosage for chromosome 21 in any one test sample can be determined using chromosome 7, chromosome 9, chromosome 11 or chromosome 12 as the normalization chromosome sequence, because these chromosomes all have CVs close to that of chromosome 14 (see Example 8, Table 10). Preferably, when a single chromosome is chosen as the normalization chromosome sequence for a chromosome of interest, the normalization chromosome is the chromosome that results in chromosome dosages for the chromosome of interest with the smallest variability across all samples tested (e.g. the qualified samples).
Normalization chromosome sequences as normalization sequences for chromosomes
In other embodiments, the normalization chromosome sequence can be a single sequence or a group of sequences. For example, in some embodiments, the normalization sequence is a group of sequences, e.g. a group of chromosomes, identified as the normalization sequence for any one or more of chromosomes 1-22, X and Y. The group of chromosomes that constitutes the normalization sequence for a chromosome of interest (i.e. the normalization chromosome sequence) can be a group of two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 chromosomes, including or excluding one or both of chromosomes X and Y. The group of chromosomes identified as the normalization chromosome sequence is the group that results in chromosome dosages for the chromosome of interest with the smallest variability across all samples tested (i.e. the qualified samples). Preferably, individual chromosomes and groups of chromosomes are tested together for their ability to best mimic the behavior of the sequence of interest, and are selected as normalization chromosome sequences on that basis.
In one embodiment, the normalization sequence of chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16 and chromosome 17.In another embodiment, the normalization sequence of chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12 and chromosome 14.Alternately, the normalization sequence of chromosome 21 is the group chromosomes being selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16 and chromosome 17.In another embodiment, this group chromosome is the group being selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12 and chromosome 14.
In some embodiments, the method is further improved by using a normalization sequence that is determined by systematic calculation of all chromosome dosages, using each chromosome individually and in all possible combinations with all remaining chromosomes (see Example 13). For example, a systematically calculated normalization chromosome can be determined for each chromosome of interest by systematically calculating all possible chromosome dosages using any one of chromosomes 1-22, X and Y, and combinations of two or more of chromosomes 1-22, X and Y, to determine which single chromosome or group of chromosomes is the normalization chromosome that results in the smallest variability of the chromosome dosage of the chromosome of interest across a set of qualified samples (see Example 13). Accordingly, in one embodiment, the systematically calculated normalization sequence for chromosome 21 is a group of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20 and chromosome 22. Single chromosomes or groups of chromosomes can be determined for all chromosomes in the genome.
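The systematic calculation described above amounts to an exhaustive search over single chromosomes and groups of chromosomes. The following Python sketch shows one way such a search could be organized. It assumes that the dosage for a group is the tag count of the chromosome of interest divided by the summed tag counts of the group, and that variability is scored by the CV; both are illustrative assumptions, and the function names and counts are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

def group_dose(sample, interest, group):
    """Tags for the chromosome of interest divided by the summed tags of the group."""
    return sample[interest] / sum(sample[c] for c in group)

def best_normalizing_group(qualified_samples, interest, candidates, max_size=None):
    """Score every single candidate and every group; return (group, CV) with the lowest CV."""
    max_size = max_size or len(candidates)
    best_group, best_cv = None, float("inf")
    for size in range(1, max_size + 1):
        for group in combinations(candidates, size):
            doses = [group_dose(s, interest, group) for s in qualified_samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv

# Hypothetical tag counts for three qualified samples (illustrative numbers only).
qualified = [
    {"chr21": 100, "chr4": 150, "chr14": 50},
    {"chr21": 120, "chr4": 100, "chr14": 140},
    {"chr21": 80, "chr4": 90, "chr14": 70},
]
group, cv = best_normalizing_group(qualified, "chr21", ["chr4", "chr14"])
```

The number of combinations grows exponentially with the number of candidates (2^n - 1 non-empty groups), so a practical search over all of chromosomes 1-22, X and Y would likely cap `max_size` or prune candidates.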
In one embodiment, the normalization sequence of chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13 and chromosome 14.Preferably, the normalization sequence of chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12 and chromosome 14.In one embodiment, the normalization sequence of chromosome 18 is the group chromosomes being selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13 and chromosome 14.Preferably, this group chromosome is the group being selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12 and chromosome 14.
In another embodiment, the normalization sequence for chromosome 18 is determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome 18 is a normalization chromosome consisting of the group of chromosome 2, chromosome 3, chromosome 5 and chromosome 7.
In one embodiment, the normalization sequence of chromosome x is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15 and chromosome 16.Preferably, the normalization sequence of chromosome x is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6 and chromosome 8.In one embodiment, the normalization sequence of chromosome x is the group chromosome being selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15 and chromosome 16.Preferably, this group chromosome is the group being selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6 and chromosome 8.
In another embodiment, the normalization sequence for chromosome X is determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome X is a normalization chromosome consisting of the group of chromosome 4 and chromosome 8.
In one embodiment, the normalization sequence of chromosome 13 is the chromosome being selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18 and chromosome 21.Preferably, the normalization sequence of chromosome 13 is the chromosome being selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.In another embodiment, the normalization sequence of chromosome 13 is the group chromosomes being selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18 and chromosome 21.Preferably, this group chromosome is the group being selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6 and chromosome 8.
In another embodiment, the normalization sequence for chromosome 13 is determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome 13 is a normalization chromosome comprising the group of chromosome 4 and chromosome 5. In another embodiment, the normalization sequence for chromosome 13 is a normalization chromosome consisting of the group of chromosome 4 and chromosome 5.
The variation in the chromosome dosage of chromosome Y is greater than 30, regardless of which normalization chromosome is used in determining the chromosome Y dosage. Therefore, a group of two or more chromosomes selected from chromosomes 1-22 and chromosome X can be used as the normalization sequence for chromosome Y. In one embodiment, the at least one normalization chromosome is a group of chromosomes consisting of chromosomes 1-22 and chromosome X. In another embodiment, the group of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5 and chromosome 6.
In another embodiment, the normalization sequence for chromosome Y is determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome Y is a normalization chromosome comprising the group of chromosomes consisting of chromosome 4 and chromosome 6. In another embodiment, the normalization sequence for chromosome Y is a normalization chromosome consisting of the group of chromosome 4 and chromosome 6.
The normalization sequence used to calculate the dosage of different chromosomes of interest or different segments of interest can be the same, or it can be a different normalization sequence for each chromosome or segment. For example, the normalization sequence (e.g. one normalization chromosome or a group) for chromosome of interest A can be the same as, or different from, the normalization sequence (e.g. one normalization chromosome or a group) for chromosome of interest B.
The normalization sequence for a complete chromosome can be a complete chromosome or a group of complete chromosomes, or it can be a segment of a chromosome, or a group of segments of one or more chromosomes.
Normalization segment sequences as normalization sequences for chromosomes
In another embodiment, the normalization sequence for a chromosome can be a normalization segment sequence. The normalization segment sequence can be a single segment, a group of segments of one chromosome, or segments from two or more different chromosomes. A normalization segment sequence can be determined by systematic calculation of all combinations of segment sequences in the genome. For example, the normalization segment sequence for chromosome 21 can be a single segment that is larger or smaller than chromosome 21, which is approximately 47 Mbp (megabase pairs); for example, the normalization segment can be a segment of chromosome 9, which is approximately 140 Mbp. Alternatively, the normalization sequence for chromosome 21 can be, for example, a combination of segment sequences from two different chromosomes (e.g. from chromosome 1 and from chromosome 12).
In one embodiment, the normalization sequence for chromosome 21 is a normalization segment sequence of one segment or of a group of two or more segments of chromosomes 1-20, 22, X and Y. In another embodiment, the normalization sequence for chromosome 18 is a segment or group of segments of chromosomes 1-17, 19-22, X and Y. In another embodiment, the normalization sequence for chromosome 13 is a segment or group of segments of chromosomes 1-12, 14-22, X and Y. In another embodiment, the normalization sequence for chromosome X is a segment or group of segments of chromosomes 1-22 and Y. In another embodiment, the normalization sequence for chromosome Y is a segment or group of segments of chromosomes 1-22 and X. Normalization sequences consisting of single segments or groups of segments can be determined for all chromosomes in a genome. The two or more segments of a normalization segment sequence can be segments from one chromosome, or they can be segments of two or more different chromosomes. As described for normalization chromosome sequences, a normalization segment sequence can be the same for two or more different chromosomes.
Normalization segment sequences as normalization sequences for chromosome segments
The presence or absence of a CNV of a sequence of interest can be determined when the sequence of interest is a segment of a chromosome. Variation in the copy number of a chromosome segment allows the presence or absence of a partial chromosomal aneuploidy to be determined. Examples of partial chromosomal aneuploidies that are associated with various fetal abnormalities and disease conditions are described below. The segment of a chromosome can be of any length. For example, it can range from a kilobase to hundreds of megabases. The human genome comprises just over 3 billion DNA bases, which can be divided into tens, thousands, hundreds of thousands and millions of segments of different sizes, the copy number of which can be determined according to the present method. The normalization sequence for a chromosome segment is a normalization segment sequence, which can be a single segment from any one of chromosomes 1-22, X and Y, or a group of segments from any one or more of chromosomes 1-22, X and Y.
The normalization sequence for a segment of interest is a sequence whose variability across chromosomes and across samples is closest to that of the segment of interest. When the normalization sequence is a group of segments of any one or more of chromosomes 1-22, X and Y, the normalization sequence can be determined as described for determining the normalization sequence for a chromosome of interest. A normalization segment sequence of one segment or a group of segments can be identified by calculating segment dosages for the segment of interest in each sample of a set of qualified samples (i.e. samples known to be diploid for the segment of interest), using one segment, and all possible combinations of two or more segments, as the normalization sequence. The normalization sequence is determined as the one that provides a segment dosage with the smallest variability for the segment of interest across all qualified samples, as described above for normalization chromosome sequences.
For example, for a segment of interest that is 1 Mb (megabase), the remaining 3 million segments of the approximately 3 Gb human genome (minus the 1 Mb segment of interest) can be used individually or in combination with each other to calculate segment dosages for the segment of interest in a set of qualified samples, to determine which segment or group of segments will serve as the normalization segment sequence for the qualified and test samples. Segments of interest can vary from about 1000 bases to tens of megabases. A normalization segment sequence can be composed of one or more segments of the same size as the sequence of interest. In other embodiments, the normalization segment sequence can be composed of segments that differ in size from the sequence of interest and/or from each other. For example, the normalization sequence for a sequence 100,000 bases long can be 20,000 bases long, and can comprise a combination of sequences of different lengths, e.g. 7,000 + 8,000 + 5,000 bases. As described elsewhere in this application for normalization chromosome sequences, a normalization segment sequence can be determined by systematically calculating all possible chromosome and/or segment dosages, using each possible normalization chromosome segment individually and in all possible combinations of normalization segments. Single segments or groups of segments can be determined for all segments and/or chromosomes in the genome.
The normalization sequence used to calculate the dosage of different chromosome segments of interest can be the same, or it can be a different normalization sequence for each chromosome segment of interest. For example, the normalization sequence (e.g. one normalization segment or a group) for chromosome segment of interest A can be the same as, or different from, the normalization sequence (e.g. one normalization segment or a group) for chromosome segment of interest B.
Normalization chromosome sequences as normalization sequences for chromosome segments
In another embodiment, the copy number variation of a chromosome segment can be determined using a normalization chromosome, which can be a single chromosome or a group of chromosomes as described above. The normalization chromosome sequence can be the normalization chromosome or group of chromosomes identified for the chromosome of interest in a set of qualified samples by systematically determining which chromosome or group of chromosomes provides the smallest variability in the chromosome dosage across the set of qualified samples. For example, to determine the presence or absence of a partial deletion of chromosome 7, the normalization chromosome or group of chromosomes used in the analysis of the partial deletion is first identified in a set of qualified samples as the chromosome or group of chromosomes that provides a chromosome dosage for the entire chromosome 7 with the smallest variability. As described elsewhere herein for normalization chromosome sequences for chromosomes of interest, the normalization chromosome sequence for a chromosome segment can be determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes. Single chromosomes or groups of chromosomes can be determined for all chromosome segments in the genome. Examples demonstrating the use of normalization chromosomes to determine the presence of a partial chromosomal deletion and of a partial chromosomal duplication are provided as Examples 17 and 18.
In certain embodiments, the CNV of a chromosome segment is determined by first subdividing the chromosome of interest into sections or bins of variable length. The bin length can be at least about 1 kbp, at least about 10 kbp, at least about 100 kbp, at least about 1 Mbp, at least about 10 Mbp or at least about 100 Mbp. The smaller the bin length, the greater the resolution obtained for localizing the CNV of the segment within the chromosome of interest.
The presence or absence of a CNV of a chromosome segment of interest is determined by comparing the dosage of each bin of the chromosome of interest in the test sample with the mean of the corresponding bin dosage determined for each bin of equal length in a set of qualified samples. A normalized value for each bin can be calculated, as described above for normalized segment values, as a normalized bin value (NBV), which relates the bin dosage in the test sample to the mean of the corresponding bin dosage in a set of qualified samples. The NBV is calculated as:
$$NBV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th bin dosage in a set of qualified samples, and $x_{ij}$ is the j-th bin dosage observed for test sample i.
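As a minimal illustration, the NBV for each bin (data box) of one test sample can be computed as follows in Python. The sketch assumes that the estimated mean and standard deviation are the sample mean and sample standard deviation of each bin dosage over the qualified samples; the function name and the bin dosages are hypothetical.

```python
from statistics import mean, stdev

def normalized_bin_values(test_bin_doses, qualified_bin_doses):
    """Return (x_j - mean_j) / sd_j for each bin j of one test sample.

    test_bin_doses: bin dosages observed for the test sample.
    qualified_bin_doses: one list of bin dosages per qualified sample.
    """
    nbvs = []
    for j, x in enumerate(test_bin_doses):
        column = [sample[j] for sample in qualified_bin_doses]  # bin j across qualified samples
        nbvs.append((x - mean(column)) / stdev(column))
    return nbvs

# Hypothetical bin dosages: three qualified samples, two bins each.
qualified = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
test_sample = [4.0, 4.0]
nbv = normalized_bin_values(test_sample, qualified)
```

Here the first bin of the test sample lies two standard deviations above the qualified mean and the second bin lies exactly on it.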
Determination of aneuploidy in test samples
Based on the one or more normalization sequences identified in the qualified samples, a sequence dosage is determined for a sequence of interest in a test sample that comprises a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.
In step 115, a test sample is obtained from a subject suspected or known to carry a clinically relevant CNV of a sequence of interest. The test sample can be a biological fluid (e.g. plasma) or any suitable sample as described below. As explained, the sample can be obtained using a non-invasive procedure such as a simple blood draw. In some embodiments, the test sample contains a mixture of nucleic acid molecules (e.g. cfDNA molecules). In some embodiments, the test sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules.
In step 125, at least a portion of the test nucleic acids in the test sample is sequenced, as described for the qualified samples, to generate millions of sequence reads (e.g. 36 bp reads). As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped or aligned to a reference genome to produce tags. As described in step 120, at least about 3 × 10^6 qualified sequence tags, at least about 5 × 10^6 qualified sequence tags, at least about 8 × 10^6 qualified sequence tags, at least about 10 × 10^6 qualified sequence tags, at least about 15 × 10^6 qualified sequence tags, at least about 20 × 10^6 qualified sequence tags, at least about 30 × 10^6 qualified sequence tags, at least about 40 × 10^6 qualified sequence tags or at least about 50 × 10^6 qualified sequence tags, comprising reads of between 20 and 40 bp, are obtained from reads that map uniquely to the reference genome. In certain embodiments, the reads produced by the sequencing apparatus are provided in an electronic format. Alignment is accomplished using a computational apparatus as discussed below. Individual reads are compared against the reference genome, which is often vast (millions of base pairs), to identify sites where the reads uniquely correspond with the reference genome. In certain embodiments, the alignment procedure permits limited mismatch between the reads and the reference genome. In some cases, 1, 2 or 3 base pairs in a read are permitted to mismatch the corresponding base pairs in the reference genome, and a mapping is still made.
In step 135, use calculation element as described below, check order obtained all or most of label counting to determine cycle tests label densities by the nucleic acid in test sample.In certain embodiments, by each reading with compare with reference to a genomic concrete region (being in most of the cases a chromosome or section), and by being attached on reading by site information, make reading be transformed into label.When this process spread, calculation element can keep carrying out rolling counting to being mapped to reference to the number of the label/reading of each region genomic (being in most of the cases chromosome or section).Store the counting of each interested chromosome or section normalization chromosome corresponding to each or section person.
In certain embodiments, the reference genome has one or more excluded regions, which are part of a true biological genome but are not included in the reference genome. Reads that potentially align to these excluded regions are not counted. Examples of excluded regions include regions of long repeated sequences, regions of similarity between the X and Y chromosomes, and the like.
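The running count of step 135, combined with the skipping of excluded regions, can be sketched in Python as follows. The representation of a tag as a (chromosome, position) pair, the half-open interval convention for excluded regions, and all names and coordinates are assumptions made for illustration.

```python
from collections import Counter

def count_tags_by_chromosome(tags, excluded_regions):
    """Keep a running per-chromosome tag count, skipping tags in excluded regions.

    tags: iterable of (chromosome, position) pairs.
    excluded_regions: dict mapping chromosome -> list of (start, end), end exclusive.
    """
    counts = Counter()
    for chrom, pos in tags:
        regions = excluded_regions.get(chrom, [])
        if any(start <= pos < end for start, end in regions):
            continue  # tag falls in an excluded region and is not counted
        counts[chrom] += 1
    return counts

# Hypothetical tags and excluded regions (illustrative coordinates only).
tags = [("chr21", 100), ("chr21", 5000), ("chr14", 42), ("chrX", 10)]
excluded = {"chr21": [(4000, 6000)], "chrX": [(0, 50)]}
counts = count_tags_by_chromosome(tags, excluded)
```

A linear scan over the excluded regions is adequate for a sketch; with many regions, a sorted list with binary search or an interval tree would likely be preferable.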
In certain embodiments, the method determines whether to count a tag more than once when multiple reads align to the same site on a reference genome or sequence. There may be occasions when two tags have the same sequence and therefore align to an identical site on the reference sequence. The method employed to count tags may, under certain circumstances, exclude from the count identical tags derived from the same sequenced sample. If a disproportionate number of tags in a given sample are identical, this suggests that there is a strong bias or other defect in the procedure. Therefore, in accordance with certain embodiments, the counting method does not count tags from a given sample that are identical to tags from that sample that were previously counted.
Various criteria can be set for choosing when to disregard an identical tag from a single sample. In certain embodiments, a defined percentage of the tags that are counted must be unique. If more tags than this threshold allows are not unique, they are disregarded. For example, if the defined percentage requires that at least 50% of tags are unique, identical tags are not counted until the percentage of unique tags for the sample exceeds 50%. In other embodiments, the threshold percentage of unique tags is at least about 60%. In other embodiments, the threshold percentage of unique tags is at least about 75%, or at least about 90%, or at least about 95%, or at least about 98%, or at least about 99%. For chromosome 21, the threshold may be set at 90%. If 30M tags are aligned to chromosome 21, then at least 27M of them must be unique. If 3M counted tags are not unique and the 30,000,000th tag is not unique, it is not counted.
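One possible reading of this criterion is sketched below in Python: a duplicate tag is counted only if the fraction of unique tags among all counted tags would remain at or above the defined threshold. This interpretation, the function name, and the example sites are assumptions; the text does not fix an exact algorithm.

```python
def count_tags(tag_sites, min_unique_fraction):
    """Count tags, ignoring duplicates that would push the unique fraction below the threshold.

    tag_sites: iterable of hashable alignment sites, e.g. (chromosome, position).
    Returns (counted, unique).
    """
    seen = set()
    unique = 0
    counted = 0
    for site in tag_sites:
        if site not in seen:
            seen.add(site)
            unique += 1
            counted += 1
        elif unique / (counted + 1) >= min_unique_fraction:
            counted += 1  # duplicate accepted: unique fraction stays at or above the threshold
        # otherwise the duplicate tag is disregarded
    return counted, unique

# Hypothetical alignment sites; ("chr21", 10) occurs four times.
sites = [("chr21", 10), ("chr21", 10), ("chr21", 20),
         ("chr21", 10), ("chr21", 30), ("chr21", 10)]
strict = count_tags(sites, 0.9)   # no duplicate can be accepted
lenient = count_tags(sites, 0.5)  # every duplicate keeps the unique fraction at 50%
```

With a 90% threshold on chromosome 21, this rule caps counted duplicates at roughly one tenth of the counted tags, which matches the 27M-of-30M figure given above.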
Appropriate statistical analysis can be used to select the particular threshold or other criterion that determines when further identical tags are not counted. One factor that influences this threshold or other criterion is the amount of sample sequenced relative to the size of the genome to which tags can be aligned. Other factors include the size of the reads and similar considerations.
In one embodiment, the number of sequence tags mapped to a sequence of interest is normalized to the known length of the sequence of interest to which they are mapped, to provide a test sequence tag density ratio. As described for the qualified samples, normalization to the known length of a sequence of interest is not required; it can be included as a step to reduce the number of digits in a number and so simplify it for human interpretation. As all the mapped test sequence tags in the test sample are counted, the sequence tag density for a sequence of interest (e.g. a clinically relevant sequence) in the test sample is determined, as are the sequence tag densities for additional sequences that correspond to at least one normalization sequence identified in the qualified samples.
In step 150, based on the identity of at least one normalization sequence in the qualified samples, a test sequence dosage is determined for a sequence of interest in the test sample. In various embodiments, the test sequence dosage is determined computationally by manipulating the sequence tag densities of the sequence of interest and of the corresponding normalization sequence, as described herein. The computational apparatus responsible for this task electronically accesses the association between the sequence of interest and its associated normalization sequence, which can be stored in a database, table or chart, or be included as code in program instructions.
As described elsewhere in this application, the at least one normalization sequence can be a single sequence or a group of sequences. The sequence dosage for a sequence of interest in a test sample is the ratio of the sequence tag density determined for the sequence of interest in the test sample to the sequence tag density of at least one normalization sequence determined in the test sample, where the normalization sequence in the test sample corresponds to the normalization sequence identified in the qualified samples for the particular sequence of interest. For example, if the normalization sequence identified for chromosome 21 in the qualified samples is determined to be a chromosome (e.g. chromosome 14), then the test sequence dosage for chromosome 21 (the sequence of interest) is determined as the ratio of the sequence tag density for chromosome 21 to the sequence tag density for chromosome 14, each determined in the test sample. Chromosome dosages are determined similarly for chromosomes 13, 18, X, Y and other chromosomes associated with chromosomal aneuploidies. The normalization sequence for a chromosome of interest can be one chromosome or a group of chromosomes, or one chromosome segment or a group of chromosome segments. As mentioned above, a sequence of interest can be part of a chromosome, e.g. a chromosome segment. Accordingly, the dosage for a chromosome segment can be determined as the ratio of the sequence tag density determined for the segment in the test sample to the sequence tag density for the normalization chromosome segment in the test sample, where the normalization segment in the test sample corresponds to the normalization segment (a single segment or a group of segments) identified in the qualified samples for the particular segment of interest. Chromosome segments can range in size from kilobases (kb) to megabases (Mb), e.g. about 1 kb to 10 kb, or about 10 kb to 100 kb, or about 100 kb to 1 Mb.
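The dosage calculation of step 150 can be sketched in Python as a lookup of the stored association followed by a ratio. The normalization groups below are taken from embodiments given in this description; storing them in a dictionary, summing the tag densities of a group to form the denominator, and the tag densities themselves are assumptions made for illustration.

```python
# Association between each sequence of interest and its normalization sequence.
# Groups follow embodiments in the description; the dict representation is an assumption.
NORMALIZERS = {
    "chr21": ("chr14",),
    "chr18": ("chr2", "chr3", "chr5", "chr7"),
    "chr13": ("chr4", "chr5"),
    "chrX": ("chr4", "chr8"),
}

def sequence_dose(tag_density, interest, normalizers=NORMALIZERS):
    """Tag density of the sequence of interest divided by the summed density of its normalizers."""
    denominator = sum(tag_density[name] for name in normalizers[interest])
    return tag_density[interest] / denominator

# Hypothetical tag densities for one test sample (illustrative numbers only).
sample = {"chr21": 50.0, "chr14": 100.0, "chr13": 30.0, "chr4": 40.0, "chr5": 20.0}
dose_21 = sequence_dose(sample, "chr21")
dose_13 = sequence_dose(sample, "chr13")
```

Keeping the association in a single table means the same normalization sequence chosen from the qualified samples is applied unchanged to every test sample.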
In step 155, threshold values are derived from the standard deviations established for the qualified sequence dosages determined in a plurality of qualified samples and for the sequence dosages determined for samples known to be aneuploid for a sequence of interest. Note that this operation is typically performed asynchronously with the analysis of patient test samples. It can be performed, for example, concurrently with the selection of normalization sequences from the qualified samples. Accurate classification depends on the differences between the probability distributions for the different classes (i.e. types of aneuploidy). In some examples, thresholds are chosen from the empirical distribution for each type of aneuploidy (e.g. trisomy 21). Possible threshold values established for classifying trisomy 13, trisomy 18, trisomy 21 and monosomy X aneuploidies are described in the Examples, which illustrate the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold value determined to distinguish samples affected by an aneuploidy of one chromosome can be the same as, or different from, the threshold value for a different aneuploidy. As shown in the Examples, the threshold value for each chromosome of interest is determined from the variability in the dosage of the chromosome of interest across samples and sequencing runs. The less variable the chromosome dosage for any chromosome of interest, the narrower the spread in the dosage of the chromosome of interest across all the unaffected samples, which are used to set the thresholds for determining different aneuploidies.
Returning to the process flow associated with classifying a patient test sample, in step 160 the copy number variation of the sequence of interest is determined in the test sample by comparing the test sequence dosage for the sequence of interest with at least one threshold value established from the qualified sample dosages. This operation can be performed by the same computational apparatus used to measure sequence tag densities and/or calculate segment dosages.
In step 165, the dosage calculated for a test sequence of interest is compared with the dosages set as threshold values, which are chosen according to a user-defined threshold of reliability, to classify the sample as "normal", "affected" or "no call". The "no call" samples are samples for which a definitive diagnosis cannot be made with reliability. Each type of affected sample (e.g. trisomy 21, partial trisomy 21, monosomy X) has its own thresholds, one for calling normal (unaffected) samples and another for calling affected samples (although in some cases the two thresholds coincide). As described elsewhere herein, in some cases a no call can be converted to a call (affected or normal) if the fetal fraction of nucleic acid in the test sample is sufficiently high. The classification of the test sequence can be reported by the computational apparatus used in the other operations of this process flow. In some cases, the classification is reported in an electronic format and can be displayed, e-mailed, sent by text message, etc. to the relevant persons.
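The three-way classification of step 165 can be sketched in Python as follows. The sketch assumes a copy number gain (e.g. a trisomy), so that higher values indicate an affected sample; for a loss such as monosomy X the comparisons would be reversed. The function name and the threshold values 2.5 and 4.0 are hypothetical placeholders, not values taken from this passage.

```python
def classify(dose_value, normal_upper, affected_lower):
    """Classify a test dosage as 'normal', 'affected' or 'no call' for a copy number gain.

    normal_upper: values at or below this threshold are called normal.
    affected_lower: values at or above this threshold are called affected.
    Values between the two thresholds cannot be called reliably.
    """
    if normal_upper > affected_lower:
        raise ValueError("normal threshold must not exceed affected threshold")
    if dose_value >= affected_lower:
        return "affected"
    if dose_value <= normal_upper:
        return "normal"
    return "no call"

# Hypothetical thresholds on a normalized dosage scale (placeholders only).
NORMAL_UPPER, AFFECTED_LOWER = 2.5, 4.0
calls = [classify(v, NORMAL_UPPER, AFFECTED_LOWER) for v in (0.3, 3.1, 7.8)]
```

When the two thresholds coincide, the "no call" zone vanishes and every sample receives a call.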
Some embodiment provides a kind of method, and the method is for being provided in a pre-natal diagnosis that is that comprise fetus and the fetus aneuploidy in the biological sample of the nucleic acid molecules of parent.This diagnosis is made based on following steps: obtain to the fetus derived from a Biological test sample (such as Maternal plasma sample) with the sequence information checked order at least partially in the nucleic acid molecules potpourri of parent; Calculate for one or more interested chromosomal normalization chromosome dosage and/or a normalization section dosage for one or more interested section from this sequencing data; And determine interested chromosomal chromosome dosage for this in this test sample accordingly and/or for the statistically significant difference of between the section dosage of this interested section and a threshold value of establishing in multiple qualified (normally) sample, and provide pre-natal diagnosis based on this statistical discrepancy.As described in the step 165 in the method, make a normal or affected diagnosis.When can not be confident make normal or affected diagnosis, provide one " without judge ".
Samples and sample processing
Samples
Samples used for determining a CNV (e.g., chromosomal aneuploidies, partial aneuploidies and the like) can include samples taken from any cell, tissue or organ in which copy number variations for one or more sequences of interest are to be determined. Desirably, these samples contain nucleic acids that are present in cells and/or "cell-free" nucleic acids (e.g., cfDNA).
In certain embodiments, it is advantageous to obtain cell-free nucleic acids, e.g., cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including but not limited to plasma, serum and urine (see, e.g., Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; Koide et al., Prenatal Diagnosis 25:604-607 [2005]; Chen et al., Nature Med. 2:1033-1035 [1996]; Lo et al., Lancet 350:485-487 [1997]; Botezatu et al., Clin Chem. 46:1078-1084 [2000]; and Su et al., J Mol. Diagn. 6:101-107 [2004]). To separate cell-free DNA from cells in a sample, various methods can be used, including but not limited to fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, high-throughput cell sorting and/or other separation methods. Commercially available kits for manual and automated separation of cfDNA are available (Roche Diagnostics, Indianapolis, IN; Qiagen, Valencia, CA; Macherey-Nagel, Duren, DE). Biological samples comprising cfDNA have been used in assays that determine the presence or absence of chromosomal abnormalities, e.g., trisomy 21, by sequencing assays that can detect chromosomal aneuploidies and/or various polymorphisms.
In various embodiments, the cfDNA present in the sample can be enriched specifically or non-specifically prior to use (e.g., prior to preparing a sequencing library). Non-specific enrichment of sample DNA refers to whole genome amplification of the genomic DNA fragments of the sample, which can be used to increase the level of sample DNA prior to preparing a cfDNA sequencing library. Non-specific enrichment can be the selective enrichment of one of the two genomes present in a sample that comprises more than one genome. For example, non-specific enrichment can be selective for the fetal genome in a maternal sample, which can be achieved by known methods that increase the proportion of fetal DNA relative to maternal DNA in the sample. Alternatively, non-specific enrichment can be the non-selective amplification of both genomes present in the sample. For example, non-specific amplification can be the amplification of fetal and maternal DNA in a sample comprising a mixture of DNA from the fetal and maternal genomes. Methods for whole genome amplification are known in the art. Degenerate oligonucleotide-primed PCR (DOP), primer extension PCR (PEP) and multiple displacement amplification (MDA) are examples of whole genome amplification methods. In certain embodiments, the sample comprising the mixture of cfDNA from different genomes is not enriched for the cfDNA of the genomes present in the mixture. In other embodiments, the sample comprising the mixture of cfDNA from different genomes is non-specifically enriched for any one of the genomes present in the sample.
The sample comprising the nucleic acid(s) to which the methods described herein are applied typically comprises a biological sample ("test sample"), e.g., as described above. In certain embodiments, the nucleic acid(s) to be screened for one or more CNVs is purified or isolated by any of a number of well-known methods.
Accordingly, in certain embodiments the sample comprises or consists of a purified or isolated polynucleotide, or it can comprise samples such as a tissue sample, a biological fluid sample, a cell sample and the like. Suitable biological fluid samples include, but are not limited to, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, amniotic fluid and leukophoresis samples. In certain embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva or feces. In certain embodiments, the sample is a peripheral blood sample, or the plasma and/or serum fractions of a peripheral blood sample. In other embodiments, the biological sample is a swab or smear, a biopsy specimen or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples; e.g., a biological sample can comprise two or more of a biological fluid sample, a tissue sample and a cell culture sample. As used herein, the terms "blood", "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can be obtained from multiple sources, including but not limited to: samples from different individuals, samples from different developmental stages of the same or different individuals, samples from different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder), samples from normal individuals, samples obtained at different stages of a disease in an individual, samples obtained from an individual subjected to different treatments for a disease, samples from individuals subjected to different environmental factors, samples from individuals with a predisposition to a pathology, samples from individuals exposed to an infectious disease agent (e.g., HIV), and the like.
In one illustrative but non-limiting embodiment, the sample is a maternal sample obtained from a pregnant female, e.g., a pregnant woman. In this case, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample or a cell sample. Biological fluids include, as non-limiting examples: blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukophoresis samples.
In another illustrative but non-limiting embodiment, the maternal sample is a mixture of two or more biological samples; e.g., the biological sample can comprise two or more of a biological fluid sample, a tissue sample and a cell culture sample. In some embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, milk, ear flow, saliva and feces. In some embodiments, the biological sample is a peripheral blood sample and/or the plasma or serum fractions thereof. In other embodiments, the biological sample is a swab or smear, a biopsy specimen or a sample of a cell culture. As disclosed above, the terms "blood", "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can also be obtained from in vitro cultured tissues, cells or other polynucleotide-containing sources. The cultured samples can be taken from multiple sources, including but not limited to: cultures (e.g., tissue or cells) maintained in different media and conditions (e.g., pH, pressure or temperature), cultures (e.g., tissue or cells) maintained for different lengths of time, cultures (e.g., tissue or cells) treated with different factors or reagents (e.g., a drug candidate or a modulator), or cultures of different types of tissue and/or cells.
Methods of isolating nucleic acids from biological sources are well known and will differ depending on the nature of the source. One of ordinary skill in the art can readily isolate the nucleic acid(s) from a source as needed for the methods described herein. In some cases, it can be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation can be random, or it can be specific, as achieved, for example, by restriction endonuclease digestion. Methods for random fragmentation are known in the art and include, for example, limited DNase digestion, alkali treatment and physical shearing. In one embodiment, the sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation.
In other illustrative embodiments, the sample nucleic acids are obtained as genomic DNA, which is fragmented into fragments of about 300 or more, about 400 or more, or about 500 or more base pairs, and to which NGS methods can readily be applied.
Sequencing library preparation
In one embodiment, the methods described herein can utilize next-generation sequencing (NGS) technologies, which allow multiple samples to be sequenced individually as genomic molecules (i.e., singleplex sequencing) or as pooled samples comprising indexed genomic molecules in a single sequencing run (e.g., multiplex sequencing). These methods can generate up to several hundred million reads of DNA sequence. In various embodiments, the sequences of genomic nucleic acids and/or of indexed genomic nucleic acids can be determined using, for example, the next-generation sequencing (NGS) technologies described herein. In various embodiments, the large amount of sequence data obtained using NGS can be analyzed using one or more processors as described herein.
In various embodiments, the use of these sequencing technologies does not involve the preparation of sequencing libraries.
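Multiplex sequencing relies on index sequences to assign pooled reads back to their samples. The sketch below illustrates that assignment step only; the index sequences, sample names and read representation are hypothetical, and real demultiplexing tools also handle sequencing errors in the index, which is omitted here.

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Group pooled reads by sample using an exact index match.

    reads: iterable of (index_sequence, read_sequence) pairs.
    index_to_sample: mapping from index sequence to sample name.
    Reads whose index is not recognized are counted but not assigned.
    """
    by_sample = defaultdict(list)
    unassigned = 0
    for index, sequence in reads:
        sample = index_to_sample.get(index)
        if sample is None:
            unassigned += 1
        else:
            by_sample[sample].append(sequence)
    return dict(by_sample), unassigned

# Hypothetical 6-base indexes for two pooled samples.
indexes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
pooled = [("ACGTAC", "GATTACA"), ("TGCATG", "CCGGAA"),
          ("ACGTAC", "TTAGGC"), ("NNNNNN", "AAAAAA")]
assigned, unassigned = demultiplex(pooled, indexes)
```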
In certain embodiments, however, the sequencing methods contemplated herein involve the preparation of sequencing libraries. In one illustrative approach, sequencing library preparation involves the production of a random collection of adaptor-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents or analogs of either DNA or cDNA (e.g., DNA or cDNA that is complementary or copy DNA produced from an RNA template by the action of reverse transcriptase). The polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products and the like) or, in certain embodiments, the polynucleotides may originate in single-stranded form (e.g., ssDNA, RNA, etc.) and be converted to dsDNA form. For example, in certain embodiments, single-stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell-free DNA (cfDNA), etc.) that typically include both intron sequences and exon sequences (coding sequences), as well as non-coding regulatory sequences such as promoter and enhancer sequences. In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., the cfDNA molecules present in the peripheral blood of a pregnant subject.
Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves the fragmentation of large polynucleotides (e.g., cellular genomic DNA) to obtain polynucleotides in the desired size range.
Fragmentation can be achieved by any of a number of methods known to those of ordinary skill in the art. For example, fragmentation can be achieved by mechanical means including, but not limited to, nebulization, sonication and hydroshear. However, mechanical fragmentation typically cleaves the DNA backbone at C-O, P-O and C-C bonds, resulting in a heterogeneous mix of blunt ends and 3'- and 5'-overhanging ends with broken C-O, P-O and C-C bonds (see, e.g., Alnemri and Liwack, J Biol. Chem 265:17323-17333 [1990]; Richards and Boyer, J Mol Biol 11:327-240 [1965]). These ends may need to be repaired because they may lack the 5'-phosphate required for the subsequent enzymatic reactions (e.g., ligation of sequencing adaptors) needed to prepare DNA for sequencing.
In contrast, cfDNA typically exists as fragments of less than about 300 base pairs; consequently, fragmentation is typically not necessary for generating a sequencing library from cfDNA samples.
Typically, whether polynucleotides are forcibly fragmented (e.g., fragmented in vitro) or naturally exist as fragments, they are converted to blunt-ended DNA having 5'-phosphates and 3'-hydroxyl groups. Standard protocols, e.g., protocols for sequencing using the Illumina platform as described elsewhere herein, instruct users to end-repair the sample DNA, to purify the end-repaired products prior to dA-tailing, and to purify the dA-tailed products prior to the adaptor ligation step of library preparation.
Various embodiments of the sequencing library preparation methods described herein obviate the need to perform one or more of the steps typically mandated by standard protocols to obtain a modified DNA product that can be sequenced by NGS. An abbreviated method (ABB method), a one-step method and a two-step method are described below. Consecutive dA-tailing and adaptor ligation is referred to herein as the two-step method. Consecutive dA-tailing, adaptor ligation and amplification is referred to herein as the one-step method. In various embodiments, the ABB method and the two-step method can be performed in solution or on a solid surface. In certain embodiments, the one-step method is performed on a solid surface.
Fig. 2 illustrates a comparison of a standard method (e.g., Illumina) with the abbreviated method (ABB; Example 2), the two-step method and the one-step method (Examples 3-6) for preparing DNA molecules for sequencing by NGS according to embodiments of the present invention.
Abbreviated preparation (ABB)
In one embodiment, an abbreviated method (ABB method) for preparing a sequencing library is provided, which comprises the consecutive steps of end-repair, dA-tailing and adaptor ligation (ABB). In embodiments for preparing sequencing libraries that do not require the dA-tailing step (see, e.g., protocols for sequencing using the Roche 454 and SOLiD™ 3 platforms), the steps of end-repair and adaptor ligation can omit the step of purifying the end-repaired products prior to adaptor ligation.
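The differences among the preparation methods can be summarized as ordered lists of steps. The sketch below is a simplified model based only on the steps named in the text (it stops at adaptor ligation for the standard protocol and assumes unrepaired DNA for the two-step and one-step methods); the dictionary and function names are hypothetical.

```python
from collections import Counter

# Simplified step lists; downstream purification and sequencing are omitted.
WORKFLOWS = {
    "standard": ["end repair", "purification", "dA-tailing",
                 "purification", "adaptor ligation"],
    "ABB":      ["end repair", "dA-tailing", "adaptor ligation"],
    "two-step": ["dA-tailing", "adaptor ligation"],
    "one-step": ["dA-tailing", "adaptor ligation", "amplification"],
}

def omitted_steps(workflow, reference="standard"):
    """Return the reference steps (with counts) that a workflow leaves out."""
    missing = Counter(WORKFLOWS[reference]) - Counter(WORKFLOWS[workflow])
    return dict(missing)

for name in ("ABB", "two-step", "one-step"):
    print(name, "omits", omitted_steps(name))
```

Under this model the ABB method drops both intermediate purifications, while the two-step and one-step methods additionally drop end repair.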
The method of preparing sequencing libraries comprising the consecutive steps of end-repair, dA-tailing and adaptor ligation is referred to herein as the abbreviated method (ABB), and has been shown to unexpectedly generate sequencing libraries of improved quality while expediting the analysis of samples (see, e.g., Example 2). According to some embodiments of the method, the ABB method can be performed in solution, as exemplified herein. The ABB method can also be performed on a solid surface by first end-repairing and dA-tailing the DNA in solution and subsequently binding the DNA to the solid surface, as described elsewhere herein for the one-step or two-step preparation on a solid surface. All three enzymatic steps, including the step of ligating the adaptors to the dA-tailed DNA, are performed in the absence of polyethylene glycol. Published protocols for performing ligation reactions, including the ligation of adaptors to DNA, instruct users to perform ligations in the presence of polyethylene glycol. Applicants determined that ligation of adaptors to dA-tailed DNA can be performed in the absence of polyethylene glycol.
In another embodiment, preparation of the sequencing library eliminates the need for end-repairing the cfDNA prior to the dA-tailing step. Applicants determined that cfDNA, which does not need to be fragmented, also need not be end-repaired, and that preparation of a cfDNA sequencing library according to embodiments of the present invention can exclude the end-repair step and the purification steps, thereby combining the enzymatic reactions and further streamlining the preparation of the DNA to be sequenced. cfDNA exists as a mixture of molecules with blunt ends and 3'- and 5'-overhanging ends, which are generated in vivo by the action of nucleases that cleave cellular genomic DNA into cfDNA fragments whose termini carry a 5'-phosphate and a 3'-hydroxyl group. Eliminating the end-repair step selects for cfDNA molecules that naturally occur as blunt-ended molecules and for cfDNA molecules that naturally have 5'-overhanging ends, which are filled in by the polymerase activity of the enzyme (e.g., Klenow Exo-) used to attach one or more deoxynucleotides to the 3'-OH (dA-tailing) as described below. Eliminating the end-repair step selects against cfDNA molecules that have a 3'-overhanging end (3'-OH). Unexpectedly, the exclusion of these 3'-OH cfDNA molecules from the sequencing library does not affect the representation of genomic sequences in the library, which shows that the end-repair step for cfDNA molecules can be excluded from the preparation of the sequencing library (see the Examples). In addition to cfDNA, other types of unrepaired polynucleotides that can be used to prepare sequencing libraries include DNA molecules resulting from reverse transcription of RNA molecules (e.g., mRNA, siRNA, sRNA) and unrepaired DNA molecules that are DNA amplicons synthesized from phosphorylated primers. When unphosphorylated primers are used, the DNA reverse transcribed from RNA and/or the DNA amplified from DNA templates (i.e., DNA amplicons) can also be phosphorylated after synthesis by a polynucleotide kinase.
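The selection effect of omitting end repair can be expressed as a small rule over fragment end types. The sketch below is an illustrative model, not a protocol: the end-type labels and function names are hypothetical, and it assumes that a fragment enters the library only if both of its ends can be dA-tailed and ligated.

```python
END_TYPES = {"blunt", "5_overhang", "3_overhang"}

def end_usable_without_repair(end_type):
    """Blunt ends are usable as-is; 5' overhangs are filled in by the
    polymerase used for dA-tailing; 3' overhangs are selected against."""
    if end_type not in END_TYPES:
        raise ValueError(f"unknown end type: {end_type}")
    return end_type in {"blunt", "5_overhang"}

def fragment_enters_library(ends):
    """ends: pair of end-type labels for the two termini of one fragment."""
    return all(end_usable_without_repair(e) for e in ends)

fragments = [("blunt", "blunt"), ("blunt", "5_overhang"),
             ("3_overhang", "blunt")]
retained = [f for f in fragments if fragment_enters_library(f)]
```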
In another embodiment, unrepaired DNA is used to prepare a sequencing library according to the two-step method, in which end-repair of the DNA is excluded and the unrepaired DNA is subjected to the two consecutive steps of dA-tailing and adaptor ligation (see Fig. 2). The two-step method can be performed in solution or on a solid surface. When performed in solution, the two-step method comprises utilizing DNA obtained from a biological sample, excluding the step of end-repairing the DNA, and adding a single deoxynucleotide, e.g., deoxyadenosine (A), to the 3'-ends of the polynucleotides in the unrepaired DNA sample, for example by the activity of certain types of DNA polymerase such as Taq polymerase or Klenow Exo- polymerase. In the subsequent consecutive step, the dA-tailed products are ligated to adaptors; the dA-tailed products are compatible with the "T" overhang present at the 3' end of each duplex region of commercially available adaptors. dA-tailing prevents self-ligation of the blunt-ended polynucleotides and thereby favors the formation of adaptor-ligated sequences. Thus, in some embodiments, unrepaired cfDNA is subjected to the consecutive steps of dA-tailing and adaptor ligation, wherein the dA-tailed DNA is prepared from the unrepaired DNA and is not subjected to a purification step following the dA-tailing reaction. Double-stranded adaptors can be ligated to both ends of the dA-tailed DNA. A set of adaptors having the same sequence, or a set of two different adaptors, can be utilized. In various embodiments, one or more different sets of the same or different adaptors can also be used. The adaptors can comprise index sequences to allow multiplex sequencing of the library DNA. Ligation of the adaptors to the dA-tailed DNA is optionally performed in the absence of polyethylene glycol.
Two-step method: preparation in solution
In various embodiments, when the two-step method is performed in solution, the products of the adaptor ligation reaction can be purified to remove unligated adaptors and adaptors that may have ligated to one another. The purification can also select a size range of templates for cluster generation, which can optionally be preceded by an amplification, e.g., a PCR amplification. The ligation products can be purified by any of a number of methods including, but not limited to, gel electrophoresis, solid-phase reversible immobilization (SPRI) and the like. In some embodiments, the purified adaptor-ligated DNA is subjected to an amplification, e.g., PCR amplification, prior to sequencing. Some sequencing platforms require that the library DNA be subjected to a further amplification. For example, the Illumina platform requires that a cluster amplification of the library DNA be performed as an integral part of sequencing according to the Illumina technology. In other embodiments, the purified adaptor-ligated DNA is denatured and the single-stranded DNA molecules are attached to the flow cell of the sequencer. Accordingly, in certain embodiments, the method for preparing a sequencing library in solution from unrepaired DNA for NGS sequencing comprises obtaining DNA molecules from a sample, and subjecting the unrepaired DNA molecules obtained from the sample to the consecutive steps of dA-tailing and adaptor ligation.
As indicated above, in various embodiments these library preparation methods are incorporated into a method of determining copy number variations (CNV) such as aneuploidies. Accordingly, in one illustrative embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from said sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor ligation of the cfDNA, wherein preparing the library does not comprise end-repairing the cfDNA, and wherein the preparation is performed in solution; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest with a corresponding threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of a fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. This method is exemplified in Examples 3 and 4.
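Steps (f) through (h) can be sketched as follows, assuming the chromosome dose is computed as the ratio of the tag count for the chromosome of interest to the summed tag count of its normalizing sequence. The tag counts, the choice of normalizing chromosomes and the threshold values below are all hypothetical and serve only to make the sketch runnable.

```python
def chromosome_dose(tag_counts, chromosome, normalizing):
    """Dose = tags on the chromosome of interest / tags on its normalizing
    sequence (here, one or more normalizing chromosomes)."""
    denominator = sum(tag_counts[c] for c in normalizing)
    if denominator == 0:
        raise ValueError("normalizing sequence has no sequence tags")
    return tag_counts[chromosome] / denominator

def call_aneuploidies(tag_counts, normalizers, thresholds):
    """Return {chromosome: True/False}; True means the dose exceeds the
    corresponding threshold (a gain, e.g. a trisomy, in this sketch)."""
    calls = {}
    for chromosome, threshold in thresholds.items():
        dose = chromosome_dose(tag_counts, chromosome, normalizers[chromosome])
        calls[chromosome] = dose > threshold
    return calls

# Made-up tag counts, normalizing chromosomes and thresholds.
counts = {"chr21": 150, "chr9": 1000, "chr18": 300, "chr8": 1200}
normalizers = {"chr21": ["chr9"], "chr18": ["chr8"]}
thresholds = {"chr21": 0.14, "chr18": 0.30}
calls = call_aneuploidies(counts, normalizers, thresholds)
```

A fuller implementation would use the two-threshold scheme of step 165 so that borderline doses are reported as a no call rather than forced into a binary result.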
Two-step and one-step methods: solid-phase preparation
In certain embodiments, the sequencing library is prepared on a solid surface according to the two-step method described above for preparing a library in solution. Preparing a sequencing library on a solid surface according to the two-step method comprises obtaining DNA molecules (e.g., cfDNA) from a sample and performing the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed on a solid surface. Repaired or unrepaired DNA can be used. In certain embodiments, the adaptor-ligated product is detached from the solid surface, purified and amplified prior to sequencing. In other embodiments, the adaptor-ligated product is detached from the solid surface, purified and not amplified prior to sequencing. In yet other embodiments, the adaptor-ligated product is amplified, detached from the solid surface and purified. In certain embodiments, the purified product is amplified. In other embodiments, the purified product is not amplified. The sequencing protocol can include an amplification, e.g., cluster amplification. In various embodiments, the detached adaptor-ligated product is purified prior to amplification and/or sequencing.
In certain embodiments, the sequencing library is prepared on a solid surface according to the one-step method. In various embodiments, preparing a sequencing library on a solid surface according to the one-step method comprises obtaining DNA molecules (e.g., cfDNA) from a sample and performing the consecutive steps of dA-tailing, adaptor ligation and amplification, wherein the adaptor ligation is performed on a solid surface. The adaptor-ligated product need not be detached prior to purification.
Fig. 3 depicts the two-step and one-step methods for preparing a sequencing library on a solid surface. Repaired or unrepaired DNA can be used to prepare a sequencing library on a solid surface. In certain embodiments, unrepaired DNA is used. Examples of unrepaired DNA that can be used to prepare a sequencing library on a solid surface include, without limitation, cfDNA, DNA reverse transcribed from RNA using phosphorylated primers, and DNA amplified from a DNA template using phosphorylated primers (i.e., phosphorylated DNA amplicons). Examples of repaired DNA that can be used to prepare a sequencing library on a solid surface include, without limitation, cfDNA that has been blunt-ended and phosphorylated, fragmented genomic DNA that has been blunt-ended and phosphorylated, and repaired phosphorylated DNA produced by reverse transcription of RNA such as mRNA, sRNA or siRNA. In some illustrative embodiments, unrepaired cfDNA obtained from a maternal sample is used to prepare the sequencing library.
Preparing a sequencing library on a solid surface comprises coating the solid surface with a first partner of a two-part conjugate, modifying a first adaptor by attaching the second partner of the two-part conjugate to the adaptor, and immobilizing the adaptor on the solid surface through the binding interaction of the first and second partners of the two-part conjugate. For example, preparing a sequencing library on a solid surface can comprise attaching a polypeptide, polynucleotide or small molecule to an end of a library adaptor, where that polypeptide, polynucleotide or small molecule is capable of forming a conjugate complex with a polypeptide, polynucleotide or small molecule immobilized on the solid surface. Solid surfaces that can be used for immobilizing polypeptides, polynucleotides or small molecules include, without limitation, plastic, paper, membranes, filters, chips, pins or glass slides, silica or polymer beads (e.g., polypropylene, polystyrene, polycarbonate), 2D or 3D molecular scaffolds, or any support for the solid-phase synthesis of polypeptides or polynucleotides.
The bonding within polypeptide-polypeptide, polypeptide-polynucleotide, polypeptide-small molecule and polynucleotide-polynucleotide conjugates can be covalent or non-covalent. Preferably, the conjugate complexes are bound by non-covalent bonds. For example, conjugates that can be used to prepare a sequencing library on a solid surface include, without limitation, streptavidin-biotin conjugates, antibody-antigen conjugates and ligand-receptor conjugates. Examples of polypeptide-polynucleotide conjugates that can be used to prepare a sequencing library on a solid surface include, without limitation, DNA-binding protein-DNA conjugates. Examples of polynucleotide-polynucleotide conjugates that can be used to prepare a sequencing library on a solid surface include, without limitation, oligodT-oligoA and oligodT-oligodA. Examples of polypeptide-small molecule and polynucleotide-small molecule conjugates include streptavidin-biotin.
According to the embodiments of the solid-surface method shown in Fig. 3 (one-step and two-step), the solid surface of the vessel used for preparing the sequencing library (e.g., a polypropylene PCR tube or a 96-well plate) is coated with a polypeptide such as streptavidin. The end of a first set of adaptors is modified by attaching a small molecule such as a biotin molecule, and the biotinylated adaptors are bound to the streptavidin on the solid surface (1). Subsequently, the unrepaired or repaired DNA is ligated to the streptavidin-bound biotinylated adaptors, thereby immobilizing it on the solid surface (2). A second set of adaptors is ligated to the immobilized DNA (3).
Two-step method: preparation on a solid phase
In one embodiment, two-step approach is that the DNA using such as cfDNA etc. not repair performs, for preparing sequencing library on a solid surface.3 ' the end by the mononucleotide bases such as such as dA being attached to the stock of the DNA that such as cfDNA etc. does not repair carries out dA tailing to the DNA do not repaired.Optionally, multiple nucleotide base can be attached on the DNA that do not repair.The potpourri comprising the DNA of band dA tail is added in fixing aptamer on a solid surface, and this DNA is connected on aptamer.Carrying out to DNA the step that dA tailing is connected with aptamer is continuous print, does not namely perform the purifying (for shown in two-step approach in as Fig. 2) of the product through dA tailing.As mentioned above, aptamer can have the jag with the jag complementation on the DNA molecular do not repaired.Subsequently, second group of aptamer is added in DNA-biotinylation aptamer compound to provide the DNA library connecting aptamer.Optionally, the DNA repaired is used to prepare library.The DNA repaired can be the genomic DNA having become fragment and carried out the unorganized ferment reparation of 3 ' and 5 ' end.In one embodiment, in the consecutive steps such as connected for the end reparation described by the simple method performed in the solution, dA tailing and aptamer, carry out end reparation to DNA such as such as parent cfDNA, dA tailing and aptamer are connected on fixing aptamer on a solid surface.
In some embodiments utilizing the two-step method, the adaptor-ligated DNA is detached from the solid surface by chemical or physical means, e.g., heat or UV light (4a in Figure 2), purified (5 in Figure 2), and optionally amplified in solution before the sequencing process begins. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptors ligated to the DNA can be constructed to comprise sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]), which avoids the amplification otherwise used to introduce the sequences needed for hybridizing the library DNA to the flow cell. The library of adaptor-ligated DNA is subjected to massively parallel sequencing (6 in Figure 2) as described for adaptor-ligated DNA produced in solution. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation. The sequencing process can include solid-phase amplification, e.g., cluster amplification, as described elsewhere herein.
Therefore, in various embodiments, a method for preparing a sequencing library for NGS from unrepaired DNA on a solid surface can comprise obtaining DNA molecules from a sample, and performing the consecutive steps of dA-tailing and adaptor-ligating the unrepaired DNA molecules, wherein the adaptor ligation is performed on a solid phase. In some embodiments, the adaptors can comprise index sequences that allow multiplexed sequencing of multiple samples within a single reaction vessel (e.g., a lane of a flow cell). As described above, the DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, amplicons of DNA molecules, and the like.
As indicated above, in various embodiments these library preparation methods are incorporated into methods for determining copy number variations (CNV) such as aneuploidies. Accordingly, in some embodiments, the method for preparing a sequencing library from unrepaired cfDNA on a solid surface is incorporated into a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy. Thus, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not comprise end-repairing the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify the number of sequence tags for each of one or more chromosomes of interest and the number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating a chromosome dose for each of the one or more chromosomes of interest, using the number of sequence tags for each chromosome of interest and the number of sequence tags for its normalizing sequence; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for that chromosome, thereby determining the presence or absence of the fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. The sample can be a biological fluid sample, e.g., plasma, serum, urine, or saliva. In some embodiments, the sample is a maternal blood sample or the plasma or serum fraction thereof. This method is exemplified in Example 4.
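Steps (f) through (h) can be illustrated with a minimal Python sketch. It assumes that a chromosome dose is the ratio of the tag count for a chromosome of interest to the tag count for its normalizing sequence, and that a call is made when the dose exceeds the threshold. The tag counts, the pairing of chromosomes with normalizing sequences, the threshold values, and the function names are all hypothetical and chosen only for illustration; they are not taken from this document.

```python
from typing import Dict


def chromosome_doses(tag_counts: Dict[str, int],
                     normalizing: Dict[str, str]) -> Dict[str, float]:
    """Steps (f)-(g): dose = tags on the chromosome of interest divided by
    tags on its normalizing sequence."""
    doses = {}
    for chrom, norm in normalizing.items():
        norm_tags = tag_counts[norm]
        if norm_tags == 0:
            raise ValueError(f"no tags for normalizing sequence {norm}")
        doses[chrom] = tag_counts[chrom] / norm_tags
    return doses


def call_aneuploidy(doses: Dict[str, float],
                    thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Step (h): flag a chromosome when its dose exceeds its own threshold.
    (Assumption: a gain is being tested; a loss would use a lower bound.)"""
    return {chrom: dose > thresholds[chrom] for chrom, dose in doses.items()}


# Hypothetical values, for illustration only.
tag_counts = {"chr21": 13_000, "chr9": 40_000, "chr18": 30_000, "chr8": 60_000}
normalizing = {"chr21": "chr9", "chr18": "chr8"}
thresholds = {"chr21": 0.32, "chr18": 0.52}

doses = chromosome_doses(tag_counts, normalizing)
calls = call_aneuploidy(doses, thresholds)
print(doses)
print(calls)
```

Keeping the dose calculation separate from the threshold comparison lets each chromosome of interest carry its own normalizing sequence and its own threshold, as the steps above require.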
One-step: preparation on solid phase
In another embodiment, unrepaired DNA is dA-tailed, but the dA-tailed product is not purified before amplification, so that the steps of dA-tailing, adaptor ligation, and amplification are performed consecutively. Consecutive dA-tailing, adaptor ligation, and amplification, followed by purification before sequencing, is referred to herein as the one-step process. The one-step method can be performed on a solid surface (see, e.g., Figure 3). The steps of attaching a first set of adaptors to the solid surface (1), ligating unrepaired, dA-tailed DNA to the surface-bound adaptors (2), and ligating a second set of adaptors to the surface-bound DNA (3) can be performed as described above for the two-step method. In the one-step method, however, the surface-bound adaptor-ligated DNA can be amplified while it remains attached to the solid surface (4b in Figure 2). The resulting library of adaptor-ligated DNA produced on the solid surface is subsequently detached and purified (5 in Figure 2), and then subjected to massively parallel sequencing as described for adaptor-ligated DNA produced in solution. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation.
Therefore, in some embodiments, a method is provided for preparing a sequencing library for NGS sequencing, the method comprising: obtaining DNA molecules from a sample; and performing the consecutive steps of dA-tailing, adaptor-ligating, and amplifying the DNA molecules, wherein the adaptor ligation is performed on a solid surface. As described for the two-step method, in various embodiments the adaptors can comprise index sequences that allow multiplexed sequencing of multiple samples within a single reaction vessel (e.g., a lane of a flow cell).
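The role of index sequences in multiplexed sequencing can be sketched as follows. This is a simplified illustration that assumes the index occupies the first bases of each read; on real platforms the index is often read separately. The index sequences, sample names, and the `demultiplex` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(reads: List[str],
                index_to_sample: Dict[str, str],
                index_len: int) -> Dict[str, List[str]]:
    """Group reads by the index at the start of each read.
    Reads with an unrecognized index are binned as 'undetermined'."""
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = index_to_sample.get(index, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 6-base indexes for two samples pooled in one flow cell lane.
index_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = [
    "ACGTACGGGTTTAAA",
    "TGCATGCCCAAATTT",
    "ACGTACTTTGGGCCC",
    "NNNNNNAAAAAAAAA",
]
bins = demultiplex(reads, index_to_sample, index_len=6)
print(bins)
```

Because every read carries its sample's index, reads from several libraries sequenced together can be separated again before tag counting.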
In some embodiments, the DNA is repaired. The DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, or amplicons of DNA molecules. Adaptor ligation is performed as described above. Excess unligated adaptors can be washed away from the immobilized adaptor-ligated DNA; the reagents required for amplification are then added to the immobilized adaptor-ligated DNA, which is subjected to multiple rounds of amplification, e.g., PCR amplification, as is known in the art. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptor-ligated DNA can be removed from the solid surface by chemical or physical means (e.g., heat or UV light). In the absence of amplification, the adaptors ligated to the DNA can comprise sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]).
In various embodiments, the sample can be a biological fluid sample (e.g., blood, plasma, serum, urine, cerebrospinal fluid, amniotic fluid, saliva, and the like). In some embodiments, a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy includes, as one step, the method for preparing a sequencing library from unrepaired cfDNA on a solid surface.
Thus, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing, adaptor-ligating, and amplifying the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify the number of sequence tags for each of one or more chromosomes of interest and the number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating a chromosome dose for each of the one or more chromosomes of interest, using the number of sequence tags for each chromosome of interest and the number of sequence tags for its normalizing sequence; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for that chromosome, thereby determining the presence or absence of the fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. In some embodiments, the DNA is end-repaired. In other embodiments, preparing the library does not comprise end-repairing the cfDNA. This method is exemplified in Examples 5 and 6.
The processes for preparing sequencing libraries described above are applicable to methods of sample analysis including, without limitation, methods for determining copy number variations (CNV), and methods for determining the presence or absence of polymorphisms of any sequence of interest in samples comprising a single genome and in samples comprising mixtures of at least two genomes that are known or suspected to differ in one or more sequences of interest.
Amplification of the adaptor-ligated product prepared on a solid phase or in solution may be required to introduce into the adaptor-ligated template molecules the oligonucleotide sequences needed for hybridization to the flow cell or other surface present in some NGS platforms. The contents of an amplification reaction are known to one of ordinary skill in the art and include appropriate substrates (such as dNTPs), enzymes (e.g., a DNA polymerase), and the buffer components required for the amplification reaction. Optionally, amplification of the adaptor-ligated polynucleotides can be omitted. Generally, amplification reactions require at least two amplification primers, e.g., primer oligonucleotides, which may be identical or different and which can include an "adaptor-specific portion" capable of annealing, during the annealing step, to a primer-binding sequence in the polynucleotide molecule to be amplified (or to its complement if the template is viewed as a single strand).
Once formed, the library of templates prepared according to the methods described above can be used for solid-phase nucleic acid amplification, which some NGS platforms may require. As used herein, the term "solid-phase amplification" refers to any nucleic acid amplification reaction carried out on or in association with a solid support, such that all or a portion of the amplified products are immobilized on the solid support as they are formed. In particular embodiments, the term encompasses solid-phase polymerase chain reaction (solid-phase PCR) and solid-phase isothermal amplification, which are reactions analogous to standard solution-phase amplification, except that one or both of the forward and reverse amplification primers are immobilized on the solid support. Solid-phase PCR also covers systems such as emulsions, wherein one primer is anchored to a bead and the other is in free solution, and colony formation in solid-phase gel matrices, wherein one primer is anchored to the surface and one is in free solution.
In various embodiments, following amplification, sequencing libraries can be analyzed by microfluidic capillary electrophoresis to ensure that the library is free of adaptor dimers or single-stranded DNA. The library of template polynucleotide molecules is particularly useful in solid-phase sequencing methods. In addition to providing templates for solid-phase sequencing and solid-phase PCR, library templates also provide templates for whole genome amplification.
Marker nucleic acids for tracking and verifying sample integrity
In various embodiments, the integrity of the samples is verified, and the samples are tracked, by sequencing mixtures of sample genomic nucleic acids (e.g., cfDNA) and accompanying marker nucleic acids that have been introduced into the samples, e.g., prior to processing.
Marker nucleic acids can be combined with the test sample (e.g., a biological source sample) and subjected to processes that include one or more of the following steps: fractionating the biological source sample, e.g., obtaining an essentially cell-free plasma fraction from a whole blood sample; purifying nucleic acids from a fractionated (e.g., plasma) or unfractionated (e.g., tissue) biological source sample; and sequencing. In some embodiments, sequencing comprises preparing a sequencing library. The sequence or combination of sequences of the marker molecules that are combined with a source sample is chosen to be unique to that source sample. In some embodiments, the unique marker molecules in a sample all have the same sequence. In other embodiments, the unique marker molecules in a sample are a plurality of sequences, e.g., a combination of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, or more different sequences.
In one embodiment, the integrity of a sample can be verified using a plurality of marker nucleic acid molecules having identical sequences. Alternatively, the identity of a sample can be verified using a plurality of marker nucleic acid molecules that have at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, or more different sequences. Verifying the integrity of a plurality of biological samples, i.e., two or more biological samples, requires that each of the two or more samples be marked with marker nucleic acids having sequences that are unique to each of the test samples being marked. For example, a first sample can be marked with a marker nucleic acid having sequence A, and a second sample can be marked with a marker nucleic acid having sequence B. Alternatively, a first sample can be marked with marker nucleic acid molecules all having sequence A, and a second sample can be marked with a mixture of sequences B and C, wherein sequences A, B, and C are marker molecules having different sequences.
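The marking scheme in the A, B, and C example above can be sketched in Python. This is only an illustration under simplifying assumptions: markers are detected by exact substring match, and the marker sequences shown are short hypothetical placeholders rather than real marker designs. The function names are likewise hypothetical.

```python
from typing import Dict, Iterable, List, Set


def markers_found(reads: List[str], known_markers: Dict[str, str]) -> Set[str]:
    """Return the names of the known marker sequences observed in any read."""
    found = set()
    for read in reads:
        for name, seq in known_markers.items():
            if seq in read:
                found.add(name)
    return found


def verify_sample(reads: List[str], expected: Iterable[str],
                  known_markers: Dict[str, str]) -> bool:
    """A sample passes only if exactly its expected markers are observed."""
    return markers_found(reads, known_markers) == set(expected)


# Hypothetical marker sequences (placeholders, shorter than real markers).
known_markers = {
    "A": "ACGTTGCAAGCT",
    "B": "TTGACCGGTATC",
    "C": "GGCATATCCGAA",
}
sample_1_reads = ["GGACGTTGCAAGCTCC", "TTTTTTTTTTTT"]
sample_2_reads = ["AATTGACCGGTATCGG", "CCGGCATATCCGAATT"]

ok_1 = verify_sample(sample_1_reads, {"A"}, known_markers)        # marked with A
ok_2 = verify_sample(sample_2_reads, {"B", "C"}, known_markers)   # marked with B and C
swapped = verify_sample(sample_2_reads, {"A"}, known_markers)     # simulated mix-up
print(ok_1, ok_2, swapped)
```

Requiring an exact match of the observed marker set means both a missing marker and an unexpected marker fail verification, which covers sample swaps as well as cross-contamination.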
The marker nucleic acids can be added to the sample at any stage of sample preparation that occurs before library preparation (if libraries are to be prepared) and sequencing. In one embodiment, marker molecules can be combined with an unprocessed source sample. For example, the marker nucleic acid can be provided in a collection tube that is used to collect a blood sample. Alternatively, the marker nucleic acids can be added to the blood sample following the blood draw. In one embodiment, the marker nucleic acid is added to the vessel that is used to collect a biological fluid sample, e.g., the marker nucleic acid is added to a blood collection tube that is used to collect a blood sample. In another embodiment, the marker nucleic acid is added to a fraction of the biological fluid sample. For example, the marker nucleic acid is added to the plasma and/or serum fraction of a blood sample, e.g., a maternal plasma sample. In yet another embodiment, the marker molecules are added to a purified sample, e.g., a sample of nucleic acids that have been purified from a biological sample. For example, the marker nucleic acid is added to a sample of purified maternal and fetal cfDNA. Similarly, the marker nucleic acids can be added to a biopsy specimen before the specimen is processed. In some embodiments, the marker nucleic acids can be combined with a carrier that delivers the marker molecules into the cells of the biological sample. Cell-delivery carriers include pH-sensitive and cationic liposomes.
In various embodiments, the marker molecules have antigenomic sequences, that is, sequences that are absent from the genome of the biological source sample. In an exemplary embodiment, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome. In an alternative embodiment, the marker molecules have sequences that are absent from the source sample and from any one or more other known genomes. For example, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome and from the mouse genome. This alternative allows for verifying the integrity of a test sample that comprises two or more genomes. For example, the integrity of a human cell-free DNA sample obtained from a subject affected by a pathogen, e.g., a bacterium, can be verified using marker molecules having sequences that are absent from both the human genome and the genome of the affecting bacterium. Genome sequences of numerous pathogens, e.g., bacteria, viruses, yeasts, fungi, protozoa, and the like, are publicly available on the World Wide Web at ncbi.nlm.nih.gov/genomes. In another embodiment, marker molecules are nucleic acids that have sequences that are absent from any known genome. The sequences of marker molecules can be randomly generated algorithmically.
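One way to read "randomly generated algorithmically" is rejection sampling: draw random sequences and keep the first one that shares no k-mer with the reference genome on either strand. The sketch below illustrates that idea on a toy reference string; it is an assumption about how such an algorithm might look, not the procedure used in this document. The value of k, the toy reference, and all function names are hypothetical, and a real check would need a genome-scale index rather than an in-memory set.

```python
import random
from typing import Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reference_kmers(reference: str, k: int) -> Set[str]:
    """Collect k-mers from both strands of the reference."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)


def random_marker(length: int, ref_kmers: Set[str], k: int,
                  rng: random.Random, max_tries: int = 1000) -> str:
    """Draw random sequences until one shares no k-mer with the reference."""
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if kmers(candidate, k).isdisjoint(ref_kmers):
            return candidate
    raise RuntimeError("no suitable marker found within max_tries")


# Toy stand-in for a reference genome (hypothetical).
reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCGATTACGGCTA"
K = 8
ref_kmers = reference_kmers(reference, K)
marker = random_marker(30, ref_kmers, K, random.Random(0))
print(marker)
```

To exclude several genomes at once, for example human and mouse, the k-mer sets of each reference can simply be combined before sampling.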
In various embodiments, the marker molecules can be naturally occurring deoxyribonucleic acids (DNA), ribonucleic acids (RNA), or artificial nucleic acid analogs (nucleic acid mimics), including peptide nucleic acids (PNA), morpholino nucleic acids, locked nucleic acids, glycol nucleic acids, and threose nucleic acids, which are distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule, or DNA mimics that do not have a phosphodiester backbone. The deoxyribonucleic acids can be derived from naturally occurring genomes or can be generated in a laboratory using enzymes or by solid-phase chemical synthesis. Chemical methods can also be used to generate DNA mimics that are not found in nature. Available DNA derivatives in which the phosphodiester linkage has been replaced but the deoxyribose is retained include, but are not limited to, DNA mimics having backbones formed by thioformacetal or carboxamide linkages, which have been shown to be good structural DNA mimics. Other DNA mimics include morpholino derivatives and the peptide nucleic acids (PNA), which contain an N-(2-aminoethyl)glycine-based pseudopeptide backbone (Ann Rev Biophys Biomol Struct 24:167-183 [1995]). PNA is an extremely good structural mimic of DNA (or of ribonucleic acid [RNA]), and PNA oligomers are able to form very stable duplex structures with Watson-Crick complementary DNA and RNA (or PNA) oligomers; they can also bind to targets in duplex DNA by helix invasion (Mol Biotechnol 26:233-248 [2004]). Another good structural mimic/analog of DNA that can be used as a marker molecule is phosphorothioate DNA, in which one of the non-bridging oxygens is replaced by a sulfur. This modification reduces the action of endo- and exonucleases, including 5' to 3' and 3' to 5' DNA POL 1 exonuclease, nucleases S1 and P1, RNases, serum nucleases, and snake venom phosphodiesterase.
The length of the marker molecules can be distinct or indistinct from that of the sample nucleic acids, i.e., the length of the marker molecules can be similar to that of the sample genomic molecules, or it can be greater or smaller than that of the sample genomic molecules. The length of a marker molecule is measured by the number of nucleotide or nucleotide analog bases that constitute it. Marker molecules whose lengths differ from those of the sample genomic molecules can be distinguished from source nucleic acids using separation methods known in the art. For example, differences in the length of the marker and sample nucleic acid molecules can be determined by electrophoretic separation, e.g., capillary electrophoresis. Size differentiation can be advantageous for quantifying and assessing the quality of the marker and sample nucleic acids. Preferably, the marker nucleic acids are shorter than the genomic nucleic acids, and of sufficient length to exclude them from being mapped to the genome of the sample. For example, a human sequence of 30 bases is needed to map it uniquely to the human genome. Accordingly, in some embodiments, marker molecules used in sequencing bioassays of human samples should be at least 30 bp in length.
The choice of marker molecule length is determined primarily by the sequencing technology that is used to verify the integrity of a source sample. The length of the sample genomic nucleic acids being sequenced can also be considered. For example, some sequencing technologies employ clonal amplification of polynucleotides, which can require that the genomic polynucleotides to be clonally amplified be of a minimum length. For example, sequencing with the Illumina GAII sequence analyzer includes in vitro clonal amplification by bridge PCR (also known as cluster amplification) of polynucleotides that have a minimum length of 110 bp, to which adaptors are ligated to provide a nucleic acid of at least 200 bp and less than 600 bp that can be clonally amplified and sequenced. In some embodiments, the length of the adaptor-ligated marker molecule is between about 200 bp and about 600 bp, between about 250 bp and 550 bp, between about 300 bp and 500 bp, or between about 350 bp and 450 bp. In other embodiments, the length of the adaptor-ligated marker molecule is about 200 bp. For example, when fetal cfDNA present in a maternal sample is sequenced, the length of the marker molecule can be chosen to be similar to that of fetal cfDNA molecules. Thus, in one embodiment, the length of the marker molecule used in an assay that comprises massively parallel sequencing of cfDNA in a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy can be about 150 bp, about 160 bp, about 170 bp, about 180 bp, about 190 bp, or about 200 bp; preferably, the marker molecule is about 170 bp. Other sequencing approaches, e.g., SOLiD sequencing, Polony Sequencing, and 454 sequencing, use emulsion PCR to clonally amplify DNA molecules for sequencing, and each technology dictates the minimum and maximum length of the molecules to be amplified. The length of marker molecules to be sequenced as clonally amplified nucleic acids can be up to about 600 bp. In some embodiments, the length of marker molecules to be sequenced can be greater than 600 bp.
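The length constraints discussed above (at least 30 bp for a marker used with human samples, and a ligated molecule of at least 200 bp and less than 600 bp for clonal amplification by bridge PCR) can be combined into a simple check. The sketch below is illustrative only; the function name, the `added_bp` parameter (the total number of bases added to both ends by ligation), and the example value of 120 bp for it are assumptions, and the limits should be replaced by those of the platform actually used.

```python
from typing import List


def check_marker_length(marker_bp: int, added_bp: int,
                        min_mappable_bp: int = 30,
                        clonal_min_bp: int = 200,
                        clonal_max_bp: int = 600) -> List[str]:
    """Return a list of problems; an empty list means the length is acceptable."""
    problems = []
    if marker_bp < min_mappable_bp:
        problems.append("marker shorter than the minimum mappable length")
    ligated_bp = marker_bp + added_bp
    # Window is inclusive at the minimum and exclusive at the maximum.
    if not (clonal_min_bp <= ligated_bp < clonal_max_bp):
        problems.append("ligated length outside the clonal amplification window")
    return problems


# Hypothetical example: 120 bp added in total by ligation.
ok_case = check_marker_length(marker_bp=170, added_bp=120)
bad_case = check_marker_length(marker_bp=20, added_bp=120)
print(ok_case)
print(bad_case)
```

Returning a list of problems rather than a single boolean makes it clear which constraint a candidate marker violates.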
Single-molecule sequencing technologies, which do not employ clonal amplification of molecules and are capable of sequencing nucleic acids over a very broad range of template lengths, in most situations do not require that the molecules to be sequenced be of any specific length. However, the yield of sequences per unit mass depends on the number of 3' end hydroxyl groups, so relatively short templates are more efficient for sequencing than long templates. If starting with nucleic acids longer than 1000 nt, it is generally advisable to shear the nucleic acids to an average length of 100 to 200 nt so that more sequence information can be generated from the same mass of nucleic acids. Thus, the length of the marker molecule can range from tens of bases to thousands of bases. The length of marker molecules used for single-molecule sequencing can be up to about 25 bp, up to about 50 bp, up to about 75 bp, up to about 100 bp, up to about 200 bp, up to about 300 bp, up to about 400 bp, up to about 500 bp, up to about 600 bp, up to about 700 bp, up to about 800 bp, up to about 900 bp, up to about 1000 bp, or more.
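The dependence of sequence yield on the number of 3' ends can be made concrete with a short calculation. The sketch assumes single-stranded templates with one 3' end per molecule and an average mass of about 330 g/mol per nucleotide, which is a common approximation and not a value given in this document; the function name is hypothetical.

```python
AVOGADRO = 6.022e23            # molecules per mole
NT_MASS_G_PER_MOL = 330.0      # assumed average mass of one nucleotide (ssDNA)


def three_prime_ends_per_ng(length_nt: float) -> float:
    """Number of 3' ends (one per single-stranded molecule) in 1 ng of template."""
    grams = 1e-9
    moles = grams / (length_nt * NT_MASS_G_PER_MOL)
    return moles * AVOGADRO


long_templates = three_prime_ends_per_ng(1000)
short_templates = three_prime_ends_per_ng(150)
gain = short_templates / long_templates
print(f"{long_templates:.3g} ends/ng at 1000 nt, {short_templates:.3g} ends/ng at 150 nt")
print(f"fold gain from shearing: {gain:.2f}")
```

Under these assumptions the number of 3' ends per unit mass is inversely proportional to template length, so shearing 1000 nt templates to 150 nt gives roughly 6.7 times as many sequenceable ends from the same mass.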
The length chosen for a marker molecule is also determined by the length of the genomic nucleic acid being sequenced. For example, cfDNA circulates in the human bloodstream as genomic fragments of cellular genomic DNA. Fetal cfDNA molecules found in the plasma of pregnant women are generally shorter than maternal cfDNA molecules (Chan et al., Clin Chem 50:8892 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, while maternal DNA has been estimated to be between about 0.5 kb and 1 kb (Li et al., Clin Chem 50:1002-1011 [2004]). These findings are consistent with those of Fan et al., who determined using NGS that fetal cfDNA is rarely longer than 340 bp (Fan et al., Clin Chem 56:1279-1286 [2010]). DNA isolated from urine with a standard silica-based method consists of two fractions: high-molecular-weight DNA, which originates from shed cells, and a low-molecular-weight (150-250 base pair) fraction of transrenal DNA (Tr-DNA) (Botezatu et al., Clin Chem 46:1078-1084, 2000; and Su et al., J Mol Diagn 6:101-107, 2004). Applying a newly developed technique for isolating cell-free nucleic acids from body fluids to the isolation of transrenal nucleic acids has revealed the presence in urine of DNA and RNA fragments much shorter than 150 base pairs (U.S. Patent Application Publication No. 20080139801). In embodiments wherein cfDNA is the genomic nucleic acid that is sequenced, the marker molecules that are chosen can be up to about the size of the cfDNA. For example, the length of marker molecules used in maternal cfDNA samples to be sequenced as single nucleic acid molecules or as clonally amplified nucleic acids can be between about 100 bp and 600 bp. In other embodiments, the sample genomic nucleic acids are fragments of larger molecules. For example, the sample genomic nucleic acid that is sequenced is fragmented cellular DNA. In embodiments in which fragmented cellular DNA is sequenced, the length of the marker molecules can be up to the length of the DNA fragments. In some embodiments, the length of the marker molecules is at least the minimum length required for mapping a sequence read uniquely to the appropriate reference genome. In other embodiments, the length of the marker molecule is the minimum length required to exclude the marker molecule from being mapped to the sample reference genome.
In addition, marker molecules can be used to verify samples that are not assayed by nucleic acid sequencing and that can be verified by common biotechniques other than sequencing, e.g., real-time PCR.
Sample controls (e.g., in-process positive controls for sequencing and/or analysis)
In various embodiments, marker sequences introduced into the samples, e.g., as described above, can function as positive controls to verify the accuracy and efficacy of sequencing and of subsequent processing and analysis.
Accordingly, compositions and methods are provided for an in-process positive control (IPC) for sequencing DNA in a sample. In some embodiments, positive controls are provided for sequencing cfDNA in a sample comprising a mixture of genomes. An IPC can be used to relate baseline shifts in sequence information obtained from different sets of samples, e.g., samples that are sequenced at different times on different sequencing runs. Thus, for example, an IPC can relate the sequence information obtained for a maternal test sample to the sequence information obtained from a set of qualified samples that were sequenced at a different time.
Similarly, in the case of segment analysis, an IPC can relate the sequence information obtained from a subject for a particular segment to the sequence information obtained from a set of qualified samples (of similar sequences) that were sequenced at a different time. In some embodiments, an IPC can relate the sequence information obtained from a subject for particular cancer-related loci to the sequence information obtained from a set of qualified samples (e.g., from known amplifications/deletions, and the like).
In addition, IPCs can be used as markers to track samples through the sequencing process. IPCs can also provide a qualitative positive sequence dose value, e.g., NCV, for one or more aneuploidies of chromosomes of interest, e.g., trisomy 21, trisomy 13, or trisomy 18, to support proper interpretation and to ensure the dependability and accuracy of the data. In some embodiments, IPCs can be created to comprise nucleic acids from male and female genomes to provide doses for chromosomes X and Y in a maternal sample, and thereby determine whether the fetus is male.
The type and number of in-process controls depend on the type or nature of the test needed. For example, for a test that requires sequencing DNA from a sample comprising a mixture of genomes to determine whether a chromosomal aneuploidy exists, the in-process control can comprise DNA obtained from a sample known to comprise the same chromosomal aneuploidy that is being tested. In some embodiments, the IPC includes DNA from a sample known to comprise an aneuploidy of a chromosome of interest. For example, the IPC for a test to determine the presence or absence of a fetal trisomy, e.g., trisomy 21, in a maternal sample comprises DNA obtained from an individual with trisomy 21. In some embodiments, the IPC comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the IPC comprises a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
An IPC that serves as the control for detecting a single aneuploidy can be created using a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, an IPC that serves as a control for a test to determine a fetal trisomy, e.g., trisomy 21, can be created by combining genomic DNA from a male or female subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. Genomic DNA can be extracted from cells of both subjects and sheared to provide fragments of between about 100 bp and 400 bp, between about 150 bp and 350 bp, or between about 200 bp and 300 bp to simulate the circulating cfDNA fragments in maternal samples. The proportion of fragmented DNA from the subject carrying the aneuploidy, e.g., trisomy 21, is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, to provide an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of the DNA is from the subject carrying the aneuploidy. The IPC can comprise DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects, one carrying a trisomic chromosome 21, one a trisomic chromosome 13, and one a trisomic chromosome 18. The mixture of fragmented DNA is prepared for sequencing. Processing the mixture of fragmented DNA can comprise preparing a sequencing library, which can be sequenced using any massively parallel method in singleplex or multiplex fashion. Stock solutions of the genomic IPC can be stored and used in multiple diagnostic tests.
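The 80%/20% mixture described above can be worked through numerically. The sketch below assumes that the 20% is split equally by mass among the three aneuploid donors (the text does not specify the split) and that mass fraction approximates genome fraction. Under those assumptions, a fraction f of DNA from a trisomic donor raises the relative representation of the affected chromosome to ((1 - f) x 2 + f x 3) / 2 = 1 + f/2. The function names and the 100 ng total are hypothetical.

```python
from typing import Dict, List


def ipc_composition(total_ng: float, unaffected_fraction: float,
                    donors: List[str]) -> Dict[str, float]:
    """Mass of each component, splitting the affected share equally among donors
    (the equal split is an assumption)."""
    per_donor = (1.0 - unaffected_fraction) / len(donors)
    masses = {"unaffected_female": total_ng * unaffected_fraction}
    for donor in donors:
        masses[donor] = total_ng * per_donor
    return masses


def expected_relative_dose(fraction_trisomic: float) -> float:
    """Relative representation of a chromosome that is trisomic in a fraction f
    of the mixture: ((1 - f) * 2 + f * 3) / 2 = 1 + f / 2."""
    return 1.0 + fraction_trisomic / 2.0


donors = ["trisomy_21_donor", "trisomy_13_donor", "trisomy_18_donor"]
masses = ipc_composition(total_ng=100.0, unaffected_fraction=0.80, donors=donors)
per_donor_fraction = masses["trisomy_21_donor"] / 100.0
elevation = expected_relative_dose(per_donor_fraction)
print(masses)
print(f"expected relative dose per trisomic chromosome: {elevation:.4f}")
```

With about 6.7% of the mixture coming from each trisomic donor, each affected chromosome is expected to be over-represented by only about 3.3%, which illustrates how small the signal is that the control is designed to reproduce.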
Alternatively, the IPC can be created using cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, cfDNA can be obtained from a pregnant woman carrying a fetus with trisomy 21. The cfDNA is extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. The DNA can be excised from the bacterial vector using restriction enzymes. Alternatively, the cloned cfDNA can be amplified, e.g., by PCR. The IPC DNA can be processed so that it is sequenced in the same runs as the cfDNA from the test samples that are to be analyzed for the presence or absence of chromosomal aneuploidies.
While the creation of IPCs is described above with respect to trisomies, it will be appreciated that IPCs can be created to reflect other partial aneuploidies, including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications (e.g., breast cancer associated with 20Q13), IPCs can be created that incorporate those known amplifications.
Sequencing methods
As indicated above, the prepared samples (e.g., sequencing libraries) are sequenced as part of the procedure for identifying copy number variations. Any of a number of sequencing technologies can be utilized.
Some sequencing technologies are available commercially, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, CA); the sequencing-by-synthesis platforms from 454 Life Sciences (Bradford, CT), Illumina/Solexa (Hayward, CA), and Helicos Biosciences (Cambridge, MA); and the sequencing-by-ligation platform from Applied Biosystems (Foster City, CA), as described below. In addition to the single molecule sequencing performed using the sequencing-by-synthesis of Helicos Biosciences, other single molecule sequencing technologies include, but are not limited to, the SMRT™ technology of Pacific Biosciences, the ION TORRENT™ technology, and nanopore sequencing as developed, for example, by Oxford Nanopore Technologies.
While the automated Sanger method is considered a 'first generation' technology, Sanger sequencing, including automated Sanger sequencing, can also be employed in the methods described herein. Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging technologies, e.g., atomic force microscopy (AFM) or transmission electron microscopy (TEM). Illustrative sequencing technologies are described in greater detail below.
In one illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using the single molecule sequencing technology of Helicos True Single Molecule Sequencing (tSMS) (e.g., as described in Harris T.D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites immobilized on the flow cell surface. In certain embodiments, the template density can be about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the positions of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are then removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries, and these methods allow direct measurement of the sample rather than measurement of copies of that sample.
In another illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using 454 sequencing (Roche) (e.g., as described in Margulies, M. et al., Nature 437:376-380 [2005]). 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of approximately 300 to 800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads, using, e.g., Adaptor B, which contains a 5'-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in the sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of the pyrophosphate (PPi) that is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses the ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
In another illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragments to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and the beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
In another illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors), which obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand. A ZMW detector comprises a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase against a background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (e.g., in microseconds). It typically takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Measuring the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated to provide a sequence.
In another illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using nanopore sequencing (e.g., as described in Soni GV and Meller A., Clin Chem 53:1996-2001 [2007]). Nanopore sequencing DNA analysis techniques are being developed by a number of companies, including, for example, Oxford Nanopore Technologies (Oxford, United Kingdom), Sequenom, NABsys, and the like. Nanopore sequencing is a single-molecule sequencing technology in which a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, typically about 1 nanometer in diameter. Immersing a nanopore in a conducting fluid and applying a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current that flows is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree. Thus, the change in current as the DNA molecule passes through the nanopore provides a read of the DNA sequence.
In another illustrative but non-limiting embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, using a chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 2009/0026082). In one example of this technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned as a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, the nucleic acids can be amplified on the beads, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using Halcyon Molecular's technology, which uses transmission electron microscopy (TEM). The method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises using single-atom-resolution transmission electron microscope imaging of high-molecular-weight (150 kb or greater) DNA selectively labeled with heavy atom markers, and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the positions of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows complete human genomes to be sequenced in less than ten minutes.
In another embodiment, the DNA sequencing technology is Ion Torrent single molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to translate chemically encoded information (A, C, G, T) directly into digital information (0, 1) on a semiconductor chip. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer, and beneath that an ion sensor. When a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion is released. The charge from that ion changes the pH of the solution, which can be detected by Ion Torrent's ion sensor. The sequencer (essentially the world's smallest solid-state pH meter) calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change is recorded and no base is called. If there are two identical bases on the DNA strand, the voltage doubles, and the chip records two identical bases called. Direct detection allows nucleotide incorporation to be recorded in seconds.
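The flow-by-flow base calling described above (no signal when the flooded nucleotide does not match, a doubled signal for two identical bases) can be sketched as follows. This is a toy illustration under stated assumptions, not the vendor's algorithm: the function name `call_bases`, the flow order, the example signal values, and the rounding rule are all hypothetical.

```python
def call_bases(flow_order, signals, unit=1.0):
    """Translate per-flow signal amplitudes into called bases.

    flow_order -- nucleotide flooded over the chip at each flow
    signals    -- measured signal per flow (hypothetical, arbitrary units)
    unit       -- signal produced by a single incorporation (assumed known)
    """
    called = []
    for base, signal in zip(flow_order, signals):
        # Round to the nearest whole number of incorporations:
        # 0 means no match (no base called), 2 means two identical bases.
        n = max(round(signal / unit), 0)
        called.append(base * n)
    return "".join(called)


flows = "TACG" * 2  # hypothetical cyclic flow order
signals = [1.02, 0.03, 0.0, 1.96, 0.05, 0.98, 1.1, 0.0]
print(call_bases(flows, signals))  # TGGAC
```

The same proportionality between signal strength and the number of incorporated nucleotides is what the pyrosequencing description relies on, so the sketch applies to light intensities as well as to voltage changes.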
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using sequencing by hybridization. Sequencing by hybridization comprises contacting a plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the polynucleotide probes can optionally be tethered to a substrate. The substrate can be a flat surface comprising an array of known nucleotide sequences. The pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to identify the plurality of polynucleotide sequences within the sample.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, by massively parallel sequencing of millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g., as described in Bentley et al., Nature 6:53-59 [2009]). The template DNA can be genomic DNA, e.g., cfDNA. In some embodiments, genomic DNA from isolated cells is used as the template, and it is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as the template, and fragmentation is not required because cfDNA exists as short fragments. For example, fetal cfDNA circulates in the bloodstream as fragments approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and no fragmentation of the DNA is required prior to sequencing. Illumina's sequencing technology relies on the attachment of fragmented genomic DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. The template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA fragments. This addition prepares the DNA fragments for ligation to oligonucleotide adaptors, which have an overhang of a single T base at their 3' end to increase ligation efficiency. The adaptor oligonucleotides are complementary to the flow cell anchors. Under limiting-dilution conditions, adaptor-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. The attached DNA fragments are extended and bridge amplified to create an ultra-high-density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA, e.g., cfDNA, is amplified using PCR before it is subjected to cluster amplification. Alternatively, an amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA, e.g., cfDNA, is enriched using cluster amplification alone (Kozarewa et al., Nature Methods 6:291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about 20 bp to 40 bp (e.g., 36 bp) are aligned against a repeat-masked reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. Non-repeat-masked reference genomes can also be used. Whether a repeat-masked or a non-repeat-masked reference genome is used, only reads that map uniquely to the reference genome are counted. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used. Partial sequencing of the DNA fragments present in the sample is performed, and sequence tags that comprise reads of predetermined length (e.g., 36 bp) and that map to a known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and DDBJ (the DNA Data Bank of Japan). A number of computer algorithms are available for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Person & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In some embodiments of the methods described herein, the mapped sequence tags comprise sequence reads of about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp, or about 500bp. It is expected that technological advances will enable single-end reads of greater than 500bp, enabling reads of greater than about 1000bp when paired-end reads are generated. In one embodiment, the mapped sequence tags comprise sequence reads that are 36bp. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule; specific genetic sequence information is not needed. A small degree of mismatch (0 to 2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
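The mapping rule described above (determine the chromosomal origin of a tag, tolerate 0 to 2 mismatches, and count only tags that map uniquely) can be illustrated with a brute-force sketch. This is not how ELAND or BOWTIE work internally; the function names, the 6 bp toy reads, and the two toy "chromosomes" are hypothetical and chosen only to keep the example readable.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(read, reference, max_mismatches=2):
    """Return the chromosome a read maps to uniquely, or None.

    reference -- dict of chromosome name -> sequence (toy data here).
    A read is kept only if exactly one position in the whole reference
    matches it with at most max_mismatches mismatches.
    """
    k = len(read)
    hits = []
    for chrom, seq in reference.items():
        for start in range(len(seq) - k + 1):
            if hamming(read, seq[start:start + k]) <= max_mismatches:
                hits.append((chrom, start))
    return hits[0][0] if len(hits) == 1 else None


reference = {"chrA": "ACGTTGCA", "chrB": "GGATCCTA"}  # toy reference
reads = ["ACGTTG", "ACGTTC", "GATCCT", "TTTTTT"]       # toy 6 bp reads

# Count uniquely mapped tags per chromosome; unmapped reads are dropped.
tag_counts = Counter(
    chrom for chrom in (map_tag(r, reference) for r in reads) if chrom
)
print(dict(tag_counts))  # {'chrA': 2, 'chrB': 1}
```

Only the chromosome of origin is retained for each tag, which mirrors the statement that specific genetic sequence information is not needed once the tag has been placed.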
A plurality of sequence tags is typically obtained per sample. In some embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between 20bp and 40bp (e.g., 36bp) are obtained per sample from mapping the reads to the reference genome. In one embodiment, all the sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions (e.g., all chromosomes) of the reference genome are counted, and the CNV, i.e., the over- or under-representation of a sequence of interest (e.g., a chromosome or portion thereof) in the mixed DNA sample, is determined. The method does not require differentiation between the two genomes.
The accuracy required for correctly determining whether a CNV (e.g., an aneuploidy) is present or absent in a sample is predicated on the variation in the number of sequence tags that map to the reference genome among samples within a sequencing run (inter-chromosomal variability), and on the variation in the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, the variations can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variations can result from using different protocols for the extraction and purification of the nucleic acids, from the preparation of the sequencing libraries, and from the use of different sequencing platforms. The present method uses sequence doses (chromosome doses or segment doses) based on the knowledge of normalizing sequences (normalizing chromosome sequences or normalizing segment sequences) to intrinsically account for the accrued variability stemming from inter-chromosomal (intra-run), inter-sequencing (inter-run), and platform-dependent variability. Chromosome doses are based on the knowledge of a normalizing chromosome sequence, which can be composed of a single chromosome, or of two or more chromosomes selected from chromosomes 1 to 22, X, and Y. Alternatively, normalizing chromosome sequences can be composed of a single chromosome segment, or of two or more segments of one chromosome or of two or more chromosomes. Segment doses are based on the knowledge of a normalizing segment sequence, which can be composed of a single segment of any one chromosome, or of two or more segments of any two or more of chromosomes 1 to 22, X, and Y.
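A chromosome dose can be sketched as a simple ratio of tag counts. The passage above does not spell out the formula, so the ratio used here (tags on the chromosome of interest divided by tags on the normalizing chromosome or chromosomes) is an assumption made for illustration, as are the function name, the choice of normalizing chromosomes, and the tag counts.

```python
def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Ratio of tags on the chromosome of interest to tags on the
    normalizing chromosome(s). Assumed definition, for illustration only."""
    denominator = sum(tag_counts[c] for c in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("normalizing chromosomes have no mapped tags")
    return tag_counts[chromosome_of_interest] / denominator


# Hypothetical tag counts from one sample in one sequencing run.
counts = {"chr21": 150_000, "chr9": 500_000, "chr11": 500_000}

# Normalizing sequence composed of a single chromosome.
print(chromosome_dose(counts, "chr21", ["chr9"]))
# Normalizing sequence composed of two chromosomes.
print(chromosome_dose(counts, "chr21", ["chr9", "chr11"]))
```

Because numerator and denominator come from the same sample in the same run, factors that scale all tag counts together (for example, total sequencing depth) cancel in the ratio, which is the sense in which a dose accounts for run-to-run variability.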
Singleplex sequencing
Fig. 4 illustrates a flow chart of an embodiment of the method in which marker nucleic acids are combined with the source sample nucleic acids of a single sample to assay for a genetic abnormality while determining the integrity of the biological source sample. In step 410, a biological source sample comprising genomic nucleic acids is obtained. In step 420, marker nucleic acids are combined with the biological source sample to provide a marked sample. A sequencing library of a mixture of clonally amplified source sample genomic nucleic acids and marker nucleic acids is prepared in step 430, and the library is sequenced in a massively parallel fashion in step 440 to provide sequencing information pertaining to the source genomic nucleic acids and the marker nucleic acids of the sample. Massively parallel sequencing methods provide sequencing information as sequence reads, which are mapped to one or more reference genomes to generate sequence tags that can be analyzed. In step 450, all sequencing information is analyzed, and in step 460, the integrity of the source sample is verified based on the sequencing information pertaining to the marker molecules. Verification of source sample integrity is accomplished by determining a correspondence between the sequencing information obtained for the marker molecule in step 450 and the known sequence of the marker molecule that was added to the original source sample in step 420. The same process can be applied to multiple samples that are sequenced separately, with each sample comprising molecules having sequences unique to that sample, i.e., each sample is marked with a unique marker molecule and is sequenced separately from the other samples in the flow cell or slide of a sequencer. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide information, e.g., about the status of the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. If the integrity of the sample is not verified, the sequencing information is disregarded.
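The verification in steps 450 and 460 can be sketched as a comparison between the marker sequence that was read and the marker sequence known to have been added in step 420. This is a minimal sketch under stated assumptions: the function names, the exact-match comparison, and the 90% agreement threshold are hypothetical and are not specified in the text.

```python
def verify_integrity(marker_reads, known_marker, min_fraction=0.9):
    """Return True if the sequenced marker corresponds to the known marker.

    marker_reads -- sequences obtained for the marker molecule (step 450)
    known_marker -- sequence of the marker added to the sample (step 420)
    min_fraction -- share of marker reads that must match exactly
                    (assumed threshold, for illustration only)
    """
    if not marker_reads:
        return False
    matching = sum(1 for read in marker_reads if read == known_marker)
    return matching / len(marker_reads) >= min_fraction


def analyze_sample(marker_reads, known_marker, genomic_tags):
    """Pass genomic tags on for analysis only if integrity is verified."""
    if not verify_integrity(marker_reads, known_marker):
        return None  # sequencing information is disregarded
    return genomic_tags


known = "ACGTTGCAAC"  # hypothetical marker sequence
ok = analyze_sample([known] * 10, known, ["tag1", "tag2"])
bad = analyze_sample(["TTTTTTTTTT"] * 10, known, ["tag1", "tag2"])
print(ok, bad)  # ['tag1', 'tag2'] None
```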
The method depicted in Fig. 4 is also applicable to bioassays that comprise singleplex sequencing of single molecules, e.g., tSMS by Helicos, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies, such as that proposed by IBM, which do not require the preparation of libraries.
Multiplex sequencing
The large number of sequence reads that can be obtained per sequencing run permits the analysis of pooled samples, i.e., multiplexing, which maximizes sequencing capacity and reduces workflow. For example, the massively parallel sequencing of eight libraries performed using the eight-lane flow cell of the Illumina Genome Analyzer can be multiplexed to sequence two or more samples in each lane, such that 16, 24, 32, or more samples can be sequenced in a single run. Parallelizing sequencing for multiple samples, i.e., multiplex sequencing, requires the incorporation of sample-specific index sequences, also known as barcodes, during the preparation of the sequencing libraries. Sequencing indexes are distinct base sequences of about 5, about 10, about 15, about 20, about 25, or more bases that are added at the 3' end of the genomic and marker nucleic acids. The multiplexing system enables hundreds of biological samples to be sequenced within a single sequencing run. Indexed sequencing libraries for sequencing clonally amplified sequences can be prepared by incorporating the index sequence into one of the PCR primers used for cluster amplification. Alternatively, the index sequence can be incorporated into the adaptor, which is ligated to the cfDNA prior to PCR amplification. Indexed libraries for single molecule sequencing can be created by incorporating the index sequence at the 3' end of the marker and genomic molecules, or 5' to the added sequence needed for hybridization to the flow cell anchors (e.g., the polyA tail added for single molecule sequencing using tSMS). Sequencing the uniquely marked, indexed nucleic acids provides index sequence information that identifies the samples in the pooled sample libraries, and the sequence information of the marker molecules correlates the sequencing information of the genomic nucleic acids with the sample source. In embodiments in which multiple samples are sequenced individually, i.e., singleplex sequencing, the marker and genomic nucleic acid molecules of each sample need only be modified to contain the adaptor sequences required by the sequencing platform, and the index sequences are omitted.
Fig. 5 provides a flow chart of an embodiment 500 of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay, i.e., nucleic acids from individual samples are combined and sequenced as a complex mixture. In step 510, a plurality of biological source samples, each comprising genomic nucleic acids, is obtained. In step 520, unique marker nucleic acids are combined with each of the biological source samples to provide a plurality of uniquely marked samples. In step 530, a sequencing library of sample genomic nucleic acids and marker nucleic acids is prepared for each of the uniquely marked samples. Library preparation for samples that are destined to undergo multiplex sequencing comprises incorporating distinct indexing tags into the sample and marker nucleic acids of each uniquely marked sample, to provide samples whose source nucleic acid sequences can be correlated with the corresponding marker nucleic acid sequences and identified in complex solutions. In embodiments of the method comprising marker molecules that can be enzymatically modified (e.g., DNA), indexing molecules can be incorporated at the 3' end of the sample and marker molecules by ligating sequenceable adaptor sequences comprising the indexing sequences. In embodiments of the method comprising marker molecules that cannot be enzymatically modified (e.g., DNA analogs that do not have a phosphate backbone), indexing sequences are incorporated at the 3' end of the analog marker molecules during synthesis. The sequencing libraries of two or more samples are pooled and loaded onto the flow cell of the sequencer, where they are sequenced in a massively parallel fashion in step 540. In step 550, all sequencing information is analyzed, and in step 560, the integrity of the source samples is verified based on the sequencing information pertaining to the marker molecules. Verification of the integrity of each of the plurality of source samples is accomplished by first grouping the sequence tags associated with identical index sequences, in order to associate the genomic and marker sequences and to distinguish the sequences belonging to each of the libraries made from the genomic molecules of the plurality of samples. The grouped marker and genomic sequences are then analyzed to verify that the sequence obtained for the marker molecules corresponds to the known unique sequence added to the corresponding source sample. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide genetic information about the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. The absence of a correspondence between the sequencing information and the known sequence of the marker molecule is indicative of a sample mix-up, and the accompanying sequencing information pertaining to the genomic cfDNA molecules is disregarded.
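The grouping and verification of steps 550 and 560 can be sketched as follows: tags are first grouped by index sequence, and the marker sequence observed within each group is then compared with the unique marker known to have been added to that sample. The record layout `(index, kind, sequence)`, the function names, and the all-reads-must-match rule are assumptions made for this illustration.

```python
from collections import defaultdict


def demultiplex(records):
    """Group (index, kind, sequence) records by index sequence.

    kind is "marker" or "genomic" (hypothetical record layout).
    """
    groups = defaultdict(lambda: {"marker": [], "genomic": []})
    for index, kind, sequence in records:
        groups[index][kind].append(sequence)
    return dict(groups)


def verify_pool(records, expected_markers):
    """Return {index: True/False}, True when every marker read in the
    group equals the marker known to have been added to that sample."""
    groups = demultiplex(records)
    verdicts = {}
    for index, expected in expected_markers.items():
        observed = groups.get(index, {"marker": []})["marker"]
        verdicts[index] = bool(observed) and all(m == expected for m in observed)
    return verdicts


# Hypothetical pooled run: two samples, index sequences ACGT and TGCA.
records = [
    ("ACGT", "marker", "GGATCCAA"),
    ("ACGT", "genomic", "GATTACAG"),
    ("TGCA", "marker", "GGATCCAA"),  # wrong marker: suggests a mix-up
    ("TGCA", "genomic", "CCGGTTAA"),
]
expected = {"ACGT": "GGATCCAA", "TGCA": "TTAACCGG"}
print(verify_pool(records, expected))  # {'ACGT': True, 'TGCA': False}
```

Genomic tags from a group whose verdict is False would be disregarded, in line with the handling of a sample mix-up described above.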
Determination of CNV for prenatal diagnosis
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, small amounts of circulating fetal DNA were discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83:563-566 [2007]) and are known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream; these originate from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers new opportunities for NIPD.
The present method is a polymorphism-independent method for use in NIPD that does not require fetal cfDNA to be distinguished from maternal cfDNA in order to determine a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by the gain or loss of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions, and insertions. By far the most common known aneuploidy compatible with life is trisomy 21, i.e., Down syndrome (DS), which is caused by the presence of part or all of an extra chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties, and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies of known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g., Turner syndrome (XO), and triple X syndrome (XXX) in female births, and Klinefelter syndrome (XXY) and XYY syndrome in male births, all of which are associated with various phenotypes including sterility and reduced intellectual skills. Monosomy X [45,X] is a common cause of early pregnancy loss, accounting for about 7% of spontaneous abortions. Based on the liveborn frequency of 45,X (also called Turner syndrome) of 1-2/10,000, it is estimated that fewer than 1% of 45,X conceptuses survive to term. About 30% of Turner syndrome patients are mosaic, with both a 45,X cell line and either a 46,XX cell line or one containing a rearranged X chromosome (Hook and Warburton, 1983). The phenotype in a liveborn infant is relatively mild considering the high embryonic lethality, and it has been hypothesized that possibly all liveborn females with Turner syndrome carry a cell line containing two sex chromosomes. Monosomy X can occur in females as 45,X or as 45,X/46,XX, and in males as 45,X/46,XY. Autosomal monosomies in humans are generally considered incompatible with life; however, quite a number of cytogenetic reports describe full monosomy of one chromosome 21 in liveborn children (Vosranova et al., Molecular Cytogen. 1:13 [2008]; Joosten et al., Prenatal Diagn. 17:271-5 [1997]). The method described herein can be used to diagnose these and other chromosomal abnormalities prenatally.
According to some embodiments, the methods disclosed herein can determine the presence or absence of a trisomy of any one of chromosomes 1-22, X and Y. Examples of trisomies that can be detected according to the present method include, without limitation, trisomy 21 (T21; Down syndrome), trisomy 18 (T18; Edwards syndrome), trisomy 16 (T16), trisomy 20 (T20), trisomy 22 (T22; cat eye syndrome), trisomy 15 (T15; Prader-Willi syndrome), trisomy 13 (T13; Patau syndrome), trisomy 8 (T8; Warkany syndrome), trisomy 9, and the XXY (Klinefelter syndrome), XYY, and XXX trisomies. Complete trisomies of other autosomes are lethal when present in a non-mosaic state, but can be compatible with life when present in a mosaic state. It will be appreciated that various complete trisomies, whether present in a mosaic or non-mosaic state, and partial trisomies can be determined in fetal cfDNA according to the teachings provided herein.
Non-limiting examples of partial trisomies that can be determined by the present method include partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The methods disclosed herein can also be used to determine chromosomal monosomy X, chromosomal monosomy 21, and partial monosomies such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in pregnancy miscarriage. Partial monosomies of chromosomes typically involved in complete aneuploidy can also be determined by the method described herein. Non-limiting examples of deletion syndromes that can be determined according to the present method include syndromes caused by partial deletions of chromosomes. Examples of partial deletions that can be determined according to the methods described herein include, without limitation, partial deletions of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22 and 10, which are described below.
1q21.1 deletion syndrome, or 1q21.1 (recurrent) microdeletion, is a rare aberration of chromosome 1. Alongside the deletion syndrome, there is also a 1q21.1 duplication syndrome. Whereas a part of the DNA is missing at a particular spot in the deletion syndrome, two or three copies of a similar part of the DNA are present at the same spot in the duplication syndrome. The literature refers to both the deletion and the duplication as 1q21.1 copy-number variations (CNV). The 1q21.1 deletion can be associated with TAR syndrome (thrombocytopenia with absent radius).
Wolf-Hirschhorn syndrome (WHS) (OMIM #194190) is a contiguous gene deletion syndrome associated with a hemizygous deletion of chromosome 4p16.3. Wolf-Hirschhorn syndrome is a congenital malformation syndrome characterized by pre- and postnatal growth deficiency, developmental disability of variable degree, characteristic craniofacial features ('Greek warrior helmet' appearance of the nose, high forehead, prominent glabella, hypertelorism, high-arched eyebrows, protruding eyes, epicanthal folds, short philtrum, distinct mouth with downturned corners, and micrognathia), and a seizure disorder.
Partial deletion of chromosome 5, also known as 5p- or 5p minus and named Cri du Chat syndrome (OMIM #123450), is caused by a deletion of the short arm (p arm) of chromosome 5 (5p15.3-p15.2). Infants with this condition often have a high-pitched cry that sounds like that of a cat. The disorder is characterized by intellectual disability and delayed development, small head size (microcephaly), low birth weight, weak muscle tone (hypotonia) in infancy, distinctive facial features, and possibly heart defects.
Williams-Beuren syndrome, also known as chromosome 7q11.23 deletion syndrome (OMIM #194050), is a contiguous gene deletion syndrome resulting in a multisystem disorder. It is caused by a hemizygous deletion of 1.5 to 1.8 Mb on chromosome 7q11.23, a region that contains approximately 28 genes.
Jacobsen syndrome, also known as 11q deletion disorder, is a rare congenital disorder resulting from deletion of a terminal region of chromosome 11 that includes band 11q24.1. It can cause intellectual disability, a distinctive facial appearance, and a variety of physical problems, including heart defects and a bleeding disorder.
Partial monosomy of chromosome 18, known as monosomy 18p, is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delays, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. Associated craniofacial defects may vary greatly in range and severity from case to case.
Conditions caused by changes in the structure or copy number of chromosome 15 include Angelman syndrome and Prader-Willi syndrome, which involve a loss of gene activity in the same part of chromosome 15, the 15q11-q13 region. It will be appreciated that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the methods, apparatus, and systems described herein can be used to identify such a partial deletion and other deletions in the fetus.
Partial monosomy 13q is a rare chromosomal disorder that results when a piece of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q may exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder. Almost all cases of partial monosomy 13q occur randomly for no apparent reason (sporadic).
Smith-Magenis syndrome (SMS; OMIM #182290) is caused by a deletion, or loss of genetic material, on one copy of chromosome 17. This well-known syndrome is associated with developmental delay, mental retardation, congenital anomalies such as heart and kidney defects, and neurobehavioral abnormalities such as severe sleep disturbances and self-injurious behavior. In the majority of cases (90%), SMS is caused by a 3.7-Mb interstitial deletion in chromosome 17p11.2.
22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome, on the long arm of one of the pair of chromosomes. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms may include birth defects such as congenital heart disease, defects in the palate, most commonly related to neuromuscular problems with closure (velopharyngeal insufficiency), learning disabilities, mild differences in facial features, and recurrent infections. Microdeletions in chromosomal region 22q11.2 are associated with a 20- to 30-fold increased risk of schizophrenia.
Deletions on the short arm of chromosome 10 are associated with a DiGeorge syndrome-like phenotype. Partial monosomy of chromosome 10p is rare, but has been observed in a portion of patients showing features of DiGeorge syndrome.
In one embodiment, the methods, apparatus, and systems described herein are used to determine partial monosomies including, but not limited to, partial monosomy of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22 and 10. For example, partial monosomy 1q21.11, partial monosomy 4p16.3, partial monosomy 5p15.3-p15.2, partial monosomy 7q11.23, partial monosomy 11q24.1, partial monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, partial monosomy 17p11.2, partial monosomy of chromosome 22 (22q11.2), and partial monosomy 10p can also be determined using the method.
Other partial monosomies that can be determined according to the methods described herein include unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion [del(22)(q11.2q11.23)]; 7q11.23 and 7q36 deletions; 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion); Yq deletion; 4p16.3 microdeletion; 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi syndrome (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); and 2q37 microdeletion. Partial deletions can be small deletions of part of a chromosome, or they can be microdeletions of a chromosome in which the deletion of a single gene can occur.
Several duplication syndromes caused by the duplication of part of a chromosome arm have been identified (see OMIM [Online Mendelian Inheritance in Man], viewed online at ncbi.nlm.nih.gov/omim). In one embodiment, the present method can be used to determine the presence or absence of duplications and/or multiplications of segments of any one of chromosomes 1-22, X and Y. Non-limiting examples of duplication syndromes that can be determined according to the present method include duplications of part of chromosomes 8, 15, 12, and 17, which are described below.
8p23.1 duplication syndrome is a rare genetic disorder caused by a duplication of a region of human chromosome 8. This duplication syndrome has an estimated prevalence of 1 in 64,000 births and is the reciprocal of the 8p23.1 deletion syndrome. The 8p23.1 duplication is associated with a variable phenotype that includes one or more of speech delay, developmental delay, mild dysmorphism with prominent forehead and arched eyebrows, and congenital heart disease (CHD).
Chromosome 15q duplication syndrome (Dup15q) is a clinically identifiable syndrome that results from duplications of chromosome 15q11-13.1. Babies with Dup15q usually have hypotonia (poor muscle tone) and growth retardation; they may be born with a cleft lip and/or palate or with malformations of the heart, kidneys, or other organs; and they show some degree of cognitive delay/disability (mental retardation), speech and language delays, and sensory processing disorders.
Pallister-Killian syndrome is the result of extra #12 chromosome material. There is usually a mixture of cells (mosaicism), some with extra #12 material and some that are normal (46 chromosomes without the extra #12 material). Babies with this syndrome have many problems, including severe mental retardation, poor muscle tone, 'coarse' facial features, and a prominent forehead. They tend to have a very thin upper lip with a thicker lower lip and a short nose. Other health problems include seizures, poor feeding, stiff joints, cataracts in adulthood, hearing loss, and heart defects. Persons with Pallister-Killian syndrome have a shortened lifespan.
Individuals with the genetic condition designated dup(17)(p11.2p11.2) or dup 17p carry extra genetic information (known as a duplication) on the short arm of chromosome 17. Duplication of chromosome 17p11.2 underlies Potocki-Lupski syndrome (PTLS), a newly recognized genetic condition with only a few dozen cases reported in the medical literature. Patients with this duplication often have low muscle tone, poor feeding, and failure to thrive during infancy, and also present with delayed development of motor and verbal milestones. Many individuals with PTLS have difficulty with articulation and language processing. In addition, patients may have behavioral characteristics similar to those seen in persons with autism or autism-spectrum disorders. Individuals with PTLS may have heart defects and sleep apnea. A duplication of a large region in chromosome 17p12 that includes the gene PMP22 is known to cause Charcot-Marie-Tooth disease.
CNV have been associated with stillbirths. However, because of the inherent limitations of conventional cytogenetics, the contribution of CNV to stillbirth is thought to be underrepresented (Harris et al., Prenatal Diagn 31:932-944 [2011]). As shown in the examples and described elsewhere herein, the present method is capable of determining the presence of partial aneuploidies, e.g., deletions and multiplications of chromosome segments, and can be used to identify and determine the presence or absence of CNV associated with stillbirths.
Determination of complete fetal chromosomal aneuploidies
In one embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different complete fetal chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X and Y. The method further uses, in step (c), the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and, in step (d), compares each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the maternal test sample.
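Steps (b) to (d) can be illustrated with a short sketch. It assumes that the sequence tags have already been counted per chromosome; the function names, the chromosome labels, and the use of a lower and an upper threshold per chromosome are illustrative assumptions, since the text only states that each dose is compared with a threshold value.

```python
def chromosome_dose(tag_counts, chrom, normalizing_chroms):
    """Dose = tags on the chromosome of interest divided by the tags summed
    over its normalizing chromosome sequence (one chromosome or a group)."""
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("no sequence tags on the normalizing chromosome(s)")
    return tag_counts[chrom] / denominator


def classify_chromosomes(tag_counts, normalizers, thresholds):
    """Compare each chromosome dose with its thresholds.

    normalizers: chromosome of interest -> list of normalizing chromosomes.
    thresholds:  chromosome of interest -> (low, high); hypothetical values
                 that would be derived from qualified (unaffected) samples.
    """
    calls = {}
    for chrom, normalizing_chroms in normalizers.items():
        dose = chromosome_dose(tag_counts, chrom, normalizing_chroms)
        low, high = thresholds[chrom]
        if dose > high:
            calls[chrom] = "gain"        # e.g. consistent with a trisomy
        elif dose < low:
            calls[chrom] = "loss"        # e.g. consistent with a monosomy
        else:
            calls[chrom] = "unaffected"
    return calls
```

The same loop covers any number of chromosomes of interest, because each chromosome carries its own normalizing sequence and its own thresholds.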
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome for that chromosome of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
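The length-normalized variant of step (c) can be sketched as below. The function names and the example lengths are hypothetical; the only point illustrated is that each tag count is divided by the length of its sequence before the ratio is taken.

```python
def tag_density(tag_count, length_bp):
    """Sequence tags per base pair of the sequence they map to."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / length_bp


def density_based_dose(tags_interest, length_interest, tags_norm, length_norm):
    """Chromosome dose as the ratio of the tag density of the chromosome of
    interest to the tag density of its normalizing chromosome sequence."""
    norm_density = tag_density(tags_norm, length_norm)
    if norm_density == 0:
        raise ValueError("no sequence tags on the normalizing sequence")
    return tag_density(tags_interest, length_interest) / norm_density
```

With densities, a chromosome of interest and a normalizing sequence of very different lengths yield a dose near 1 when coverage is uniform, which can make doses easier to compare across chromosomes.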
An example of this embodiment, whereby four or more complete fetal chromosomal aneuploidies are determined in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of said twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of said twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said twenty or more chromosomes of interest to a threshold value for each of said twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the test sample.
In another embodiment, the method described above for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this case, the method comprises: (a) obtaining sequence information for the fetal and maternal nucleic acids in said sample; and (b) using said sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further uses, in step (c), the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and, in step (d), compares each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in said sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing segment sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a chromosome dose for said chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
A means of comparing chromosome doses across different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
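The NCV defined above is a z-score of the test dose against the qualified samples, and can be computed as in the following sketch. The function name is hypothetical, and the use of the sample (n-1) standard deviation from Python's `statistics.stdev` is an assumption, because the text does not specify the estimator.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV_ij = (x_ij - mu_j) / sigma_j for one chromosome j.

    test_dose:       observed dose x_ij of chromosome j in test sample i.
    qualified_doses: doses of the same chromosome j in the qualified samples,
                     used to estimate mu_j and sigma_j (at least two values).
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma
```

Because the NCV is expressed in standard deviations of the qualified set, a single cut-off can in principle be applied to chromosomes whose raw doses differ in scale.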
In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies of sex chromosomes can comprise tetrasomies, pentasomies, and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method may be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 complete chromosomal aneuploidies. Thus, the number of different complete chromosomal aneuploidies that are determined is related to the number of chromosomes of interest that are selected for analysis.
In one embodiment, determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for one chromosome of interest, which is selected from chromosomes 1-22, X and Y. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Different complete fetal chromosomal aneuploidies that can be determined include complete chromosomal trisomies, complete chromosomal monosomies, and complete chromosomal polysomies. Examples of complete fetal chromosomal aneuploidies include, without limitation, trisomies of any one or more of the autosomes, e.g., trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, and trisomy 22; trisomies of the sex chromosomes, e.g., 47,XXY, 47,XXX, and 47,XYY; tetrasomies of the sex chromosomes, e.g., 48,XXYY, 48,XXXY, 48,XXXX, and 48,XYYY; pentasomies of the sex chromosomes, e.g., 49,XXXYY, 49,XXXXY, 49,XXXXX, and 49,XYYYY; and monosomy X. Other complete fetal chromosomal aneuploidies that can be determined according to the present method are described below.
Determination of partial fetal chromosomal aneuploidies
In another embodiment, a method is provided for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in said sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further uses, in step (c), the number of sequence tags identified for each of said any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of said any one or more segments of any one or more chromosomes of interest; and, in step (d), compares each single segment dose for each of said any one or more segments of any one or more chromosomes of interest to a threshold value for each of said any one or more chromosome segments of any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of the any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for the normalizing segment sequence for that segment.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the segments of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
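A segment dose with a group of normalizing segments can be sketched as follows. The representation of tags as (chromosome, position) pairs, the half-open segment coordinates, the pooling of the normalizing segments into one density, and the function names are all assumptions made for illustration.

```python
def count_tags(tag_positions, segment):
    """Count tags whose mapped position falls inside a segment.

    tag_positions: iterable of (chromosome, position) pairs.
    segment:       (chromosome, start, end), half-open [start, end).
    """
    chrom, start, end = segment
    return sum(1 for c, pos in tag_positions if c == chrom and start <= pos < end)


def segment_dose(tag_positions, segment, normalizing_segments):
    """Ratio of the tag density of the segment of interest to the pooled tag
    density of the normalizing segment sequence (one or more segments)."""
    tags = list(tag_positions)
    seg_density = count_tags(tags, segment) / (segment[2] - segment[1])
    norm_tags = sum(count_tags(tags, s) for s in normalizing_segments)
    norm_length = sum(s[2] - s[1] for s in normalizing_segments)
    if norm_tags == 0:
        raise ValueError("no sequence tags on the normalizing segments")
    return seg_density / (norm_tags / norm_length)
```

A dose noticeably below the value seen in qualified samples would be consistent with a deletion of the segment, and a higher value with a duplication, subject to the thresholds of step (d).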
A means of comparing segment doses across different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five or more partial fetal chromosomal aneuploidies is determined in a sample. In one embodiment, any one segment of interest is selected from chromosomes 1-22, X and Y. In another embodiment, two or more segments of interest are selected from chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X or Y. In one embodiment, the one or more segments of interest comprise at least one, five, ten, 15, 20, 25 or more segments selected from chromosomes 1-22, X and Y, and the presence or absence of at least one, five, ten, 15, 20 or 25 different partial fetal chromosomal aneuploidies is determined. The different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of autosomes. Partial monosomies of autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18 and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the method are described below.
In any one of the embodiments described above, the test sample is a maternal sample selected from blood, plasma, serum, urine and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS) as described elsewhere in this application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
Determination of CNVs for clinical disorders
In addition to the early determination of birth defects, the methods described herein can be applied to the determination of any abnormality in the representation of genetic sequences within the genome. A number of abnormalities in the representation of genetic sequences within the genome are associated with various pathologies. Such pathologies include, but are not limited to, cancer, infectious and autoimmune diseases, diseases of the nervous system, and metabolic and/or cardiovascular diseases.
Accordingly, in various embodiments, use of the methods described herein in the diagnosis and/or monitoring and/or treatment of such pathologies is contemplated. For example, the methods can be applied to determining the presence or absence of a disease, to monitoring the progression of a disease and/or the efficacy of a treatment regimen, to determining the presence or absence of nucleic acids of a pathogen (e.g., a virus), to determining chromosomal abnormalities associated with graft versus host disease (GVHD), and to determining the contribution of individuals in forensic analyses.
CNVs in cancer
It has been shown that blood plasma and serum from cancer patients contain measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and that tumors are characterized by aneuploidy, or by inappropriate numbers of gene sequences or even of entire chromosomes. Determining a difference in the amount of a given sequence (i.e., a sequence of interest) in a sample from an individual can therefore be used in the prognosis and diagnosis of a medical condition. In some embodiments, the present method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to be suffering from cancer.
In certain embodiments, the aneuploidy is characteristic of the genome of the subject and results in a generally increased predisposition to cancer. In certain embodiments, the aneuploidy is characteristic of particular cells (e.g., tumor cells, proto-tumor neoplastic cells, etc.) that are neoplastic or have an increased predisposition to neoplasia. Particular aneuploidies are associated with particular cancers or with predispositions to particular cancers, as described below.
Accordingly, various embodiments of the methods described herein provide a determination of copy number variation of a sequence of interest (e.g., a clinically relevant sequence) in a test sample from a subject, where certain variations in copy number provide an indicator of the presence of, and/or a predisposition to, cancer. In certain embodiments, the sample comprises a mixture of nucleic acids derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal cells and cancerous cells, the cancerous cells being derived from a subject suffering from a medical condition (e.g., cancer).
The development of cancer is often accompanied by an alteration in the number of whole chromosomes, i.e., complete chromosomal aneuploidy, and/or an alteration in the number of chromosome segments, i.e., partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50:7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). Such genetic aberrations, as they accumulate, may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis and metastasis. The genetic aberrations may affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA has been found in the circulation of patients diagnosed with malignancies including, but not limited to, lung cancer (Pathak et al., Clin Chem 52:1833-1842 [2006]), prostate cancer (Schwartzenbach et al., Clin Cancer Res 15:1032-8 [2009]) and breast cancer (Schwartzenbach et al., available online at breast-cancer-research.com/content/11/5/R71 [2009]). Identification of genomic instabilities associated with cancers, which can be determined in the circulating cfDNA of cancer patients, is a potential diagnostic and prognostic tool. In one embodiment, the methods described herein are used to determine the CNV of one or more sequences of interest in a sample, e.g., a sample comprising a mixture of nucleic acids derived from a subject suspected or known to have cancer, e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors and blastoma. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal cells and cancerous cells. In another embodiment, the biological sample needed to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues including, but not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The methods described herein are not limited to the analysis of cfDNA. It will be recognized that similar analyses can be performed on cellular DNA samples.
In various embodiments, the sequence of interest comprises a nucleic acid sequence that is known or suspected to play a role in the development and/or progression of cancer. Examples of sequences of interest include nucleic acid sequences, e.g., complete chromosomes and/or chromosome segments, that are amplified or deleted in cancerous cells, as described below.
Total CNV number and risk of cancer.
Common cancer SNPs, and by analogy common cancer CNVs, may each confer only a minor increase in disease risk. Collectively, however, they may cause a substantially elevated risk of cancer. In this regard, it is noted that germline gains and losses of large DNA segments have been reported as factors predisposing individuals to neuroblastoma, prostate and colorectal cancer, breast cancer, and BRCA1-associated ovarian cancer (see, e.g., Krepischi et al., Breast Cancer Res., 14:R24 [2012]; Diskin et al., Nature 2009, 459:987-991; Liu et al., Cancer Res 2009, 69:2176-2179; Lucito et al., Cancer Biol Ther 2007, 6:1592-1599; Thean et al., Genes Chromosomes Cancer 2010, 49:99-106; Venkatachalam et al., Int J Cancer 2011, 129:1635-1642; and Yoshihara et al., Genes Chromosomes Cancer 2011, 50:167-177). It is also noted that CNVs frequently found in the healthy population (common CNVs) are believed to have a role in cancer etiology (see, e.g., Shlien and Malkin (2009) Genome Medicine, 1(6):62). In one study testing the hypothesis that common CNVs are associated with malignancy (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), a map was created of every known CNV whose locus coincides with that of a bona fide cancer-related gene (as catalogued by Higgins et al., Nucleic Acids Res 2007, 35:D721-726). These CNVs were termed "cancer CNVs". In an initial analysis (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), 770 healthy genomes were evaluated using the Affymetrix 500K array set, which has an average inter-probe distance of 5.8 kb. Because CNVs are generally thought to be depleted in gene regions (Redon et al. (2006) Nature 2006, 444:444-454), it was surprising to find 49 cancer genes that were directly encompassed or overlapped by a CNV in more than one person in a large reference population. In the top ten genes, cancer CNVs can be found in four or more people.
It is therefore believed that CNV frequency can be used as a measure of risk for cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1). The CNV frequency can be determined simply from the constitutional genome of the organism, or it can represent a fraction derived from one or more tumors (neoplastic cells), if such are present.
In certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutional (germline) nucleic acid) or in a mixture of nucleic acids (e.g., germline nucleic acid and nucleic acid derived from neoplastic cells) is determined using the methods described herein for copy number variations. Identification of an increased number of CNVs in the test sample, e.g., in comparison to a reference value, is indicative of a risk of, or a predisposition to, cancer in the subject. It will be appreciated that the reference value may vary with a given population. It will also be appreciated that the absolute value of the increase in CNV frequency will vary depending on the resolution of the method used to determine CNV frequency and on other parameters. Typically, an increase in CNV frequency to at least about 1.2 times the reference value is determined to be indicative of a risk for cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1); for example, an increase in CNV frequency to at least or about 1.5 times the reference value or greater (such as 2 to 4 times the reference value) is an indicator of an increased risk of cancer (e.g., as compared with a normal healthy reference population).
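The fold-increase comparison described above can be illustrated with a short sketch. The function names and the CNV counts are hypothetical; only the 1.2-fold figure is taken from the passage, and an appropriate reference value would depend on the population and on the resolution of the CNV detection method.

```python
def cnv_fold_increase(test_cnv_count, reference_cnv_count):
    """CNV frequency of the test sample as a multiple of the reference value."""
    if reference_cnv_count <= 0:
        raise ValueError("reference CNV count must be positive")
    return test_cnv_count / reference_cnv_count


def indicates_cancer_risk(test_cnv_count, reference_cnv_count, min_fold=1.2):
    """True when the test frequency is at least min_fold times the reference."""
    return cnv_fold_increase(test_cnv_count, reference_cnv_count) >= min_fold


# Hypothetical counts: 18 CNVs in the test sample against a reference of 12.
fold = cnv_fold_increase(18, 12)
at_risk = indicates_cancer_risk(18, 12)
```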
It is also believed that a determination of structural variation in the genome of a mammal, in comparison to a reference value, is indicative of a risk of cancer. In this context, in one embodiment, the term "structural variation" can be defined as the CNV frequency in a mammal multiplied by the average CNV size (in bp) in that mammal. High structural variation scores will therefore result from an increased CNV frequency and/or from the occurrence of large genomic nucleic acid deletions or duplications. Accordingly, in certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutional (germline) nucleic acid) is determined using the methods described herein, so as to determine the size and number of copy number variations. In certain embodiments, a total structural variation score within genomic DNA of greater than about 1 megabase, or greater than about 1.1 megabases, or greater than about 1.2 megabases, or greater than about 1.3 megabases, or greater than about 1.4 megabases, or greater than about 1.5 megabases, or greater than about 1.8 megabases, or greater than about 2 megabases of DNA is indicative of a risk of cancer.
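A minimal sketch of the structural variation score follows, assuming that "CNV frequency" means the number of CNVs detected in one genome, so that the score (frequency multiplied by average size) equals the total number of bases covered by CNVs. The function name, the CNV sizes and the choice of the 1 megabase cutoff from the range given above are illustrative assumptions.

```python
def structural_variation_score(cnv_sizes_bp):
    """CNV frequency multiplied by the average CNV size, in base pairs."""
    if not cnv_sizes_bp:
        return 0.0
    frequency = len(cnv_sizes_bp)                 # number of CNVs detected
    average_size = sum(cnv_sizes_bp) / frequency  # mean CNV size in bp
    return frequency * average_size


# Hypothetical CNV sizes (bp) called in one test sample.
sizes = [250_000, 400_000, 600_000]
score = structural_variation_score(sizes)
exceeds_one_megabase = score > 1_000_000
```

Under this reading, a few large deletions or duplications raise the score as much as many small CNVs do, which matches the statement that a high score can come from either source.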
These methods are believed to provide a measure of the risk of any cancer including, but not limited to, acute and chronic leukemias, lymphomas, numerous solid tumors of mesenchymal or epithelial tissue, brain cancer, breast cancer, liver cancer, stomach cancer, colon cancer, B cell lymphoma, lung cancer, bronchial cancer, colorectal cancer, prostate cancer, pancreatic cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, melanoma, uterine or endometrial cancer, cancer of the oral cavity or pharynx, kidney cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, liposarcoma, testicular cancer, malignant fibrous histiocytoma, and other cancers.
Complete chromosomal aneuploidies.
As indicated above, there is a high frequency of aneuploidy in cancer. In certain studies examining the prevalence of somatic copy number alterations (SCNAs) in cancer, it has been found that one-quarter of the genome of a typical cancer cell is affected either by whole-arm SCNAs or by the whole-chromosome SCNAs of aneuploidy (see, e.g., Beroukhim et al., Nature 463:899-905 [2010]). Whole-chromosome alterations are recurrently observed in several cancer types. For example, gain of chromosome 8 is seen in 10% to 20% of cases of acute myeloid leukemia (AML), as well as in some solid tumors, including Ewing's sarcoma and desmoid tumors (see, e.g., Barnard et al., Leukemia 10:5-12 [1996]; Maurici et al., Cancer Genet. Cytogenet. 100:106-110 [1998]; Qi et al., Cancer Genet. Cytogenet. 92:147-149 [1996]; Barnard, D.R. et al., Blood 100:427-434 [2002]; and the like). An illustrative, but non-limiting, list of chromosome gains and losses in human cancers is shown in Table 1.
Table 1: Illustrative specific, recurrent chromosome gains and losses in human cancer (see, e.g., Gordon et al. (2012) Nature Rev. Genetics, 13:189-203).
In various embodiments, the methods described herein can be used to detect and/or quantify whole-chromosome aneuploidies that are associated with cancer generally and/or with particular cancers. Thus, for example, in certain embodiments, the detection and/or quantification of whole-chromosome aneuploidies characterized by the gains or losses shown in Table 1 is contemplated.
Arm-level chromosomal segment copy number variations.
Multiple studies have reported patterns of arm-level copy number variation across large numbers of cancer specimens (Lin et al., Cancer Res 68, 664-673 (2008); George et al., PLoS ONE 2, e255 (2007); Demichelis et al., Genes Chromosomes Cancer 48:366-380 (2009); Beroukhim et al., Nature 463(7283):899-905 [2010]). It has additionally been observed that the frequency of arm-level copy number variations decreases with the length of the chromosome arm. After adjustment for this trend, the majority of chromosome arms show strong evidence of preferential gain or preferential loss, but rarely both, across multiple cancer lineages (see, e.g., Beroukhim et al., Nature 463(7283):899-905 [2010]).
Accordingly, in one embodiment, the methods described herein are used to determine arm-level CNVs (CNVs comprising one chromosome arm or substantially one chromosome arm) in a sample. The CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the arm-level CNVs can be identified in those constitutional nucleic acids. In certain embodiments, arm-level CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected or known to have cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, blastoma and the like). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal cells and cancerous cells. In another embodiment, the biological sample used to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues including, but not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs or smears. In other embodiments, the biological sample is a stool (fecal) sample.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or of an increased risk for a cancer include, but are not limited to, the arm-level CNVs listed in Table 2. As illustrated in Table 2, certain CNVs that comprise a substantial arm-level gain are indicative of the presence of a cancer or of an increased risk for certain cancers. Thus, for example, a gain in 1q is indicative of the presence of, or an increased risk for, acute lymphoblastic leukemia (ALL), breast cancer, GIST, HCC, lung NSC, medulloblastoma, melanoma, MPD, ovarian cancer and/or prostate cancer. A gain in 3q is indicative of the presence of, or an increased risk for, esophageal squamous cancer, lung SC and/or MPD. A gain in 7q is indicative of the presence of, or an increased risk for, colorectal cancer, glioma, HCC, lung NSC, medulloblastoma, melanoma, prostate cancer and/or renal cancer. A gain in 7p is indicative of the presence of, or an increased risk for, breast cancer, colorectal cancer, esophageal adenocarcinoma, glioma, HCC, lung NSC, medulloblastoma, melanoma and/or renal cancer. A gain in 20q is indicative of the presence of, or an increased risk for, breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cancer, glioma, HCC, lung NSC, melanoma, ovarian cancer and/or renal cancer, and so forth.
Similarly, as illustrated in Table 2, certain CNVs that comprise a substantial arm-level loss are indicative of the presence of, and/or an increased risk for, certain cancers. Thus, for example, a loss in 1p is indicative of the presence of, or an increased risk for, gastrointestinal stromal tumor. A loss in 4q is indicative of the presence of, or an increased risk for, colorectal cancer, esophageal adenocarcinoma, lung SC, melanoma, ovarian cancer and/or renal cancer. A loss in 17p is indicative of the presence of, or an increased risk for, breast cancer, colorectal cancer, esophageal adenocarcinoma, HCC, lung NSC, lung SC and/or ovarian cancer, and the like.
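The arm-level examples given in the two preceding paragraphs can be held in a simple lookup, sketched below. The dictionary contains only the associations named in the text and is not a substitute for Table 2; the variable and function names are hypothetical.

```python
# Arm-level associations transcribed from the examples in the text only.
ARM_LEVEL_EXAMPLES = {
    ("1q", "gain"): ["ALL", "breast", "GIST", "HCC", "lung NSC",
                     "medulloblastoma", "melanoma", "MPD", "ovarian", "prostate"],
    ("3q", "gain"): ["esophageal squamous", "lung SC", "MPD"],
    ("7q", "gain"): ["colorectal", "glioma", "HCC", "lung NSC",
                     "medulloblastoma", "melanoma", "prostate", "renal"],
    ("7p", "gain"): ["breast", "colorectal", "esophageal adenocarcinoma", "glioma",
                     "HCC", "lung NSC", "medulloblastoma", "melanoma", "renal"],
    ("20q", "gain"): ["breast", "colorectal", "dedifferentiated liposarcoma",
                      "esophageal adenocarcinoma", "esophageal squamous", "glioma",
                      "HCC", "lung NSC", "melanoma", "ovarian", "renal"],
    ("1p", "loss"): ["GIST"],
    ("4q", "loss"): ["colorectal", "esophageal adenocarcinoma", "lung SC",
                     "melanoma", "ovarian", "renal"],
    ("17p", "loss"): ["breast", "colorectal", "esophageal adenocarcinoma",
                      "HCC", "lung NSC", "lung SC", "ovarian"],
}


def cancers_for(arm, event):
    """Return the cancers listed in the text for an arm-level gain or loss."""
    return ARM_LEVEL_EXAMPLES.get((arm, event.lower()), [])


# An arm-level loss of 17p detected in a sample maps to seven listed cancers.
hits = cancers_for("17p", "loss")
```

An empty result means only that the text gives no example for that arm and event, not that no association exists.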
Table 2: Significant arm-level chromosomal segment copy number variations in each of 16 cancer subtypes (breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cancer, GIST (gastrointestinal stromal tumor), glioma, HCC (hepatocellular carcinoma), lung NSC, lung SC, medulloblastoma, melanoma, MPD (myeloproliferative disorder), ovarian cancer, prostate cancer, acute lymphoblastic leukemia (ALL) and renal cancer) (see, e.g., Beroukhim et al., Nature (2010) 463(7283):899-905).
The examples of associations between arm-level copy number variations and cancers are intended to be illustrative and not limiting. Other arm-level copy number variations and their cancer associations are known to those of skill in the art.
Smaller (e.g., focal) copy number variations.
As indicated above, in certain embodiments, the methods described herein can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more entire chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more segments of a chromosome. In yet other embodiments, the chromosomal amplification is the gain of two or more segments of two or more chromosomes. In various embodiments, the chromosomal amplification can involve the gain of one or more oncogenes.
Dominantly acting genes associated with human solid tumors typically exert their effect through overexpression or altered expression. Gene amplification is a common mechanism leading to upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in over 50% of human breast cancers. Most notably, amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17q21-q22), results in overexpression of HER2 receptors on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al., Clinical Breast Cancer 8:392-401 [2008]). A variety of oncogenes have been found to be amplified in other human malignancies. Examples of the amplification of cellular oncogenes in human tumors include amplifications of: c-myc in the promyelocytic leukemia cell line HL60 and in small-cell lung carcinoma cell lines; N-myc in primary neuroblastomas (stages III and IV), neuroblastoma cell lines, a retinoblastoma cell line and primary tumors, and small-cell lung carcinoma lines and tumors; L-myc in small-cell lung carcinoma cell lines and tumors; c-myb in acute myeloid leukemia and in colon carcinoma cell lines; c-erbb in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary carcinomas of the lung, colon, bladder and rectum; and N-ras in a mammary carcinoma cell line (Varmus H., Ann Rev Genetics 18:553-612 (1984), cited in Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing Co. 1987)).
Duplication of oncogenes is a common cause of many types of cancer, as is the case with P70-S6 Kinase 1 amplification and breast cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Other examples of oncogenes that are amplified in human cancers include MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1 and FGFR2 in breast cancer; MYC and ERBB2 in cervical cancer; HRAS, KRAS and MYB in colorectal cancer; MYC, CCND1 and MDM2 in esophageal cancer; CCNE, KRAS and MET in gastric cancer; ERBB1 and CDK4 in glioblastoma; CCND1, ERBB1 and MYC in head and neck cancer; CCND1 in hepatocellular cancer; MYCB in neuroblastoma; MYC, ERBB2 and AKT2 in ovarian cancer; MDM2 and CDK4 in sarcoma; and MYC in small cell lung cancer. In one embodiment, the present method can be used to determine the presence or absence of amplification of an oncogene associated with a cancer. In certain embodiments, the amplified oncogene is associated with breast cancer, cervical cancer, colorectal cancer, esophageal cancer, gastric cancer, glioblastoma, head and neck cancer, hepatocellular cancer, neuroblastoma, ovarian cancer, sarcoma or small cell lung cancer.
In one embodiment, the present method can be used to determine the presence or absence of a chromosomal deletion. In some embodiments, the chromosomal deletion is the loss of one or more entire chromosomes. In other embodiments, the chromosomal deletion is the loss of one or more segments of a chromosome. In yet other embodiments, the chromosomal deletion is the loss of two or more segments of two or more chromosomes. The chromosomal deletion can involve the loss of one or more tumor suppressor genes.
Chromosomal deletions involving tumor suppressor genes are believed to play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located at chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell cycle regulation (Howe et al., Proc Natl Acad Sci (USA) 87:5883-5887 [1990]). Altered or lost expression of the Rb protein is caused by inactivation of both gene alleles, either through a point mutation or through a chromosomal deletion. Rb-1 gene alterations have been found not only in retinoblastomas but also in other malignancies, such as osteosarcomas, small cell lung cancer (Rygaard et al., Cancer Res 50:5312-5317 [1990]) and breast cancer. Restriction fragment length polymorphism (RFLP) studies have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost through a gross chromosomal deletion (Bowcock et al., Am J Hum Genet, 46:12 [1990]). Chromosome 1 abnormalities, including duplications, deletions and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, might harbor oncogenes or tumor suppressor genes that are pathogenetically relevant to both the chronic and the advanced phases of myeloproliferative neoplasms (Caramazza et al., Eur J Hematol 84:191-200 [2010]). Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletion of chromosome 5 is the most common karyotypic abnormality in myelodysplastic syndromes (MDS). Patients with isolated del(5q)/5q- MDS have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPNs) and acute myeloid leukemia. The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor suppressor genes that have fundamental roles in the growth control of hematopoietic stem/progenitor cells (HSCs/HPCs). Cytogenetic mapping of the commonly deleted regions (CDRs), centered on 5q31 and 5q32, identified candidate tumor suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20 and the cytoskeletal remodeling protein alpha-catenin (Eisenmann et al., Oncogene 28:3429-3441 [2009]). Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have shown that allelic loss from several distinct regions on chromosome 3p, including 3p25, 3p21-22, 3p21.3, 3p12-13 and 3p14, is the earliest and most frequent genomic abnormality involved in a wide spectrum of major epithelial cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and interstitial deletions or promoter hypermethylation are thought to precede the loss of 3p or of the entire chromosome 3 in the development of carcinomas (Angeloni D., Briefings Functional Genomics 6:19-39 [2007]).
Newborns and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, which harbors about 300 genes, may be involved in numerous structural aberrations, e.g., translocations, deletions and amplifications, in leukemias, lymphomas and solid tumors. Moreover, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Somatic numerical as well as structural aberrations of chromosome 21 are associated with leukemias, and specific genes located in 21q, including RUNX1, TMPRSS2 and TFF, play a role in tumorigenesis (Fonatsch C., Gene Chromosomes Cancer 49:497-508 [2010]).
In view of the foregoing, in various embodiments the methods described herein can be used to determine segment CNVs that are known to comprise one or more oncogenes or tumor suppressor genes, and/or that are known to be associated with a cancer or with an increased risk of cancer. In certain embodiments, the CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the segment can be identified in those constitutional nucleic acids. In certain embodiments, segment CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected or known to have cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, blastoma, etc.). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal cells and cancerous cells. In another embodiment, the biological sample used to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues including, but not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The CNVs used to determine the presence of a cancer and/or an increased risk of cancer can comprise amplifications or deletions.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or an increased risk of cancer comprise one or more of the amplifications shown in Table 3.
Table 3: Illustrative, but non-limiting, chromosomal segments characterized by amplifications that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
In certain embodiments, in combination with the amplifications described above (herein) or separately, the CNVs identified as indicative of the presence of a cancer or an increased risk of cancer comprise one or more of the deletions shown in Table 4.
Table 4: Illustrative, but non-limiting, chromosomal segments characterized by deletions that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
The aneuploidies identified as characteristic of various cancers (e.g., the aneuploidies identified in Tables 3 and 4) may contain genes known to be implicated in cancer etiology (e.g., tumor suppressors, oncogenes, etc.). These aneuploidies can also be probed to identify relevant genes that were previously unknown.
For example, Beroukhim et al., supra, assessed potential cancer-causing genes in the copy number alterations using GRAIL (Gene Relationships Among Implicated Loci), an algorithm that searches for functional relationships among genomic regions. GRAIL scores each gene in a collection of genomic regions for its 'relatedness' to genes in other regions, based on textual similarity between the published abstracts of all papers citing the genes, on the notion that some target genes will function in common pathways. These methods permit the identification/characterization of genes not previously associated with the particular cancers at issue. Table 5 shows target genes known to be within the identified amplified segments, together with predicted genes, and Table 6 shows target genes known to be within the identified deleted segments, together with predicted genes.
Table 5: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by amplification in various cancers (see, e.g., Beroukhim et al., supra).
Table 6: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by deletion in various cancers (see, e.g., Beroukhim et al., supra).
In various embodiments, it is contemplated to use the methods identified herein to identify CNVs of segments comprising the amplified regions or genes identified in Table 5, and/or to use the methods identified herein to identify CNVs of segments comprising the deleted regions or genes identified in Table 6.
In one embodiment, the methods described herein provide a means to assess the association between gene amplification and the extent of tumor evolution. The correlation between amplification and/or deletion and the stage or grade of a cancer may be prognostically important, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease, with more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events may be useful in associating those events as predictors of subsequent disease progression.
Gene amplifications and deletions identified by the present method can be associated with other known parameters, such as tumor grade, medical history, Brd/Urd labeling index, hormonal status, nodal involvement, tumor size, survival duration, and other tumor properties available from epidemiological and biostatistical studies. For example, tumor DNA to be tested by the present method could include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer, and metastatic lymph nodes, in order to permit the identification of associations between amplifications and deletions and stage. The associations made may make effective therapeutic intervention possible. For example, consistently amplified regions may contain an overexpressed gene, the product of which may be amenable to therapeutic attack (for example, the growth factor receptor tyrosine kinase p185HER2).
In various embodiments, the methods described herein can be used to identify amplification and/or deletion events that are associated with drug resistance, by determining the copy number variation of nucleic acid sequences from primary cancers relative to those of cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows rapid development of drug resistance, more amplification and/or deletion would be expected in primary tumors from chemoresistant patients than in tumors from chemosensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, the regions surrounding those genes would be expected to be amplified consistently in tumor cells from chemoresistant patients but not in the primary tumors. The discovery of associations between gene amplification and/or deletion and the development of drug resistance may allow the identification of patients who will or will not benefit from adjuvant therapy.
In a manner similar to that described for determining the presence or absence of complete and/or partial fetal chromosomal aneuploidies in a maternal sample, the methods, apparatus, and systems described herein can be used to determine the presence or absence of complete and/or partial chromosomal aneuploidies in any patient sample comprising nucleic acids, e.g., DNA or cfDNA (including patient samples that are not maternal samples). The patient sample can be any biological sample type described elsewhere herein. Preferably, the sample is obtained by a non-invasive procedure. For example, the sample can be a blood sample, or the serum or plasma fraction thereof. Alternatively, the sample can be a urine sample or a fecal sample. In yet other embodiments, the sample is a tissue biopsy sample. In all cases, the sample comprises nucleic acids, e.g., cfDNA or genomic DNA, which are purified and sequenced using any of the NGS sequencing methods described above.
Both complete and partial chromosomal aneuploidies associated with the formation and progression of cancer can be determined according to the present method.
In various embodiments, when the methods described herein are used to determine the presence and/or increased risk of cancer, the data can be normalized with respect to the chromosome(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the chromosome arm(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the particular segment(s) for which the CNV is determined.
In addition to their role in cancer, CNVs have been associated with a growing number of common complex diseases, including human immunodeficiency virus (HIV) infection, autoimmune diseases, and a spectrum of neuropsychiatric disorders.
CNVs in infectious and autoimmune disease
To date, a number of studies have reported associations between CNVs in genes involved in inflammation and the immune response and HIV, asthma, Crohn's disease, and other autoimmune disorders (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, CNV in CCL3L1 has been implicated in HIV/AIDS susceptibility (CCL3L1, 17q11.2 deletion), rheumatoid arthritis (CCL3L1, 17q11.2 deletion), and Kawasaki disease (CCL3L1, 17q11.2 duplication); CNV in HBD-2 has been reported to predispose to colonic Crohn's disease (HBD-2, 8p23.1 deletion) and psoriasis (HBD-2, 8p23.1 deletion); and CNV in FCGR3B has been shown to predispose to glomerulonephritis in systemic lupus erythematosus (FCGR3B, 1q23 deletion, 1q23 duplication) and to anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (FCGR3B, 1q23 deletion), and to increase the risk of developing rheumatoid arthritis. At least two inflammatory or autoimmune diseases have been shown to be associated with CNVs at different gene loci. For example, Crohn's disease is associated not only with low copy number of HBD-2, but also with a common deletion polymorphism upstream of the IRGM gene, which encodes a member of the p47 immunity-related GTPase family. In addition to the association with FCGR3B copy number, susceptibility to systemic lupus erythematosus (SLE) has also been reported to be significantly increased among subjects with a lower copy number of complement component C4.
Associations between genomic deletions at the GSTM1 (GSTM1, 1q23 deletion) and GSTT1 (GSTT1, 22q11.2 deletion) loci and an increased risk of allergic asthma have been reported in a number of independent studies. In some embodiments, the methods described herein can be used to determine the presence or absence of a CNV associated with inflammation and/or autoimmune disease. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from HIV, asthma, or Crohn's disease. Examples of CNVs associated with such diseases include, without limitation, deletions at 17q11.2, 8p23.1, 1q23, and 22q11.2, and duplications at 17q11.2 and 1q23. In some embodiments, the present method can be used to determine the presence of CNVs in genes including but not limited to CCL3L1, HBD-2, FCGR3B, GSTM1, GSTT1, C4, and IRGM.
CNV diseases of the nervous system
Associations between de novo and inherited CNVs and several common neurological and psychiatric diseases have been reported in autism, schizophrenia, and epilepsy, and in some cases of neurodegenerative diseases such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), and autosomal dominant Alzheimer's disease (Fanciulli et al., Clin Genet 77:201-213 [2010]). Cytogenetic abnormalities with duplications at 15q11-q13 have been observed in patients with autism and autism spectrum disorders (ASDs). According to the Autism Genome Project Consortium, 154 CNVs, including several recurrent CNVs, are located either on chromosome 15q11-q13 or at new genomic locations, including chromosomes 2p16 and 1q21, and 17p12 in a region associated with Smith-Magenis syndrome that overlaps with ASD. Recurrent microdeletions or microduplications on chromosome 16p11.2 have highlighted the observation that de novo CNVs are detected at loci of genes known to regulate synaptic differentiation and glutamatergic neurotransmitter release, such as SHANK3 (22q13.3 deletion), neurexin 1 (NRXN1, 2p16.3 deletion), and the neuroligins (NLGN4, Xp22.33 deletion). Schizophrenia has also been associated with multiple de novo CNVs. Microdeletions and microduplications associated with schizophrenia contain an overrepresentation of genes belonging to neurodevelopmental and glutamatergic pathways, suggesting that multiple CNVs affecting these genes may contribute directly to the pathogenesis of schizophrenia, e.g., ERBB4, 2q34 deletion; SLC1A3, 5p13.3 deletion; RAPGEF4, 2q31.1 deletion; CIT, 12.24 deletion; and multiple genes with de novo CNVs. CNVs have also been associated with other neurological disorders, including epilepsy (CHRNA7, 15q13.3 deletion), Parkinson's disease (SNCA, 4q22 duplication), and ALS (SMN1, 5q12.2-q13.3 deletion; and SMN2 deletion). In some embodiments, the methods described herein can be used to determine the presence or absence of a CNV associated with diseases of the nervous system. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from autism, schizophrenia, epilepsy, a neurodegenerative disease such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), or autosomal dominant Alzheimer's disease. The methods can be used to determine CNVs of genes associated with diseases of the nervous system, including without limitation any of autism spectrum disorders (ASD), schizophrenia, and epilepsy, and CNVs of genes associated with neurodegenerative disorders such as Parkinson's disease. Examples of CNVs associated with such diseases include, without limitation, duplications at 15q11-q13, 2p16, 1q21, 17p12, 16p11.2, and 4q22, and deletions at 22q13.3, 2p16.3, Xp22.33, 2q34, 5p13.3, 2q31.1, 12.24, 15q13.3, and 5q12.2. In some embodiments, the methods can be used to determine the presence of CNVs in genes including but not limited to SHANK3, NLGN4, NRXN1, ERBB4, SLC1A3, RAPGEF4, CIT, CHRNA7, SNCA, SMN1, and SMN2.
CNV and metabolic or cardiovascular diseases
Associations between metabolic and cardiovascular traits, such as familial hypercholesterolemia (FH), atherosclerosis, and coronary artery disease, and CNVs have been reported in a number of studies (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, germline rearrangements, mainly deletions, have been observed at the LDLR gene (LDLR, 19p13.2 deletion/duplication) in some FH patients who carry no other LDLR mutations. Another example is the LPA gene, which encodes apolipoprotein(a) (apo(a)), the plasma concentration of which is associated with the risk of coronary artery disease, myocardial infarction (MI), and stroke. Plasma concentrations of the apo(a)-containing lipoprotein Lp(a) vary more than 1000-fold between individuals, and 90% of this variability is genetically determined at the LPA locus, with plasma concentration and Lp(a) isoform size being proportional to a highly variable number of 'kringle 4' repeat sequences (range 5 to 50). These data indicate that CNVs in at least two genes can be associated with cardiovascular risk. The methods described herein can be used in large studies to search specifically for associations between CNVs and cardiovascular disorders. In some embodiments, the present method can be used to determine the presence or absence of a CNV associated with metabolic or cardiovascular disease. For example, the present method can be used to determine the presence of a CNV in a patient suspected of suffering from familial hypercholesterolemia. The methods described herein can be used to determine CNVs of genes associated with metabolic or cardiovascular disease, e.g., hypercholesterolemia. Examples of CNVs associated with such diseases include, without limitation, the 19p13.2 deletion/duplication of the LDLR gene and multiplications in the LPA gene.
Determination of complete chromosomal aneuploidies in patient samples
In one embodiment, methods are provided for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. In some embodiments, the method determines the presence or absence of any one or more different complete chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in the patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. The method further comprises: (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete chromosomal aneuploidies in the patient test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome for that chromosome of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
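By way of a non-limiting illustration, the two dose calculations of step (c) can be sketched in Python as follows. The function names, the dictionary layout, and the example counts and lengths are hypothetical and are not taken from this specification; the sketch also assumes that the tag count and length of a group of normalizing chromosomes are the sums over its members.

```python
from typing import Mapping, Sequence


def chromosome_dose(tag_counts: Mapping[str, int],
                    chrom: str,
                    normalizing: Sequence[str]) -> float:
    """Dose as the ratio of tag counts: chromosome of interest / normalizing sequence."""
    # Assumption: a group of normalizing chromosomes contributes the sum of its tag counts.
    norm_tags = sum(tag_counts[c] for c in normalizing)
    if norm_tags == 0:
        raise ValueError("normalizing chromosome sequence has no sequence tags")
    return tag_counts[chrom] / norm_tags


def chromosome_dose_by_density(tag_counts: Mapping[str, int],
                               lengths: Mapping[str, int],
                               chrom: str,
                               normalizing: Sequence[str]) -> float:
    """Dose as the ratio of tag densities (tags per unit length)."""
    norm_tags = sum(tag_counts[c] for c in normalizing)
    norm_length = sum(lengths[c] for c in normalizing)
    if norm_tags == 0:
        raise ValueError("normalizing chromosome sequence has no sequence tags")
    density_of_interest = tag_counts[chrom] / lengths[chrom]
    density_normalizing = norm_tags / norm_length
    return density_of_interest / density_normalizing


if __name__ == "__main__":
    # Hypothetical tag counts and lengths (arbitrary units), for illustration only.
    counts = {"chr21": 1000, "chr9": 4000, "chr11": 4000}
    lengths = {"chr21": 50, "chr9": 100, "chr11": 100}
    print(chromosome_dose(counts, "chr21", ["chr9", "chr11"]))
    print(chromosome_dose_by_density(counts, lengths, "chr21", ["chr9", "chr11"]))
```

Passing a one-element list as `normalizing` corresponds to a single normalizing chromosome; passing several names corresponds to a group of chromosomes.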
An example of this embodiment, for determining one or more complete chromosomal aneuploidies in a cancer patient test sample comprising cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete chromosomal aneuploidies in the patient test sample.
In another embodiment, the method described above for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this instance, the method comprises: (a) obtaining sequence information for the nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a set of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete chromosomal aneuploidies in the patient sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing segment sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
A means for comparing chromosome doses across different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation of the j-th chromosome dose in the set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
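As a minimal sketch, the NCV can be computed from a test dose and the doses of a qualified set as shown below. The function name and the example values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) as the estimate is an assumption; this specification does not state which estimator is used.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float,
                                qualified_doses: Sequence[float]) -> float:
    """Return (x_ij - mu_j) / sigma_j for one chromosome j and one test sample i."""
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualified samples are needed to estimate sigma")
    mu = mean(qualified_doses)      # estimated mean of the j-th chromosome dose
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("qualified doses show no variation; NCV is undefined")
    return (test_dose - mu) / sigma


if __name__ == "__main__":
    # Hypothetical doses, for illustration only.
    qualified = [1.0, 2.0, 3.0]
    print(normalized_chromosome_value(5.0, qualified))
```

The resulting value expresses the test dose in units of the standard deviation of the qualified set, which is what allows doses from different sample sets to be compared on a common scale.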
In some embodiments, the presence or absence of one complete chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four complete chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies can comprise trisomies, tetrasomies, pentasomies, and other polysomies, and because the number of complete chromosomal aneuploidies varies between diseases and between stages of the same disease, the number of complete chromosomal aneuploidies determined according to the present method is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or more chromosomal aneuploidies. Systematic karyotyping of tumors has revealed that the chromosome number in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer, J Cell Sci 121:3859-3866 [2008]). In some embodiments, the method comprises determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected of or known to be suffering from cancer, e.g., colon cancer. The chromosomal aneuploidies include losses of one or more complete chromosomes (hypodiploidies) and gains of complete chromosomes, including trisomies, tetrasomies, pentasomies, and other polysomies. Gains and/or losses of chromosome segments can also be determined, as described elsewhere herein. The method is applicable to determining the presence or absence of different aneuploidies in samples from patients suspected of or known to be suffering from any cancer described elsewhere herein.
In some embodiments, any one of chromosomes 1-22, X, and Y can be the chromosome of interest in determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete chromosomal aneuploidies is determined. In other embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. The different complete chromosomal aneuploidies that can be determined include complete chromosomal monosomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal trisomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal tetrasomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal pentasomies of any one or more of chromosomes 1-22, X, and Y; and other complete chromosomal polysomies of any one or more of chromosomes 1-22, X, and Y.
Determination of partial chromosomal aneuploidies in patient samples
In another embodiment, methods are provided for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more segments of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a set of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more segments of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the one or more segments of the one or more chromosomes of interest; and (d) comparing each single segment dose for each of the one or more segments of the one or more chromosomes of interest to a threshold value for each of those chromosome segments, thereby determining the presence or absence of one or more different partial chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of the one or more segments of the one or more chromosomes of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for the normalizing segment sequence for that segment.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the segments of interest. Steps (a)-(d) can be repeated for test samples from different patients.
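The segment dose calculation can be sketched in the same way, here starting from mapped tag positions rather than per-chromosome totals. This is an illustrative sketch only: the tuple layouts, the half-open coordinate convention, the function names, and the example data are all hypothetical, and the normalizing segments are assumed not to overlap one another.

```python
from typing import List, Sequence, Tuple

Tag = Tuple[str, int]            # (chromosome, mapped position); hypothetical layout
Segment = Tuple[str, int, int]   # (chromosome, start, end), half-open [start, end)


def count_tags(tags: Sequence[Tag], segments: Sequence[Segment]) -> int:
    """Count the sequence tags that map inside any of the given segments."""
    return sum(
        1
        for chrom, pos in tags
        if any(c == chrom and start <= pos < end for c, start, end in segments)
    )


def total_length(segments: Sequence[Segment]) -> int:
    """Summed length of the segments (assumes they do not overlap)."""
    return sum(end - start for _, start, end in segments)


def segment_dose(tags: Sequence[Tag],
                 segment_of_interest: Segment,
                 normalizing_segments: Sequence[Segment],
                 by_density: bool = False) -> float:
    """Segment dose as a ratio of tag counts or, optionally, of tag densities."""
    n_interest = count_tags(tags, [segment_of_interest])
    n_norm = count_tags(tags, normalizing_segments)
    if n_norm == 0:
        raise ValueError("normalizing segment sequence has no sequence tags")
    if not by_density:
        return n_interest / n_norm
    density_interest = n_interest / total_length([segment_of_interest])
    density_norm = n_norm / total_length(normalizing_segments)
    return density_interest / density_norm


if __name__ == "__main__":
    # Hypothetical mapped tags and segments, for illustration only.
    tags: List[Tag] = [("chr1", 5), ("chr1", 15), ("chr2", 5), ("chr2", 50), ("chr3", 1)]
    interest: Segment = ("chr1", 0, 8)
    normalizing: List[Segment] = [("chr2", 0, 64)]
    print(segment_dose(tags, interest, normalizing))
    print(segment_dose(tags, interest, normalizing, by_density=True))
```

Supplying several entries in `normalizing_segments` corresponds to a normalizing segment sequence made up of a set of segments from one or more chromosomes.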
A means for comparing segment doses across different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation of the j-th segment dose in the set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
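As a worked illustration with purely hypothetical numbers (none of these values come from this specification), suppose the qualified samples give an estimated mean segment dose of 0.250 with a standard deviation of 0.005, and test sample i has an observed dose of 0.270 for the same segment j:

$$\mathrm{NSV}_{ij} = \frac{0.270 - 0.250}{0.005} = 4.0$$

The test dose therefore lies four standard deviations above the mean of the qualified set; whether such a value indicates a partial aneuploidy depends on the threshold value chosen for that segment.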
In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In other embodiments, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more segments of interest selected from chromosomes 1-22, X, and Y comprise at least one, five, ten, 15, 20, 25, 50, 75, 100, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100, or more different partial chromosomal aneuploidies is determined. The different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions.
The sample used to determine the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any of the biological samples described elsewhere herein. The type of sample or samples that can be used to determine aneuploidy in a patient will depend on the type of disease from which the patient is known or suspected to be suffering. For example, a stool sample can be chosen as the source of DNA to determine the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to the tissue samples described herein. Preferably, the sample is a biological sample obtained by non-invasive means, e.g., a plasma sample. The nucleic acids in the patient sample can be sequenced using next generation sequencing (NGS), as described elsewhere herein. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In some embodiments, determine presence or absence aneuploidy in a patient body, this patient suspects to suffer from as the cancer illustrated by other places of the application, the such as cancer of lung cancer, breast cancer, kidney, head and neck cancer, oophoroma, cervix cancer, colon cancer, cancer of pancreas, cancer of the esophagus, carcinoma of urinary bladder and other organs, and hematologic cancers.Hematologic cancers comprises marrow, blood and lymphoid cancer, and lymphatic system comprises lymph node, lymphatic vessel, tonsillotome, thymus gland, spleen and alimentary canal lymphoid tissue.Start from leukaemia and the myeloma of marrow, and to start from lymphoid lymthoma be modal blood cancer types.
The determination of one or more chromosome aneuploidy of presence or absence can be made and the following is not limited in Patient Sample A, that is: the neurological susceptibility of patient to a kind of concrete cancer is determined, as the part of routine screening in the middle of patient that is known or that do not know a kind of cancer of susceptible to determine the cancer that presence or absence is concerned about, prognosis to disease is provided, the needs of assessment to complementary therapy, and determine progress or the recovery of disease.
genetic counselling
Fetal chromosomal abnormalities are a major cause of miscarriage, congenital anomalies, and perinatal death (Wellesley et al., Europ. J. Human Genet., 20:521-526 [2012]; Nagaoka et al., Nature Rev. Genetics 13:493-504 [2012]). Since the introduction of amniocentesis, followed by chorionic villus sampling (CVS), pregnant women have had access to information about fetal chromosomal status (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]). When adequate tissue is obtained, cytogenetic karyotyping of the fetal cells or chorionic villi obtained from these procedures gives very high diagnostic sensitivity and specificity (about 99%) in most cases (Hahnemann and Vejerslev, Prenat Diagn., 17:801-820 [1997]; NICHD National Registry for Amniocentesis Study, JAMA 236:1471-1476 [1976]). However, these procedures also pose risks to the fetus and the pregnant woman (Odibo et al., Obstet Gynecol 112:813-819 [2008]; Odibo et al., Obstet Gynecol 111:589-595 [2008]).
To mitigate these risks, a series of prenatal screening algorithms has been developed to stratify women by their likelihood of carrying a fetus with one of the most common fetal trisomies: trisomy 21 (T21, Down syndrome), trisomy 18 (T18, Edwards syndrome) and, to a lesser extent, trisomy 13 (T13, Patau syndrome). Screening typically involves measuring multiple biochemical analytes in maternal serum at different time points, combined with ultrasonographic measurement of fetal nuchal translucency (NT) and the incorporation of other maternal factors, such as age, to generate a risk score. As screening has been developed and refined over the years, and depending on when it is administered (first trimester only, second trimester only, sequential, or fully integrated) and how it is administered (serum only, or serum combined with NT), a menu of options has evolved with varying detection rates (65% to 90%) and high screen-positive rates (5%) (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]).
For patients, the information or "risk score" that results from this multi-step process can be confusing and anxiety provoking, particularly in the absence of comprehensive counseling. Ultimately, when a woman makes her decision, the results are weighed against the risk of miscarriage from an invasive procedure. Better non-invasive means of obtaining more definitive information about fetal chromosomal status would facilitate decision making in this setting. It is believed that such improved non-invasive means are provided by the methods described herein.
In various embodiments, genetic counseling is contemplated as a component of the use of the assays described herein, particularly in a clinical setting. Conversely, the aneuploidy detection methods described herein can constitute one option offered in the context of prenatal care and associated genetic counseling.
Accordingly, in various embodiments, the methods described herein can be offered as a primary screen (e.g., for women with a priori pregnancy risks) or as a secondary screen for women who screen positive in "conventional" screening. In certain embodiments, it is contemplated that the non-invasive prenatal testing (NIPT) methods described herein additionally comprise a genetic counseling component, and/or that genetic counseling and pregnancy "management" are optionally or expressly incorporated into the NIPT methods described herein.
For example, in certain embodiments, the woman has one or more a priori pregnancy risks. Such risks include, but are not limited to, one or more of the following:
1) Maternal age over 35, although it is noted that approximately 80% of children born with Down syndrome are born to women under the age of 35.
2) Previous fetus/child with an autosomal trisomy. Depending on the type of trisomy, whether the previous pregnancy ended in spontaneous abortion, and maternal age at the initial occurrence and at the subsequent prenatal diagnosis, the recurrence risk is believed to be about 1.6 to about 8.2 times the maternal age risk.
3) Previous fetus/child with a sex chromosome abnormality. Not all sex chromosome abnormalities have a maternal origin, and not all carry a risk of recurrence. When they do, the recurrence risk is about 1.6 to about 2.5 times the maternal age risk.
4) Parental carrier of a chromosome translocation.
5) Parental carrier of a chromosome inversion.
6) Parental aneuploidy or mosaicism.
7) Use of certain assisted reproductive technologies.
In such cases, subject to various considerations, the mother, e.g., in consultation with a physician, genetic counselor, or the like, can be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.). In this regard, it is noted that the methods described herein are believed to be effective even in the first trimester. Accordingly, in certain embodiments, use of the NIPT methods described herein is contemplated as early as 8 weeks, and in various embodiments at about 10 weeks or later.
In certain embodiments, the methods described herein can be offered as a secondary screen to women who screen positive in "conventional" screening. For example, in certain embodiments, the pregnancy may present with a structural abnormality such as fetal cystic hygroma or increased nuchal translucency, e.g., as detected by ultrasonography. Typically, ultrasound examination for structural defects is performed at 18 to 22 weeks and, particularly when an irregularity is observed, can be coupled with fetal echocardiography. It is contemplated that when an abnormality is observed (e.g., a positive "conventional" screen), the mother, e.g., in consultation with a physician, genetic counselor, or the like, can be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.).
Accordingly, in various embodiments, genetic counseling is contemplated in which the NIPT assays described herein are offered as a component of the development/design of a prenatal care, pregnancy management, and/or birth plan. Offering NIPT as a secondary screen to women who screen positive in conventional screening (or who have other a priori risks) is expected to reduce the number of unnecessary amniocentesis and CVS procedures. However, because informed consent is an important component of NIPT, the need for genetic counseling increases.
Because a positive NIPT result (using the methods described herein) is more similar to a positive result from amniocentesis or CVS, women should be given the opportunity, during genetic counseling before the test, to decide whether they want this degree of information. Pre-test genetic counseling for NIPT should also include a discussion/recommendation of confirming abnormal test results via CVS, amniocentesis, cordocentesis, or the like (depending on gestational age), so that appropriate consideration can be given to the expected timing of results for post-test planning, in accordance with the statement of the National Society of Genetic Counselors (NSGC, USA) on this subject (see, e.g., Devers et al., Noninvasive Prenatal Testing/Noninvasive Prenatal Diagnosis: the position of the National Society of Genetic Counselors (by NSGC Public Policy Committee), NSGC Position Statements 2012; Benn et al., Prenat Diagn, 31:519-522 [2011]). Because NIPT does not currently screen for all chromosomal or genetic conditions, it may not replace standard risk assessment and prenatal diagnosis. It is contemplated that patients with other factors suggestive of a chromosome abnormality (e.g., certain abnormal ultrasound findings) should receive genetic counseling in which they are offered the option of conventional confirmatory diagnostic testing, regardless of NIPT results. Women should also be informed during genetic counseling that NIPT results may not be informative for some patients.
The use of NIPT by the methods described herein is perhaps more similar to CVS than to amniocentesis, in that the detection of an aneuploidy typically represents the chromosomal constitution of the fetus but may in some cases represent confined placental aneuploidy or confined placental mosaicism (CPM). CPM occurs in about 1% to 2% of CVS results today, and some women go on to have amniocentesis at a later gestational age after CVS in order to distinguish apparently isolated placental aneuploidy from fetal aneuploidy. As NIPT is implemented more widely, it is therefore expected that CPM cases will produce some number of positive NIPT results that may not be confirmed by a subsequent invasive procedure (particularly amniocentesis). Again, in various embodiments, it is contemplated that this information is presented to the patient in the context of genetic counseling (e.g., by a physician, genetic counselor, etc.).
It will be recognized that, in various embodiments, a component of genetic counseling may be to recommend confirmatory modalities, to inform the patient of the timing associated with risk levels and with the different confirmatory modalities, and to provide input regarding the value of the information provided by such confirmatory methods, particularly in the context of the timing of the pregnancy. In various embodiments, genetic counseling can also establish a plan for monitoring the pregnancy (e.g., follow-up ultrasound examinations, additional physician visits, etc.) and for setting up a series of decision points where appropriate. In addition, genetic counseling can advise on and assist in developing a birth plan, which can include, for example, the place of delivery (e.g., home, hospital, specialized facility, etc.), the personnel involved at the place of delivery, the third-party care available for the infant, and so on.
While the foregoing discussion focuses on the methods described herein as a component (and perhaps a second-tier tool) of prenatal diagnosis, as clinical experience accumulates, and if comparative studies against conventional screening are successful, the NIPT methods described herein may replace existing screening protocols and be used as a primary tool.
It is also contemplated that the methods described herein will find use in multiple-gestation pregnancies.
Typically, it is expected that genetic counseling (e.g., as described above) is provided by a physician (e.g., primary physician, obstetrician, etc.) and/or by a genetic counselor or other qualified medical professional. In certain embodiments, the counseling is provided face to face; however, it is recognized that in some cases counseling can be provided by remote access (e.g., by text, mobile phone, mobile phone application, tablet application, the Internet, etc.).
It will also be recognized that, in certain embodiments, genetic counseling or a component thereof can be delivered by a computer system. For example, a "smart advice" system can be provided that supplies genetic counseling information (e.g., as described above) in response to test results, instructions from a health care provider, and/or queries (e.g., from a patient). In certain embodiments, the information will be specific to the clinical information provided by the physician, the health care system, and/or the patient. In certain embodiments, the information can be provided iteratively. Thus, for example, the patient can submit "what if" queries and the system can return information such as diagnostic options, risk factors, timing, and the implications of different outcomes.
In certain embodiments, the information can be provided in a transitory manner (e.g., presented on a computer screen). In certain embodiments, the information can be provided in a non-transitory manner. Thus, for example, the information can be printed (e.g., as a list of options and/or recommendations, optionally with associated timing) and/or stored on computer-readable media (e.g., magnetic media such as a local hard drive, a server, etc.; optical media; flash memory, etc.).
It will be appreciated that such systems are typically configured to provide adequate security to maintain patient privacy, e.g., in accordance with prevailing industry standards.
The foregoing discussion of genetic counseling is intended to be illustrative and not limiting. Genetic counseling is a well-established branch of medicine, and the incorporation of a counseling component with respect to the assays described herein is within the skill of the practitioner. Moreover, it is recognized that the nature of genetic counseling and the associated information and recommendations is likely to change as the field develops.
determining fetal fraction
Methods for determining fetal fraction are disclosed in U.S. Patent Application Publication 2010-0010085 (117.201), U.S. Patent Application Publication 2011-0201507 (120.201), U.S. Patent Application No. 13/365,240 (filed February 2, 2012), and U.S. Patent Application No. 13/445,778 (filed April 12, 2012). A full discussion of techniques for determining fetal fraction can be found in these documents.
The methods described herein make it possible to determine the fetal fraction in a sample that comprises a mixture of fetal and maternal nucleic acids or, more generally, a mixture of nucleic acids derived from two different genomes. For purposes of this discussion, maternal and fetal nucleic acids will be described, but it should be understood that any two genomes can be substituted. In some embodiments, fetal fraction is determined concurrently with the presence or absence of a copy number variation (e.g., aneuploidy). As described more fully below, a single set of tags from a test sample can be used to determine both fetal fraction and copy number variation.
Methods for quantifying fetal fraction rely on differences between the fetal and maternal genomes. In some embodiments described herein, determining the fetal fraction of sample DNA relies on multiple DNA sequence reads at sequence sites known to harbor one or more polymorphisms. In some embodiments, the polymorphic sites or target nucleic acid sequences are discovered while aligning sequence tags to one another and/or to a reference sequence. In certain embodiments, the fetal fraction of sample DNA is determined by considering copy number information for a particular chromosome or chromosome segment in which a copy number difference exists between the maternal DNA and the fetal chromosomes. In such embodiments, the fetal fraction is determined by considering the relative amounts of maternal and fetal sample DNA for a chromosome or segment that is inherently determined or known to have a copy number variation, and it can be calculated using the copy number variation between the maternal DNA and the fetal chromosomes. For this purpose, the methods and apparatus can calculate a normalized chromosome value (NCV), as described below, or a similar metric.
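The copy-number route to fetal fraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this disclosure: it assumes a chromosome dose defined as the tag count for the chromosome of interest divided by the tag count for a normalizing chromosome, an NCV computed as a z-score against doses from unaffected ("qualified") samples, and a euploid mother carrying a fetus with a whole-chromosome trisomy or monosomy, so that the dose shifts by a factor of 1 ± ff/2. All function names and example numbers are hypothetical.

```python
from statistics import mean, stdev


def chromosome_dose(tags_of_interest, tags_normalizing):
    """Tag count for the chromosome of interest over the normalizing count."""
    return tags_of_interest / tags_normalizing


def normalized_chromosome_value(test_dose, qualified_doses):
    """Assumed NCV form: z-score of the test dose against unaffected samples."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def cnv_fetal_fraction(test_dose, qualified_doses):
    """Fetal fraction implied by a dose shift (assumption: maternal genome is
    euploid and the fetus gains or loses one whole copy of the chromosome).

    With fetal fraction ff, the expected dose is (1 + ff/2) or (1 - ff/2)
    times the unaffected mean, so ff = 2 * |dose / mean - 1|.
    """
    return 2.0 * abs(test_dose / mean(qualified_doses) - 1.0)


# Hypothetical doses from unaffected samples and from one test sample.
qualified = [0.100, 0.101, 0.099, 0.100]
test = chromosome_dose(tags_of_interest=105_000, tags_normalizing=1_000_000)

ncv = normalized_chromosome_value(test, qualified)
ff = cnv_fetal_fraction(test, qualified)
print(f"NCV = {ncv:.2f}, CNV-based fetal fraction = {ff:.3f}")
```

The estimate is only meaningful when the chromosome or segment has already been determined to carry a copy number variation; for an unaffected chromosome the dose shift reflects noise rather than fetal fraction.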
Some methods are limited by fetal sex; for example, some methods of quantifying fetal fraction rely on the presence of sequences specific to the Y chromosome, or on determining the chromosome dose of the X chromosome, in male fetuses. In certain embodiments, fetal DNA is quantified using fetal targets that have no maternal counterpart, such as Y chromosome sequences (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008] and U.S. Patent Application Publication No. 2010/0112590, filed November 6, 2009, Lo et al.) or the RHD1 gene in RhD-negative mothers, or targets that differ from the maternal background by multiple DNA base pairs. Other methods are independent of fetal sex and rely on polymorphic differences between the fetal and maternal genomes.
Allelic imbalance at polymorphisms can be detected and quantified by various techniques. In some embodiments, digital PCR is used to determine allelic imbalance at a polymorphism, e.g., a SNP on mRNA. Alternatively, capillary gel electrophoresis is used to detect differences in the size of polymorphic regions, e.g., in the case of STRs.
In some embodiments, epigenetic differences, e.g., differential methylation of promoter regions, can be detected and used, alone or in combination with digital PCR, to determine differences between the fetal and maternal genomes and to quantify fetal fraction (Tong et al., Clin Chem 56:90-98 [2010]). Variations of epigenetic methods are also included, e.g., methylation-based DNA discrimination (Erich et al., AJOG 204:205.e1-205.e11 [2011]). In some embodiments, fetal fraction is estimated by sequencing a preselected panel of one or more polymorphic sequences, as described elsewhere herein.
In addition to sequencing panels of preselected polymorphic sequences as described elsewhere herein, methods for quantifying fetal DNA in maternal plasma include, but are not limited to, real-time qPCR, mass spectrometry, digital PCR (including microfluidic digital PCR), and capillary gel electrophoresis.
The discussion in this section first considers fetal fraction as determined from one or more polymorphisms, or other information, from chromosomes or chromosome segments that do not have (or are determined not to have) a copy number variation. Fetal fraction determined by such techniques is referred to herein as non-CNV fetal fraction or "NCNFF". Later in this section, various techniques are described for calculating fetal fraction from chromosomes or chromosome segments determined to have a copy number variation. Fetal fraction determined by such techniques is referred to herein as CNV fetal fraction or "CNFF".
In some embodiments, fetal fraction is evaluated by determining the relative contribution of a polymorphic allele derived from the fetal genome versus the contribution of the corresponding polymorphic allele derived from the maternal genome. In some embodiments, fetal fraction is evaluated by determining the relative contribution of a polymorphic allele derived from the fetal genome versus the total contribution of the corresponding polymorphic allele derived from both the fetal and maternal genomes.
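The two ratios named in this paragraph can be written out as a short sketch. It assumes that, at each polymorphic site, the sequence tags have already been attributed to a fetal-derived allele and to the corresponding maternal-derived allele; the function names, the per-site averaging, and the tag counts are illustrative assumptions. Converting either ratio into a genome-wide fetal fraction depends on the genotypes involved and is not attempted here.

```python
def ratio_to_maternal(fetal_tags, maternal_tags):
    """Fetal-allele contribution relative to the maternal-allele contribution."""
    if maternal_tags <= 0:
        raise ValueError("maternal_tags must be positive")
    return fetal_tags / maternal_tags


def ratio_to_total(fetal_tags, maternal_tags):
    """Fetal-allele contribution relative to the fetal plus maternal total."""
    total = fetal_tags + maternal_tags
    if total <= 0:
        raise ValueError("at least one tag is required")
    return fetal_tags / total


def average_over_sites(sites, estimator):
    """Average a per-site estimate over (fetal_tags, maternal_tags) pairs."""
    values = [estimator(fetal, maternal) for fetal, maternal in sites]
    return sum(values) / len(values)


# Hypothetical tag counts at three polymorphic sites.
sites = [(50, 950), (60, 940), (40, 960)]
vs_maternal = average_over_sites(sites, ratio_to_maternal)
vs_total = average_over_sites(sites, ratio_to_total)
print(f"vs maternal: {vs_maternal:.4f}, vs total: {vs_total:.4f}")
```

Averaging over several sites is one simple way to reduce the sampling noise of any single site; other summaries (for example, pooling the tag counts before dividing) are equally plausible.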
Polymorphisms can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA ("cfDNA") in a maternal sample. Informative polymorphisms (e.g., informative SNPs) yield information about the fetus, e.g., the presence or absence of a disease, a genetic abnormality, or any other biological information such as the stage of gestation or sex. In this context, informative polymorphisms are those that identify differences between the sequences of the mother and the fetus, and they are used in the methods disclosed herein. In other words, informative polymorphisms are polymorphisms in a nucleic acid sample that have different sequences (i.e., they have different alleles), with the sequences present in different amounts. In some of the methods herein, the different amounts of the sequences/alleles are used to determine fetal fraction, particularly NCNFF.
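One way to picture "different alleles present in different amounts" is a filter over per-site allele counts. The sketch below is illustrative only: the thresholds (`noise_floor`, `max_minor`, `min_depth`) are hypothetical values chosen for the example, not values taken from this disclosure. It flags a site as informative when the less abundant allele is clearly present but well below the roughly equal split expected when the mother herself is heterozygous.

```python
def minor_allele_fraction(allele_a_tags, allele_b_tags):
    """Fraction of tags carrying the less abundant of two alleles."""
    total = allele_a_tags + allele_b_tags
    if total <= 0:
        raise ValueError("at least one tag is required")
    return min(allele_a_tags, allele_b_tags) / total


def is_informative(allele_a_tags, allele_b_tags,
                   noise_floor=0.01, max_minor=0.25, min_depth=100):
    """Hypothetical filter: both alleles present, in clearly unequal amounts.

    noise_floor guards against sequencing error, max_minor excludes sites
    where the two alleles are close to balanced, and min_depth rejects
    sites with too few tags to judge.
    """
    if allele_a_tags + allele_b_tags < min_depth:
        return False
    fraction = minor_allele_fraction(allele_a_tags, allele_b_tags)
    return noise_floor < fraction < max_minor


# Hypothetical (allele A, allele B) tag counts for four sites.
panel = {"site1": (40, 960), "site2": (500, 500),
         "site3": (0, 1000), "site4": (2, 50)}
informative = [name for name, (a, b) in panel.items() if is_informative(a, b)]
print(informative)
```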
Polymorphic sites include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (IN-DELS, also called deletion insertion polymorphisms (DIPs)), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or any polymorphism having any other allelic sequence variation in a chromosome. In some embodiments, each target nucleic acid comprises two tandem SNPs. The tandem SNPs are analyzed as a single unit (e.g., as a short haplotype) and are provided herein as sets of two SNPs.
In some embodiments, fetal fraction is determined by statistical and approximation techniques that evaluate the relative contributions of the fetal and maternal genomes across the distribution of polymorphic sites used for the determination. Fetal fraction can also be determined by electrophoresis, in which certain types of polymorphic sites are separated electrophoretically and used to identify the relative contribution of a polymorphic allele from the fetal genome and the relative contribution of the corresponding polymorphic allele from the maternal genome.
In one embodiment, shown in the process flow diagram of Fig. 6, fetal fraction is determined by method 600, which comprises first obtaining a test sample comprising a mixture of fetal and maternal nucleic acids in operation 610, enriching the mixture of nucleic acids for polymorphic target nucleic acids in operation 620, sequencing the enriched mixture of nucleic acids in operation 630, and simultaneously determining the fetal fraction and aneuploidy in the sample in operation 640.
Fig. 7 shows a process flow diagram for some embodiments. Fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) sequencing the mixture using a massively parallel sequencing method in operation 740, and (v) calculating the fetal fraction in operation 760. In another embodiment, fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) separating the nucleic acids by size using electrophoresis in operation 750, and (v) calculating the fetal fraction in operation 770.
In one embodiment, shown in the process flow diagram of Fig. 8, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 810, (ii) amplifying the sample in operation 820, (iii) enriching the sample in operation 830 by combining the amplified sample with an unamplified sample of the original mixture, (iv) purifying the sample in operation 840, and (v) sequencing the sample in operation 850 to determine fetal fraction, using any of various approaches to determine the fetal fraction and the presence or absence of aneuploidy simultaneously in operation 860.
In another embodiment, shown in the process flow diagram of Fig. 9, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 910, (ii) purifying the sample in operation 920, (iii) amplifying a portion of the sample in operation 930, (iv) enriching the sample in operation 940 by combining the amplified portion with the purified but unamplified portion of the original mixture, and (v) sequencing the sample in operation 950 to determine fetal fraction, using any of various approaches to determine the fetal fraction and the presence or absence of aneuploidy simultaneously in operation 960.
In another embodiment, shown in the process flow diagram of Fig. 10, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 1010, (ii) purifying the sample in operation 1020, (iii) amplifying a first portion of the sample in operation 1040, (iv) preparing a sequencing library of the amplified portion of the sample in operation 1050, (v) preparing a sequencing library of a second, purified but unamplified portion of the sample in operation 1030, (vi) enriching the mixture in operation 1060 by combining the two sequencing libraries, and (vii) sequencing the mixture in operation 1070, using any of various approaches to determine the fetal fraction and the presence or absence of aneuploidy simultaneously in operation 1080.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) amplifying the sample using labeled primers, and (iv) sequencing the sample using electrophoresis, so that fetal fraction can be determined by any of various approaches.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) optionally enriching the sample by amplifying a portion of it, and (iv) sequencing the sample, so that fetal fraction can be determined by any of various approaches.
Purification of the originally obtained sample, the amplified sample, the amplified and enriched sample, or other nucleic acid samples relevant to the methods disclosed herein (e.g., in operations 720, 840, 920 and 1020) can be accomplished by any conventional technique. To separate cfDNA from cells, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods can be used. Optionally, the resulting sample can be fragmented before purification or amplification. If the sample used comprises cfDNA, fragmentation may not be required, because cfDNA is fragmented in nature, with fragment sizes often of about 150 bp to 200 bp.
In some of the procedures described above, selective amplification and enrichment are used to increase the relative amount of nucleic acids from the regions in which the polymorphisms reside. Similar results can be obtained by deep sequencing of selected regions of the genome, particularly the regions in which the polymorphisms reside.
amplification
After the sample is obtained and purified, a portion of the purified mixture of fetal and maternal nucleic acids (e.g., cfDNA) is used to amplify a plurality of polymorphic target nucleic acids, each comprising a polymorphic site. In some implementations, amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids is accomplished by any method that uses the polymerase chain reaction (PCR) or a variation of it, including but not limited to asymmetric PCR, helicase-dependent amplification, hot-start PCR, qPCR, solid-phase PCR, and touchdown PCR. In some embodiments, the sample can be partially amplified to facilitate determination of fetal fraction. In some embodiments, no amplification is performed. The disclosed amplification methods and other amplification techniques can be used in operations 730, 820, 930 and 1040.
amplification of SNPs
A number of nucleic acid primers are available for amplifying DNA fragments containing SNPs, and their sequences can be obtained, for example, from databases known to those of ordinary skill in the art. Additional primers can also be designed, for example, using a method similar to that published by Vieux, E.F., Kwok, P-Y and Miller, R.D. in BioTechniques (June 2002), Vol. 32, Supplement: "SNPs: Discovery of Marker Disease", pp. 28-32.
Sequence-specific primers are selected to amplify the target nucleic acids. In one embodiment, target nucleic acids comprising a polymorphic site are amplified as amplicons. In another embodiment, target nucleic acids comprising two or more polymorphic sites (e.g., two tandem SNPs) are amplified as amplicons. The amplified target nucleic acid amplicons, of at least about 100 bp, comprise a single SNP or tandem SNPs. Primers for amplifying target sequences comprising tandem SNPs are designed to encompass both SNP sites.
amplification of STR
A number of nucleic acid primers are available for amplifying DNA fragments containing STRs, and such sequences can be obtained from databases known to those of ordinary skill in the art.
In some embodiments, a portion of the mixture of fetal and maternal nucleic acids is used as the template for amplifying target nucleic acids having at least one STR. A comprehensive listing of references, facts, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which can be accessed via the Internet at cstl.nist.gov/strbase. Sequence information from GenBank at ncbi.nlm.nih.gov/genbank for commonly used STR loci is also accessible through STRBase.
STR multiplex systems allow multiple non-overlapping loci to be amplified simultaneously in a single reaction, which substantially increases throughput. Because STRs are highly polymorphic, most individuals are heterozygous. STRs can be used in electrophoretic analysis as described further below.
Amplification can also be performed using miniSTRs to generate reduced-size amplicons, thereby discerning STR alleles that are shorter in length. The methods of the disclosed embodiments encompass determining the fraction of fetal nucleic acid in a maternal sample that has been enriched for target nucleic acids each comprising one miniSTR. The method comprises quantifying at least one fetal and one maternal allele at a polymorphic miniSTR, which can be amplified to generate amplicons of about the size of circulating fetal DNA fragments. Any one pair of miniSTR primers, or a combination of two or more pairs, can be used to amplify at least one miniSTR.
enrichment
The sample of enrichment can comprise in addition: the blood plasma separate section of blood sample; The sample of the purified cfDNA extracted from blood plasma; From sequencing library sample prepared by the purified potpourri of fetus and maternal nucleic acids; Etc..
In certain embodiments, the sample comprising the mixture of DNA molecules is non-specifically enriched for the whole genome prior to whole-genome sequencing, i.e., whole genome amplification is performed prior to sequencing. Non-specific enrichment of a mixture of nucleic acids refers to whole genome amplification of the genomic DNA fragments of the DNA sample, which can be used to increase the level of sample DNA prior to identifying polymorphisms by sequencing. Non-specific enrichment can be the selective enrichment of one of the two genomes (fetal and maternal) present in the sample.
In other embodiments, the cfDNA in the sample is specifically enriched. Specific enrichment refers to the enrichment of a genomic sample for specific sequences (e.g., polymorphic target sequences), which is accomplished by methods that comprise specifically amplifying target nucleic acid sequences that comprise the polymorphic site.
In other embodiments, the mixture of nucleic acids present in the sample is enriched for polymorphic target nucleic acids that each comprise a polymorphic site. Such enrichment can be used in operation 620. Enriching the mixture of fetal and maternal nucleic acids comprises amplifying target sequences from a portion of the nucleic acids contained in the original maternal sample, and combining part or all of the amplified product with the remainder of the original maternal sample, e.g., in operations 830 and 940.
In yet another embodiment, the sample that is enriched is a sequencing library sample prepared from a purified mixture of fetal and maternal nucleic acids. The amount of amplified product used to enrich the original sample is selected to obtain sequence information sufficient for determining the fetal fraction. At least about 3%, at least about 5%, at least about 7%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30% or more of the total number of sequence tags obtained from sequencing are mapped to determine the fetal fraction.
In one embodiment, shown in Fig. 10, enrichment comprises amplifying, in operation 1040, the target nucleic acids contained in a portion of an original sample of a purified mixture of fetal and maternal nucleic acids (e.g., cfDNA purified from a maternal plasma sample). Similarly, in operation 1050, a portion of the purified but unamplified cfDNA is used to prepare a primary sequencing library. In operation 1060, a portion of the target library is combined with the primary library generated from the unamplified mixture of nucleic acids, and in operation 1070 the mixture of fetal and maternal nucleic acids comprised in the two libraries is sequenced. The enriched library can comprise at least about 5%, at least about 10%, at least about 15%, at least about 20%, or at least about 25% of the target library. In operation 1080, the data from the sequencing run is analyzed, and the fetal fraction and the presence or absence of aneuploidy are determined simultaneously, as described for operation 640 of the embodiment depicted in Fig. 6.
sequencing technologies
The enriched mixture of fetal and maternal nucleic acids is sequenced. The sequence information needed to determine the fetal fraction can be obtained using any known DNA sequencing method, many of which are described elsewhere in this application. Such sequencing methods include next-generation sequencing (NGS), Sanger sequencing, Helicos True Single Molecule Sequencing (tSMS™), 454 sequencing (Roche), SOLiD technology (Applied Biosystems), single-molecule real-time (SMRT™) sequencing technology (Pacific Biosciences), nanopore sequencing, chemical-sensitive field effect transistor (chemFET) arrays, Halcyon Molecular's method that uses transmission electron microscopy (TEM), Ion Torrent single-molecule sequencing, sequencing by hybridization, and the like. In certain embodiments, massively parallel sequencing is employed. In one embodiment, Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry is used. In certain embodiments, partial sequencing is used.
The sequenced DNA is mapped to a reference genome. The reference genome can be an artificial genome or a human reference genome. Such reference genomes include: an artificial target sequences genome comprising polymorphic target nucleic acid sequences; an artificial SNP reference genome; an artificial STR reference genome; an artificial tandem-STR reference genome; the human reference genome NCBI36/hg18 sequence, which is available on the Internet at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105; and a genome comprising the human reference genome NCBI36/hg18 sequence together with an artificial target sequences genome, e.g., a SNP genome of the target polymorphic sequences. Some mismatches are allowed in the mapping process.
In one embodiment, the sequencing information obtained is analyzed and, in operation 630, the fetal fraction and the presence or absence of aneuploidy are determined simultaneously.
As described above, a plurality of sequence tags is obtained per sample. In certain embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between 20 bp and 40 bp are obtained per sample by mapping the reads to a reference genome. In one embodiment, all sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags comprising reads mapped to all regions of the human reference genome (e.g., all chromosomes) are counted to determine fetal aneuploidy in the mixed DNA sample, i.e., the over- or under-representation of a sequence of interest (e.g., a chromosome or portion thereof), and the tags comprising reads mapped to the artificial target sequences genome are counted to determine the fetal fraction. The method does not require differentiation between the maternal and fetal genomes.
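The two tallies described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the contig naming convention (an `snp_` prefix for contigs of the artificial target sequences genome), the function names, and the example tags are hypothetical and do not come from this document; an actual pipeline would read the contig names from the aligner output.

```python
from collections import Counter

# Hypothetical convention: contigs belonging to the artificial target
# sequences genome carry this prefix; all other contigs are treated as
# human reference chromosomes.
ARTIFICIAL_PREFIX = "snp_"


def count_tags(mapped_contigs):
    """Split mapped tags into per-chromosome and per-target tallies."""
    chrom_counts = Counter()   # tags used for the aneuploidy determination
    target_counts = Counter()  # tags used for the fetal fraction determination
    for contig in mapped_contigs:
        if contig.startswith(ARTIFICIAL_PREFIX):
            target_counts[contig] += 1
        else:
            chrom_counts[contig] += 1
    return chrom_counts, target_counts


def target_tag_share(chrom_counts, target_counts):
    """Fraction of all mapped tags that map to the artificial genome."""
    n_target = sum(target_counts.values())
    total = sum(chrom_counts.values()) + n_target
    return n_target / total if total else 0.0


# Made-up example: one contig name per mapped tag.
tags = ["chr1", "chr21", "snp_rs560681_A", "chr1", "snp_rs560681_G", "chr13"]
chrom_counts, target_counts = count_tags(tags)
```

Keeping the two counters separate mirrors the text: a single sequencing run feeds both the chromosome-level counts and the polymorphic-target counts.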
In one embodiment, the data from the sequencing run is analyzed, and the fetal fraction and the presence or absence of aneuploidy are determined simultaneously.
sequencing library
In some embodiments, part or all of the amplified polymorphic sequences is used to prepare a sequencing library for sequencing in the parallel manner described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry. The library can be prepared from purified cfDNA and comprise at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, or at least about 50% amplified product.
Sequencing a library generated by any of the methods depicted in Figure 11 provides sequence tags derived from the amplified target nucleic acids and tags derived from the original, unamplified maternal sample. The fetal fraction is calculated from the number of tags mapped to the artificial reference genome.
calculating fetal fraction
As explained above, after the relevant DNA is sequenced, computational methods can be used to map or align the sequences to particular genes, chromosomes, alleles, or other structures. A number of computer algorithms exist for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In some embodiments, the sequences of the bins are found in nucleic acid databases known to those skilled in the art, including GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan). BLAST or similar tools can be used to search the identified sequences against a sequence database, and the search hits can be used to sort the identified sequences into the appropriate bins. Alternatively, a Bloom filter or similar set membership tester can be employed to align reads to the reference genome. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety.
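To make the set-membership idea concrete, the following is a minimal sketch of a Bloom filter over fixed-length reference substrings. It is not the method of the cited application, whose details are not given here. The class, the filter size, the number of hashes, the k-mer length, and the toy reference are all illustrative assumptions. A Bloom filter only answers "possibly present" or "definitely absent"; it can return false positives and does not by itself report a mapping position.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; the size and hash count are illustrative choices."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def index_reference(reference, k):
    """Insert every length-k substring of the reference into a filter."""
    bloom = BloomFilter()
    for i in range(len(reference) - k + 1):
        bloom.add(reference[i:i + k])
    return bloom


# Toy reference and a read drawn from it (both made up).
reference = "ACGTTGCAGGCTAACGTAGCTAGGCTTACG"
bloom = index_reference(reference, k=12)
read = reference[5:17]
read_possibly_in_reference = read in bloom
```

Because inserted items are never reported as absent, a read that exactly matches an indexed substring always passes the test; reads that fail can be discarded without running a full alignment.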
As mentioned, according to some embodiments (particularly the NCNFF technique), the fetal fraction is determined based on the total number of tags that map to a first allele and the total number of tags that map to a second allele at an informative polymorphic site (e.g., a SNP) contained in a reference genome. Informative polymorphic sites are identified by the difference in the allelic sequences and the amount of each of the possible alleles. Fetal cfDNA is often present at a concentration of < 10% of the maternal cfDNA. Thus, relative to the major contribution of the maternal alleles, there is a minor contribution to the alleles of the mixture of fetal and maternal nucleic acids that can be attributed to the fetus. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles that are represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs derived from a maternal plasma sample are shown in Figure 12.
The terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy. The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (e.g., partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
estimating fetal fraction using allele ratios
The relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on a reference genome for each of the two alleles at the predetermined polymorphic site. In one embodiment, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:
equation 1
and the fetal fraction for the sample is calculated as the average of the fetal fractions of all the informative alleles. Optionally, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:
equation 2
to compensate for the presence of two fetal alleles, one of which is masked by the maternal background.
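The two equations are not reproduced in this text, so the sketch below rests on an explicit assumption drawn from the surrounding description: the per-allele fetal fraction is taken as the number of tags mapped to the minor (fetal) allele divided by the number mapped to the major (maternal) allele, expressed as a percentage, and the optional variant doubles the minor-allele count to compensate for the fetal allele masked by the maternal background. The function names and tag counts are hypothetical.

```python
def fetal_fraction_per_site(minor_tags, major_tags, double_minor=False):
    """Percent fetal fraction at one informative site.

    Assumed form: 100 * minor / major, or 100 * 2 * minor / major when
    double_minor is True (the compensated variant described in the text).
    """
    if major_tags <= 0:
        raise ValueError("major allele tag count must be positive")
    factor = 2 if double_minor else 1
    return 100.0 * factor * minor_tags / major_tags


def sample_fetal_fraction(sites, double_minor=False):
    """Average the per-site fetal fractions over all informative sites."""
    if not sites:
        raise ValueError("at least one informative site is required")
    values = [
        fetal_fraction_per_site(minor, major, double_minor)
        for minor, major in sites
    ]
    return sum(values) / len(values)


# Made-up (minor, major) tag counts for three informative sites.
sites = [(50, 1000), (30, 1000), (40, 1000)]
uncompensated = sample_fetal_fraction(sites)
compensated = sample_fetal_fraction(sites, double_minor=True)
```

With these made-up counts the per-site values are 5%, 3%, and 4%, so the uncompensated sample estimate is their mean, and the compensated estimate is exactly twice that.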
determining fetal fraction by sequencing predetermined polymorphic sequences
Further details on determining the fetal fraction by sequencing predetermined polymorphic sequences are provided below.
Referring to Fig. 7, operations 720, 730, 740, and 760 show a process flow for determining the fraction of fetal nucleic acids in a maternal biological sample by massively parallel sequencing of PCR-amplified polymorphic target nucleic acids. In step 720, a maternal sample comprising a mixture of fetal and maternal nucleic acids is obtained from a subject. The sample is a maternal sample obtained from a pregnant female, e.g., a pregnant woman. Other maternal samples can come from mammals, e.g., cows, horses, dogs, or cats. If the subject is a human, the sample can be obtained in the first or second trimester of pregnancy. Any maternal biological sample can be used as a source of fetal and maternal nucleic acids, whether contained in cells or cell-free. In certain embodiments, it is advantageous to obtain a maternal sample comprising cell-free nucleic acids (cfDNA). Preferably, the maternal biological sample is a biological fluid sample. Preferably, the maternal sample is selected from blood, plasma, serum, urine, and saliva. In certain embodiments, the maternal sample is a plasma sample.
In step 720, the mixture of fetal and maternal nucleic acids is further processed from a sample fraction, e.g., plasma, to obtain a sample comprising a purified mixture of fetal and maternal nucleic acids (e.g., cfDNA). Methods for processing maternal samples are described elsewhere herein.
In step 730, a portion of the purified mixture of fetal and maternal cfDNA is used to amplify a plurality of polymorphic target nucleic acids, each comprising a polymorphic site. In certain embodiments, the target nucleic acids each comprise a SNP. In other embodiments, the target nucleic acids each comprise a pair of tandem SNPs. In yet other embodiments, each target nucleic acid comprises an STR. The polymorphic sites contained in the target nucleic acids include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (called IN-DELS, also referred to as deletion insertion polymorphisms or DIPs), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or polymorphisms comprising any other sequence variation in a chromosome. In certain embodiments, the polymorphic sites encompassed by the method are located on autosomes, so that the fetal fraction can be determined independently of the sex of the fetus. Polymorphisms associated with chromosomes other than chromosomes 13, 18, 21, and Y can also be used in the methods described herein.
Polymorphisms can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA in a maternal sample. For example, the more there is of a particular genetic sequence (e.g., a SNP), the more readily a method will convert its presence into a particular color intensity, color density, or some other property that is detectable and measurable and that indicates the presence, absence, and amount of a particular DNA segment and/or a particular polymorphism (e.g., a SNP of the embryo). With respect to the present invention, the methods are not carried out using all possible SNPs in the genome, but using preselected polymorphisms that are likely to identify sequence differences between mother and fetus (i.e., informative polymorphisms). Informative polymorphic sites are identified by the difference in the allelic sequences and the amount of each of the possible alleles. Any polymorphic site that can be encompassed by the reads generated by the sequencing methods described herein can be used to determine the fetal fraction.
A portion of the mixture of fetal and maternal nucleic acids (e.g., cfDNA) in the sample is used as a template for amplifying target nucleic acids that comprise at least one SNP. In certain embodiments, each target nucleic acid comprises a single (i.e., one) SNP. Sequences of target nucleic acids comprising SNPs are available from publicly accessible databases including, but not limited to, the Human SNP Database at Web address wi.mit.edu, the NCBI dbSNP home page at Web address ncbi.nlm.nih.gov, Web address lifesciences.perkinelmer.com, Applied Biosystems by Life Technologies™ (Carlsbad, CA) at Web address appliedbiosystems.com, the Celera Human SNP database at Web address celera.com, and the SNP Database of the Genome Analysis Group (GAN) at Web address gan.iarc.fr. In one embodiment, the SNPs chosen to enrich the fetal and maternal cfDNA are selected from the group of 92 individual identification SNPs (IISNPs) described by Pakstis et al. (Pakstis et al., Hum Genet 127:315-324 [2010]), which have been shown to have very small variation in frequency across populations (Fst < 0.06) and to be highly informative around the world, with an average heterozygosity ≥ 0.4. SNPs encompassed by the method of the invention include linked and unlinked SNPs. Other useful SNPs that are applicable to, or can be adapted for, the methods described herein are disclosed in U.S. Patent Application Nos. 20080070792, 20090280492, 20080113358, 20080026390, 20080050739, 20080220422, and 20080138809, which are incorporated herein by reference in their entirety. Each target nucleic acid comprises at least one polymorphic site, e.g., a single SNP, that differs from the polymorphic site present on another target nucleic acid, thereby generating a panel of polymorphic sites, e.g., SNPs, that contains a sufficient number of polymorphic sites, of which at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 40 or more are informative. For example, a panel of SNPs can be configured to comprise at least one informative SNP. In one embodiment, the SNPs targeted for amplification are selected from rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005, and rs530022. In one embodiment, the panel of SNPs comprises at least 3, at least 5, at least 10, at least 13, at least 15, at least 20, at least 25, at least 30 or more SNPs. In one embodiment, the panel of SNPs comprises rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, and rs2567608. The polymorphic nucleic acids comprising the SNPs can be amplified using the exemplary primer pairs provided in Example 24 and disclosed as SEQ ID NOs:63-118.
In other embodiments, each target nucleic acid comprises two or more SNPs, i.e., each target nucleic acid comprises tandem SNPs. Preferably, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (e.g., as short haplotypes), and are provided herein as sets of two SNPs. To identify suitable tandem SNP sequences, the International HapMap Consortium database can be searched (The International HapMap Project, Nature 426:789-796 [2003]). The database is available on the World Wide Web at hapmap.org. In one embodiment, the tandem SNPs targeted for amplification are selected from the following sets of tandem SNP pairs: rs7277033-rs2110153; rs2822654-rs1882882; rs368657-rs376635; rs2822731-rs2822732; rs1475881-rs7275487; rs1735976-rs2827016; rs447340-rs2824097; rs418989-rs13047336; rs987980-rs987981; rs4143392-rs4143391; rs1691324-rs13050434; rs11909758-rs9980111; rs2826842-rs232414; rs1980969-rs1980970; rs9978999-rs9979175; rs1034346-rs12481852; rs7509629-rs2828358; rs4817013-rs7277036; rs9981121-rs2829696; rs455921-rs2898102; rs2898102-rs458848; rs961301-rs2830208; rs2174536-rs458076; rs11088023-rs11088024; rs1011734-rs1011733; rs2831244-rs9789838; rs8132769-rs2831440; rs8134080-rs2831524; rs4817219-rs4817220; rs2250911-rs2250997; rs2831899-rs2831900; rs2831902-rs2831903; rs11088086-rs2251447; rs2832040-rs11088088; rs2832141-rs2246777; rs2832959-rs9980934; rs2833734-rs2833735; rs933121-rs933122; rs2834140-rs12626953; rs2834485-rs3453; rs9974986-rs2834703; rs2776266-rs2835001; rs1984014-rs1984015; rs7281674-rs2835316; rs13047304-rs13047322; rs2835545-rs4816551; rs2835735-rs2835736; rs13047608-rs2835826; rs2836550-rs2212596; rs2836660-rs2836661; rs465612-rs8131220; rs9980072-rs8130031; rs418359-rs2836926; rs7278447-rs7278858; rs385787-rs367001; rs367001-rs386095; rs2837296-rs2837297; and rs2837381-rs4816672.
In one embodiment, a portion of the mixture of fetal and maternal nucleic acids (e.g., cfDNA) in the sample is used as a template for amplifying target nucleic acids that comprise at least one STR. In certain embodiments, each target nucleic acid comprises a single (i.e., one) SNP. STR loci are found on almost every chromosome in the genome and can be amplified using a variety of polymerase chain reaction (PCR) primers. Tetranucleotide repeats are preferred among forensic scientists because of their fidelity in PCR amplification, although some tri- and pentanucleotide repeats are also in use. A comprehensive listing of references, facts, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which can be accessed via the World Wide Web at ibm4.carb.nist.gov:8800/dna/home.htm. Sequence information from GenBank (http://www2.ncbi.nlm.nih.gov/cgi-bin/genbank) for commonly used STR loci is also accessible through STRBase. Commercial kits available for the analysis of STR loci generally provide all of the necessary reaction components and the controls required for amplification. STR multiplex systems allow multiple non-overlapping loci to be amplified simultaneously in a single reaction, which substantially increases throughput. With multicolor fluorescent detection, even overlapping loci can be multiplexed. The polymorphic nature of tandemly repeated DNA sequences, which are widespread throughout the human genome, has made them important genetic markers for gene mapping studies, linkage analysis, and human identity testing. Because STRs are highly polymorphic, most individuals will be heterozygous, i.e., most people will possess two alleles (versions), one inherited from each parent, each with a different repeat number.
PCR products comprising STRs can be separated and detected using manual, semi-automated, or automated methods. Semi-automated systems are gel-based and combine electrophoresis, detection, and analysis into one unit. On a semi-automated system, gel assembly and sample loading are still manual processes; however, once the samples are loaded onto the gel, electrophoresis, detection, and analysis proceed automatically. Data collection occurs in "real time" as fluorescently labeled fragments migrate past the detector at a fixed point, and the data can be viewed as they are collected. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer, and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically.
Therefore, the non-maternally inherited fetal STR sequence will differ in the number of repeats from the maternal sequence. Amplification of these STR sequences can result in one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. This technique was first reported in 2000 (Pertl et al., Human Genetics 106:45-49 [2002]) and was subsequently developed using simultaneous identification of multiple different STR regions by real-time PCR (Liu et al., Acta Obset Gyn Scand 86:535-541 [2007]). PCR amplicons of various sizes have been used to discern the respective size distributions of circulating fetal and maternal DNA species, and have shown that the fetal DNA molecules in maternal plasma are generally shorter than the maternal DNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is < 300 bp, while maternal DNA has been estimated to be between about 0.5 Kb and 1 Kb (Li et al., Clin Chem 50:1002-1011 [2004]).
The invention provides a method for determining the fraction of fetal nucleic acid in a maternal sample, the method comprising determining the copy number of at least one fetal and one maternal allele at a polymorphic miniSTR site, which can be amplified to generate amplicons whose length is about the size of circulating fetal DNA fragments (e.g., less than about 250 base pairs). In one embodiment, the fetal fraction can be determined by a method that comprises sequencing at least a portion of amplified polymorphic target nucleic acids, each comprising a miniSTR. Fetal and maternal alleles at an informative STR site are distinguished by their different lengths, i.e., number of repeats, and the fetal fraction can be calculated as a percent ratio of the amount of fetal to maternal alleles at that site. The method can use one informative miniSTR, or a combination of any number of informative miniSTRs, to determine the fraction of fetal nucleic acid. In one embodiment, the method comprises determining the copy number of at least one fetal and at least one maternal allele at at least one polymorphic miniSTR that is amplified to generate amplicons of less than about 300 bp, less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp, or less than about 50 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 300 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 250 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 200 bp. Amplification of the informative allele includes using miniSTR primers, which allow amplification of reduced-size amplicons to discern STR alleles that are less than about 500 bp, less than about 450 bp, less than about 400 bp, less than about 350 bp, less than about 300 base pairs (bp), less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp, or less than about 50 bp. The reduced-size amplicons generated using the miniSTR primers are known as miniSTRs, which are identified according to the marker name corresponding to the locus to which they have been mapped. In one embodiment, the miniSTR primers include the miniSTR primers that have permitted the maximum size reduction in amplicon size for all 13 CODIS STR loci, in addition to D2S1338, Penta D, and Penta E, found in commercially available STR kits (Butler et al., J Forensic Sci 48:1054-1064 [2003]), the miniSTR loci that are unlinked to the CODIS markers as described by Coble and Butler (Coble and Butler, J Forensic Sci 50:43-53 [2005]), and other miniSTRs that have been characterized at NIST. Information regarding the miniSTRs characterized at NIST is accessible via the World Wide Web at cstl.nist.gov/biotech/strbase/newSTRs.htm. Any one pair, or a combination of two or more pairs, of miniSTR primers can be used to amplify at least one miniSTR.
Amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids (e.g., cfDNA) is accomplished by any method that uses PCR or variations of the method described elsewhere in this application. Amplification of the target sequences is accomplished using primer pairs, each capable of amplifying a target nucleic acid sequence comprising a polymorphic site (e.g., a SNP), in a multiplex PCR reaction. Multiplex PCR reactions include combining at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more sets of primers in the same reaction to quantify the amplified target nucleic acids comprising at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more polymorphic sites in the same sequencing reaction. Any panel of primer sets can be configured to amplify at least one informative polymorphic sequence.
Primers are designed to hybridize to a sequence close to the SNP site on the cfDNA to ensure that the SNP site is included within the length of the read generated by the sequencer. As provided in the Examples, at least one of the two primers in the primer set for identifying any one polymorphic site hybridizes sufficiently close to the polymorphic site to encompass it within the 36 bp read generated by massively parallel sequencing on the Illumina Analyzer GII, and to generate amplicons of sufficient length to undergo bridge amplification during cluster formation. Thus, the primers were designed to generate amplicons of at least 110 bp, which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) for cluster amplification, result in DNA molecules of at least 200 bp. The SNPs provided in Table 33 were used to simultaneously amplify 13 target sequences in a multiplex assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs provided in Table 34. The SNP alleles are shown in bold and are underlined. Other SNPs that can be used to determine the fetal fraction according to the method of the invention include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429, and rs1490413. These SNPs were analyzed by TaqMan PCR for determining the fetal fraction, and are disclosed in U.S. Patent Application Publication 2010-0010085.
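The length constraints in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses the figures stated in the text (36 bp reads, amplicons of at least 110 bp, molecules of at least 200 bp after adaptor addition). Everything else is an assumption made for illustration: the read is taken to start at the first base of the amplicon from either end, the combined adaptor length of 90 bp is a hypothetical value inferred from the 110 bp and 200 bp figures, and the function names are invented.

```python
READ_LENGTH = 36         # bp, read length stated in the text
MIN_AMPLICON = 110       # bp, minimum amplicon length stated in the text
MIN_WITH_ADAPTORS = 200  # bp, minimum length after adaptor addition


def snp_in_read(amplicon_length, snp_offset, read_length=READ_LENGTH):
    """Return True if the SNP is covered by a read from either amplicon end.

    snp_offset is the 0-based position of the SNP counted from the 5' end of
    the forward strand; reads are assumed to begin at the first base of the
    amplicon, primer sequence included.
    """
    covered_from_forward = snp_offset < read_length
    covered_from_reverse = (amplicon_length - 1 - snp_offset) < read_length
    return covered_from_forward or covered_from_reverse


def design_ok(amplicon_length, snp_offset, adaptor_total=90):
    """Check the three constraints; adaptor_total is a hypothetical value."""
    return (
        amplicon_length >= MIN_AMPLICON
        and amplicon_length + adaptor_total >= MIN_WITH_ADAPTORS
        and snp_in_read(amplicon_length, snp_offset)
    )


# Made-up designs: SNP near the forward primer, SNP in the middle of the
# amplicon, and an amplicon that is too short.
near_primer = design_ok(110, 30)
mid_amplicon = design_ok(110, 55)
too_short = design_ok(100, 10)
```

The middle case shows why primer placement matters: a SNP 55 bases into a 110 bp amplicon lies outside a 36 bp read from either end, so at least one primer has to sit closer to the polymorphic site.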
The forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for the site to be included in the sequence reads generated by massively parallel sequencing of the preselected amplified polymorphic nucleic acids. The length of the sequence read depends on the particular sequencing technology. Massively parallel sequencing methods provide sequence reads that vary in size from tens to hundreds of base pairs. At least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp.
Circulating cell-free DNA is about < 300 bp in length. Primer sets are therefore designed to hybridize to and amplify polymorphic sequences that average up to about 300 bp in length, where fetal DNA averages about 170 bp. In some embodiments, the primer sets hybridize to the DNA to produce amplicons of up to about 300 bp. In other embodiments, the primer sets hybridize to the DNA sequences to produce amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. Primer sets can hybridize to DNA sequences present on the same chromosome or on different chromosomes. For example, one or more primer sets can hybridize to sequences present on the same chromosome. Alternatively, two or more primer sets hybridize to sequences present on different chromosomes. In one embodiment, the primer pairs amplify polymorphic sequences present on one or more of chromosomes 1 to 22. In some embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X or Y.
In step 740 (Fig. 7), part or all of the amplified polymorphic sequences are used to prepare a sequencing library for sequencing in the parallel fashion described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry.
In step 740, the sequence information needed to determine fetal fraction is obtained using any one of the known DNA sequencing methods. Preferably, the method described here employs next generation sequencing (NGS) technology to provide countable sequence tags as described elsewhere in this application. The sequencing can be massively parallel sequencing-by-synthesis, preferably using reversible dye terminators. Alternatively, the massively parallel sequencing can be sequencing-by-ligation or single molecule sequencing.
Partial sequencing of the amplified target polymorphic nucleic acids is performed, and sequence tags comprising reads of predetermined length (e.g. 36 bp) that map to a known reference genome are counted. Only sequence reads that align uniquely to the reference genome are counted as sequence tags. In one embodiment, the reference genome is an artificial target sequences genome that comprises the sequences of the polymorphic target nucleic acids (e.g. SNPs). In one embodiment, the reference genome is an artificial SNP reference genome. In another embodiment, the reference genome is an artificial STR reference genome. In yet another embodiment, the reference genome is an artificial tandem-STR reference genome. Artificial reference genomes can be compiled from the sequences of the target polymorphic nucleic acids. An artificial reference genome can comprise polymorphic target sequences that each comprise one or more different types of polymorphic sequence. For example, an artificial reference genome can comprise polymorphic sequences comprising SNP alleles and/or STRs. In one embodiment, the reference genome is the human reference genome NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan). In another embodiment, the reference genome comprises the human reference genome NCBI36/hg18 sequence and an artificial target sequences genome, e.g. a SNP genome, that includes the target polymorphic sequences. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference genome to determine the chromosomal origin of the sequenced nucleic acid (e.g. cfDNA) molecule; specific genetic sequence information is not needed. A number of computer algorithms can be used to align sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]) and ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. In embodiments of the method that use NGS sequencing to determine both the presence or absence of aneuploidy and the fetal fraction, the analysis of sequencing information for determining aneuploidy may allow for a small degree of mismatch (0 to 2 mismatches per sequence tag) to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample. The analysis of sequencing information for determining fetal fraction may allow for a small degree of mismatch, depending on the polymorphic sequence. For example, a small degree of mismatch may be allowed if the polymorphic sequence is an STR. Where the polymorphic sequence is a SNP, all sequences that exactly match either of the two alleles at the SNP site are counted first and filtered from the remaining reads, for which a small degree of mismatch may be allowed. Quantification of the number of sequence reads aligning to each chromosome, for determining chromosomal aneuploidies, can be performed as described herein, or by alternative analyses that either use median normalization, i.e. the median count of sequence tags for the chromosome of interest relative to the tags of each of the other autosomes (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]), or compare the number of unique reads aligning to each chromosome with the total number of reads aligning to all chromosomes to derive a percent genomic representation for each chromosome. A "z score" is generated to represent the difference between the percent genomic representation of the chromosome of interest and the mean percent representation of the same chromosome in a euploid control group, divided by the standard deviation (Chiu et al., Clin Chem 56:459-463 [2010]). In another embodiment, the sequencing information can be determined as described in U.S. Provisional Patent Application 32047-768.101, titled "Normalizing Biological Assays" and filed on January 19, 2010, which is incorporated herein by reference in its entirety.
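As an illustration of the z score described above, the following minimal Python sketch computes the percent genomic representation of a chromosome and compares it with a euploid control group. The function names, the dictionary layout of the read counts and the use of the sample standard deviation are assumptions made for this example only; they are not taken from the cited references.

```python
from statistics import mean, stdev


def percent_representation(chrom_counts, chrom):
    """Percent of all uniquely aligned reads that map to `chrom`.

    `chrom_counts` is a hypothetical dict: chromosome name -> read count.
    """
    total = sum(chrom_counts.values())
    return 100.0 * chrom_counts[chrom] / total


def z_score(test_counts, control_counts_list, chrom):
    """z score of `chrom` in a test sample against euploid controls.

    Assumption: the sample standard deviation of the controls is used.
    """
    control_pcts = [percent_representation(c, chrom) for c in control_counts_list]
    test_pct = percent_representation(test_counts, chrom)
    return (test_pct - mean(control_pcts)) / stdev(control_pcts)
```

A test sample whose representation of the chromosome of interest lies several control standard deviations above the control mean would yield a correspondingly large positive z score.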
The present method of determining fetal fraction by sequencing nucleic acids can be used in combination with other methods.
In step 760, fetal fraction is determined based on the total number of tags that map to the first allele and the total number of tags that map to the second allele at an informative polymorphic site (e.g. a SNP) contained in the reference genome. For example, the reference genome is an artificial target sequences genome that encompasses the polymorphic sequences comprising SNPs rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005 and rs530022. In one embodiment, the artificial reference genome comprises the polymorphic target sequences of SEQ ID NOs: 7 to 62 (see Example 24).
In another embodiment, the artificial genome is an artificial target sequences genome that encompasses polymorphic sequences comprising tandem SNPs. In another embodiment, the artificial target genome encompasses polymorphic sequences comprising STRs. The composition of the artificial target sequences genome varies with the polymorphic sequences used to determine fetal fraction. Accordingly, the artificial target sequences genome is not limited to the SNP, tandem SNP or STR sequences exemplified here.
Informative polymorphic sites (e.g. SNPs) are identified by the difference in the allelic sequences and by the amount of each of the possible alleles. Fetal cfDNA is present at a concentration that is < 10% of the maternal cfDNA. Thus, a minor contribution of an allele to the mixture of fetal and maternal nucleic acids, relative to the major contribution of the maternal allele, can be assigned to the fetus. Alleles derived from the maternal genome are referred to here as major alleles, and alleles derived from the fetal genome are referred to here as minor alleles. Alleles that are represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs and derived from a maternal plasma sample are shown in Figure 12. Informative SNPs are discerned from the single nucleotide change at the polymorphic site, and fetal alleles are discerned by their relatively minor contribution to the mixture of fetal and maternal nucleic acids in the sample when compared with the major contribution of the maternal nucleic acids. Accordingly, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on the reference genome for each of the two alleles at a predetermined polymorphic site. In one embodiment, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as described elsewhere in this application.
Estimating fetal fraction using STR sequences and capillary electrophoresis
Individuals have different STR lengths because of different numbers of repeats. Because STRs are highly polymorphic, most individuals will be heterozygous, i.e. most people will possess two alleles (versions), one inherited from each parent, each with a different number of repeats. The non-maternally inherited fetal STR sequence will differ from the maternal sequence in the number of repeats. Amplification of these STR sequences can yield one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele), and one minor product corresponding to the non-maternally inherited fetal allele. When sequenced, the collected samples can be correlated with the corresponding alleles and counted to determine the relative fraction using Equation 3.
PCR is performed on the purified sample using fluorescently labeled primers. The PCR products comprising the STRs can be separated and detected using manual, semi-automated or automated electrophoresis. Semi-automated systems are gel-based and combine electrophoresis, detection and analysis into one unit. On a semi-automated system, gel assembly and sample loading are still manual procedures; however, once the samples are loaded onto the gel, electrophoresis, detection and analysis proceed automatically. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. Data collection occurs in "real time" as the fluorescently labeled fragments migrate past the detector at a fixed point, and the fragments can be viewed as they are collected. The sequences obtained from capillary electrophoresis can be detected by a program that measures the wavelength of the fluorescent label. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of preset bins for the STRs being analyzed.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component(s), and is expressed as a percentage for each informative locus as follows:
equation 3
The fetal fraction for a sample comprising two or more informative STRs is calculated as the average of the fetal fractions calculated for the two or more informative markers.
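The peak-height calculation and the averaging step described above can be sketched as follows. This is an illustrative sketch only: the function names and the `(minor_height, [major_heights])` input layout are hypothetical, and the per-locus formula is written directly from the verbal description given for Equation 3.

```python
def minor_allele_percent(minor_peak_height, major_peak_heights):
    """Percent minor allele fraction for one informative STR locus.

    Minor peak height divided by the sum of the major peak height(s),
    expressed as a percentage.
    """
    major_total = sum(major_peak_heights)
    if major_total <= 0:
        raise ValueError("major peak heights must sum to a positive value")
    return 100.0 * minor_peak_height / major_total


def sample_fetal_fraction(loci):
    """Average the per-locus percentages over all informative loci.

    `loci` is a list of (minor_peak_height, [major_peak_heights]) tuples.
    """
    if not loci:
        raise ValueError("at least one informative locus is required")
    values = [minor_allele_percent(minor, majors) for minor, majors in loci]
    return sum(values) / len(values)
```

For instance, two loci with minor/major peak heights of 100 versus 1000 + 1000 and 150 versus 2000 give per-locus values of 5.0% and 7.5%, and a sample average of 6.25%.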
Estimating fetal fraction using a mixture model
In the embodiments disclosed here, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
As shown in Figure 13, cases 1 and 2 are the polymorphism cases in which the mother is homozygous at a given allele. In case 1, the baby and the mother are both homozygous, and the polymorphism is a case 1 polymorphism. This case is typically not of particular interest, because the collected data contain only one type of allele at the polymorphic site analyzed. In case 2, where the mother is homozygous and the baby is heterozygous, the fetal fraction f is nominally given by 2 times the ratio of the minor allele count to the coverage. Coverage is defined as the total number of reads or tags (fetal and maternal) that map to the particular polymorphic site. The equation that approximates the fetal fraction in case 2 from the fraction of the total reads in the combined fetal and maternal sample is as follows:
equation 4
In case 3, where the mother is heterozygous and the baby is homozygous, the fetal fraction is nominally 1 minus 2 times the ratio of the minor allele count to the coverage. The equation that approximates the fetal fraction in case 3 from the fraction of the total reads in the combined fetal and maternal sample is as follows:
equation 5
Finally, in case 4, where both the mother and the fetus are heterozygous, the minor allele fraction should always be 0.5, barring error. Fetal fraction cannot be derived for polymorphisms that fall into case 4.
Table 7 summarizes an example of estimating fetal fraction using Equations 4 and 5 when the number of major allele reads is 300 and the number of minor allele reads is 200. The coverage is then 500.
Table 7: Example of estimating fetal fraction using zygosity
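The nominal relationships stated for cases 2 and 3 can be checked with a short Python sketch using the read counts of this example (300 major allele reads, 200 minor allele reads). The function names are hypothetical, and the two formulas are written from the verbal descriptions of Equations 4 and 5 (2 times, and 1 minus 2 times, the ratio of minor allele count to coverage); the sketch is not a reproduction of the table itself.

```python
def fetal_fraction_case2(minor_count, coverage):
    """Case 2 (mother homozygous, fetus heterozygous): f ~ 2 * A / D."""
    return 2.0 * minor_count / coverage


def fetal_fraction_case3(minor_count, coverage):
    """Case 3 (mother heterozygous, fetus homozygous): f ~ 1 - 2 * A / D."""
    return 1.0 - 2.0 * minor_count / coverage


major_reads, minor_reads = 300, 200
coverage = major_reads + minor_reads  # total reads mapped to the site

case2_estimate = fetal_fraction_case2(minor_reads, coverage)
case3_estimate = fetal_fraction_case3(minor_reads, coverage)
```

With these counts the case 2 expression evaluates to about 0.8 and the case 3 expression to about 0.2. The counts serve only to illustrate the arithmetic.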
In certain embodiments, a mixture model can be employed to classify a collection of polymorphisms into two or more of the proposed zygosity cases and, at the same time, to estimate the fetal DNA fraction from the mean allele frequency for each of these cases. In general, a mixture model assumes that a particular data set is made up of a mixture of different types of data, each of which has its own expected distribution (e.g. a normal distribution). The procedure attempts to find the mean, and possibly other characteristics, of each type of data. In the embodiments disclosed here, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
In certain embodiments employing a mixture model, one or more factorial moments, given by Equation 1, are calculated for the positions under consideration as polymorphisms. For example, a factorial moment F_i (or a set of factorial moments) is calculated using multiple SNP positions considered in the DNA sequence. As shown in Equation 10 below, each factorial moment F_i is a sum, over all the polymorphic positions under consideration, of the ratio of the minor allele frequency a_i to the coverage d_i at a given position. As shown in Equation 11 below, these factorial moments are also related to the parameters α_i and p_i associated with each of the four zygosity cases described above. Specifically, they are related to the probability p_i for each case and to the relative amount, given by α_i, of each of the four cases in the collection of polymorphisms under consideration. As explained above, the probability p_i is a function of the fraction of fetal DNA in the cell-free DNA in the mother's blood. As explained more fully below, by calculating a sufficient number of these factorial moments, the method provides a sufficient number of expressions to solve for all the unknowns. The unknowns in this case are the relative amounts of each of the four cases in the population of polymorphisms under consideration, and the probabilities (and hence the fetal DNA fraction) associated with each of the four cases. Similar results can be obtained using other versions of the mixture model. Some versions use only the polymorphisms that fall into cases 1 and 2, with the polymorphisms of cases 3 and 4 filtered out by a thresholding technique.
Thus, factorial moments can be used as part of a mixture model to identify the probabilities of any combination of the four zygosity cases. And, as mentioned, these probabilities, or at least those for cases 2 and 3, are directly related to the fraction of fetal DNA in the total cell-free DNA in the mother's blood.
It should also be mentioned that the sequencing error, given by e, can be used to reduce the complexity of the system of factorial moment equations that must be solved. In this regard, it should be recognized that a sequencing error can in fact have any one of four outcomes (corresponding to each of the four possible bases at any given polymorphic position).
Let the major allele count at genomic position j be B, the first order statistic of the counts (counts of reads) at position j. The major allele, b, is the corresponding arg max. Subscripts are used when more than one SNP is under consideration. The major allele count is given by:
equation 6
Let the minor allele count at position j be A, the second order statistic of the counts (i.e. the second highest allele count) at position j:
$$A \equiv A_i \equiv \{a_j\} = w_{j,i}^{(2)} \qquad \text{(Equation 7)}$$
Coverage is defined as the total number of reads (fetal and maternal) that map to the particular polymorphic site. Let the coverage at position j be D:
$$D \equiv D_j = \{d_i\} = A_j + B_j \qquad \text{(Equation 8)}$$
In this embodiment, the minor allele frequency A is a sum of four terms, as shown in Equation 9. The four heterozygosity cases suggest the following binomial mixture model for the distribution of the allele counts a_i at the points (a_i, d_i), where d_i is the coverage:
$$A = \{a_i\} \sim \alpha_1 \mathrm{Bin}(p_1, d_i) + \alpha_2 \mathrm{Bin}(p_2, d_i) + \alpha_3 \mathrm{Bin}(p_3, d_i) + \alpha_4 \mathrm{Bin}(p_4, d_i)$$
where
$$1 = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4, \qquad m = 4 \qquad \text{(Equation 9)}$$
Each term corresponds to one of the four zygosity cases. Each term is the product of a polymorphism fraction α and a binomial distribution of the minor allele frequency. The α values represent the fractions of the polymorphisms that fall into each of the four cases. Each binomial distribution has an associated probability, p, and coverage, d. The minor allele probability for case 2, for example, is given by f/2, where f is the fetal fraction. Different models for relating p_i to the fetal fraction and the sequencing error rate are described below. The parameters α_i relate to population-specific parameters, and the ability to let these values "float" with respect to, for example, the ethnicity of the parents and offspring can give these methods additional robustness.
The disclosed embodiments use factorial moments of the allele frequency data under consideration. As is well known, the mean of a distribution is its first moment; it is the expected value of the minor allele frequency. The variance is the second central moment; it is calculated from the expected value of the squared allele frequency.
For the different heterozygosity cases, Equation 9 above can be solved for the fetal fraction. In certain embodiments, the fetal fraction is solved by the method of factorial moments, in which the mixture parameters are expressed in terms of moments that can be readily estimated from the observed data.
The allele frequency data across all polymorphisms can be used to calculate the i-th factorial moment F_i (the first factorial moment F_1, the second factorial moment F_2, and so on), as shown in Equation 10. (SNPs are used for purposes of example only. Other types of polymorphisms can be used, as discussed elsewhere in this application.) Given n SNP positions, the factorial moments are defined as follows:
$$F_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i}{d_i}$$
$$F_2 = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i (a_i - 1)}{d_i (d_i - 1)}$$
$$F_j = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i (a_i - 1) \cdots (a_i - j + 1)}{d_i (d_i - 1) \cdots (d_i - j + 1)} \qquad \text{(Equation 10)}$$
As these equations show, each factorial moment is a sum over i (the individual polymorphisms in the data set), where there are n such polymorphisms in the data set. Each term of the sum is a function of the minor allele count a_i and the coverage value d_i.
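The factorial moments of Equation 10 can be computed directly from a list of (a_i, d_i) points. The following Python sketch is one straightforward way to do so; the function name and the input layout are assumptions made for illustration.

```python
def factorial_moment(points, j):
    """j-th factorial moment F_j over (a_i, d_i) points, per Equation 10.

    points: iterable of (minor_allele_count, coverage) pairs.
    j:      order of the moment (1, 2, 3, ...).
    """
    points = list(points)
    if j < 1:
        raise ValueError("moment order j must be at least 1")
    if not points:
        raise ValueError("at least one polymorphic position is required")
    total = 0.0
    for a, d in points:
        if d < j:
            # The falling factorial d(d-1)...(d-j+1) would be zero.
            raise ValueError("coverage must be at least j at every position")
        numerator = 1.0
        denominator = 1.0
        for k in range(j):
            numerator *= (a - k)    # a(a-1)...(a-j+1)
            denominator *= (d - k)  # d(d-1)...(d-j+1)
        total += numerator / denominator
    return total / len(points)
```

For example, for the two points (2, 10) and (5, 20), the first factorial moment is the mean of 2/10 and 5/20, i.e. 0.225.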
Usefully, the factorial moments are related to the values of α_i and p_i, as illustrated by Equation 11. The factorial moments can be related to {α_i, p_i} such that
$$F_1 \approx \sum_{i=1}^{m} \alpha_i p_i^{1}$$
$$F_2 \approx \sum_{i=1}^{m} \alpha_i p_i^{2}$$
$$F_j \approx \sum_{i=1}^{m} \alpha_i p_i^{j}$$
$$F_g \approx \sum_{i=1}^{m} \alpha_i p_i^{g} \qquad \text{(Equation 11)}$$
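The link between the sample factorial moments of Equation 10 and the mixture parameters in Equation 11 follows from a standard property of the binomial distribution. As a brief sketch, assuming each count a is binomially distributed with coverage d and probability p, and that each position falls into case i with probability α_i (here the index i runs over the m cases, not over positions):

$$\mathbb{E}\left[\frac{a(a-1)\cdots(a-j+1)}{d(d-1)\cdots(d-j+1)}\right] = p^{j} \quad \text{for } a \sim \mathrm{Bin}(d, p)$$

$$\mathbb{E}\left[F_j\right] = \sum_{i=1}^{m} \alpha_i \, p_i^{j}$$

The coverage cancels in the first identity, which is why positions with different coverage values d can be averaged together in Equation 10.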
The fetal fraction f can be determined from the probabilities p_i. Accordingly, the fetal fraction can be obtained by solving a system of equations that relates the unknown α and p variables to the factorial moment expressions for the minor allele fractions across the multiple polymorphisms under consideration. Of course, other techniques for solving mixture models fall within the scope of the disclosed embodiments.
When n > 2* (number of parameters to be estimated), a solution can be identified by solving for {α_i, p_i} in the system of equations derived from the relation Equation 8 above. Obviously, the problem becomes much more difficult mathematically as g increases, because more {α_i, p_i} need to be estimated.
Data from cases 1 and 2 (or from cases 3 and 4) typically cannot be distinguished exactly by a simple threshold at lower fetal fractions. By discriminating at a threshold point, the data from cases 1 and 2 can be easily separated from the data from cases 3 and 4, where A is the minor allele count, D is the coverage and T is the threshold. Use of T = 0.5 has been found to perform satisfactorily.
Note that the mixture model approach employing Equations 10 and 11 uses the data from all polymorphisms but does not separately account for sequencing error. Suitable methods that separate the data of the first and second cases from the data of the third and fourth cases can account for sequencing error.
In further examples, the data set provided to the mixture model contains only data for polymorphisms of cases 1 and 2. These are the polymorphisms for which the mother is homozygous. A thresholding technique can be employed to eliminate the polymorphisms of cases 3 and 4. For example, polymorphisms whose minor allele frequency is greater than a particular threshold are excluded before the mixture model is applied. Using suitably filtered data and the factorial moments as reduced in Equations 13 and 14 below, one can calculate the fetal fraction f as shown in Equation 15. Note that Equation 13 is a restatement of Equation 9 for this implementation of the mixture model. Note also that, in this particular example, the sequencing error associated with the machine reads is unknown. As a result, the system of equations must separately be solved for the error, e.
Figure 14 shows a comparison of the results obtained using this mixture model, with the known fetal fraction on the X axis and the estimated fetal fraction on the Y axis. If the mixture model predicted the fetal fraction perfectly, the plotted results would follow the dashed line. Nevertheless, the estimated fractions are remarkably good, particularly considering that most of the data were excluded before the mixture model was applied.
To elaborate further, several other methods are available for estimating the parameters of the model from Equation 7. In some cases, a tractable solution can be found by setting the derivative of the chi-squared statistic to zero. Where an easy solution cannot be found by direct differentiation, a Taylor series expansion of the binomial probability distribution function (PDF), or another approximating polynomial, can be effective. Minimum chi-squared estimators are well known to be efficient. The method-of-moments solution from Equation 9 can be used as the starting point of an iterative method. The following chi-squared estimator can be used:
equation 12
where P_i is the number of points with count i. An alternative method from Le Cam ["Asymptotic Theory of Estimation and Testing Hypotheses", Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley, CA: University of California Press, 1956, pp. 129-156] uses Newton-Raphson iteration on the likelihood function.
According to another application, a method of resolving mixture models is discussed that involves expectation maximization methods operating on a mixture of approximating beta distributions.
Model 1: cases 1 and 2, sequencing error unknown
Consider a reduced model that accounts only for heterozygosity cases 1 and 2. In this case, the mixture distribution can be written as:
$$A = \{a_i\} \sim \alpha_1 \mathrm{Bin}(e, d_i) + \alpha_2 \mathrm{Bin}(f/2, d_i)$$
where
$$1 = \alpha_1 + \alpha_2, \qquad m = 4 \qquad \text{(Equation 13)}$$
and the system of equations:
$$F_1 = \alpha_1 e + (1 - \alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1 - \alpha_1)(f/2)^2$$
$$F_3 = \alpha_1 e^3 + (1 - \alpha_1)(f/2)^3 \qquad \text{(Equation 14)}$$
is solved for e (the sequencing error rate), α_1 (the proportion of case 1 points) and f (the fetal fraction), where the F_i are as defined in Equation 10 above. The closed-form solution for the fetal fraction is chosen as the real solution of the following equation:
f &ap; ( F 1 - 1 ) F 2 &PlusMinus; F 2 4 F 1 3 + F 2 - 3 F 1 ( 2 + F 1 F 2 + 4 F 2 2 ) 2 ( F 1 2 - F 2 ) Equation 15,
This solution is between 0 and 1.
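As an independent illustration, the three-moment system of Equation 14 can also be solved numerically with the standard method of moments for a two-component mixture. The Python sketch below is not a transcription of Equation 15; it uses the fact that, with F_0 = 1, the moments satisfy F_{k+2} = s F_{k+1} - p F_k, where s = e + f/2 and p = e(f/2). The function name is hypothetical, and the sketch assumes that the sequencing error rate e is the smaller of the two roots.

```python
import math


def solve_model1(F1, F2, F3):
    """Solve Equation 14 for (e, alpha1, f) from three factorial moments.

    Assumption (not from the source): e < f/2, so e is the smaller root.
    """
    denom = F2 - F1 * F1
    if denom <= 0:
        raise ValueError("moments are not consistent with a two-component mixture")
    s = (F3 - F1 * F2) / denom   # s = e + f/2
    p = s * F1 - F2              # p = e * (f/2)
    disc = s * s - 4.0 * p       # discriminant of x^2 - s*x + p = 0
    if disc <= 0:
        raise ValueError("no pair of distinct real roots")
    root = math.sqrt(disc)
    e = (s - root) / 2.0         # smaller root: sequencing error rate
    q = (s + root) / 2.0         # larger root: f/2
    alpha1 = (q - F1) / (q - e)  # from F1 = alpha1*e + (1 - alpha1)*q
    return e, alpha1, 2.0 * q
```

With noise-free moments generated from e = 0.01, α_1 = 0.5 and f = 0.2, the function recovers those three values to within floating-point error. With moments estimated from real data, the discriminant can be small or negative, in which case the sketch raises an error rather than returning an estimate.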
To gauge the performance of the estimator, simulated data sets of Hardy-Weinberg equilibrium points (a_i, d_i) were constructed with fetal fractions designed to be {1%, 3%, 5%, 10%, 15%, 20% and 25%} and a constant sequencing error rate of 1%. The 1% error rate is the currently accepted rate for the sequencing machine and protocol used, and is consistent with the Illumina Genome Analyzer II data shown in Figure 15. Equation 15 was applied to these data and found to agree generally with the "known" fetal fraction, with the exception of an upward bias of four points. Interestingly, the sequencing error rate, e, was estimated to be just above 1%.
Model 2: cases 1 and 2, sequencing error known
In the next mixture model example, thresholding or another filtering technique is again employed to remove the data for polymorphisms that belong to cases 3 and 4. In this case, however, the sequencing error is known. This simplifies the resulting expression for the fetal fraction, f, as shown in Equation 16. Figure 16 shows that this version of the mixture model provides improved results compared with the approach employed with Equation 15. In the equations that follow, let the sequencing machine error rate be e.
A similar approach is shown in Equations 17 and 18. This approach recognizes that only some sequencing errors add to the minor allele count. Rather, only one in every four sequencing errors should increase the minor allele count. Figure 17 shows very good agreement between the actual and estimated fetal fractions when this technique is used.
Because the sequencing error rate of the machine used is known to a great extent, the bias and complexity of the calculation can be reduced by eliminating e as a variable to be solved. Thus, we obtain the following system of equations for the fetal fraction f:
$$F_1 = \alpha_1 e + (1 - \alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1 - \alpha_1)(f/2)^2 \qquad \text{(Equation 16)}$$
which solves to:
$$f \approx \frac{2\,(e F_1 - F_2)}{e - F_1}$$
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias somewhat.
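The closed-form solution of the two-moment system of Equation 16 can be exercised with a small Python sketch. The function name and the round-trip check are illustrative assumptions; the expression itself is the solution stated above, f ≈ 2(eF_1 - F_2)/(e - F_1).

```python
def fetal_fraction_model2(F1, F2, e):
    """Fetal fraction from two factorial moments and a known error rate e.

    Implements f = 2 * (e*F1 - F2) / (e - F1), the solution of Equation 16.
    """
    if F1 == e:
        # All points behave like case 1; f is not identifiable.
        raise ValueError("F1 equals e: fetal fraction cannot be resolved")
    return 2.0 * (e * F1 - F2) / (e - F1)


# Round-trip check with noise-free moments built from known parameters.
alpha1, e, f = 0.6, 0.01, 0.1
F1 = alpha1 * e + (1 - alpha1) * (f / 2)
F2 = alpha1 * e ** 2 + (1 - alpha1) * (f / 2) ** 2
f_estimate = fetal_fraction_model2(F1, F2, e)
```

In this noise-free check the estimate returns the fetal fraction used to build the moments (0.1) to within floating-point error.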
Model 3: cases 1 and 2, sequencing error known, improved error model
To ameliorate the bias in this model, we expand the error model of the above equations to account for the fact that, in heterozygosity case 1, not every sequencing error event adds to the minor allele count A = a_i. Furthermore, we allow for the fact that sequencing error events may contribute to the counts of heterozygosity case 2. Hence, we determine the fetal fraction f by solving the following system of factorial moment relations:
$$F_1 = \alpha_1 \frac{e}{4} + (1 - \alpha_1)(e + f/2)$$
$$F_2 = \alpha_1 \left(\frac{e}{4}\right)^2 + (1 - \alpha_1)(e + f/2)^2 \qquad \text{(Equation 17)}$$
The solution of this system is then:
$$f \approx \frac{-2\,(e^2 - 5 e F_1 + 4 F_2)}{e - 4 F_1} \qquad \text{(Equation 18)}$$
Figure 17 shows that, for simulated data, using the machine error rate as a known parameter together with the enhanced error model for cases 1 and 2 greatly reduces the upward bias, to less than a point for fetal fractions below 0.2.
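Equation 18 can be checked against the moment relations of Equation 17 in the same way. In the following Python sketch the helper `model3_moments` is a hypothetical forward model that builds noise-free moments from chosen parameters, so that the closed form can be verified by a round trip.

```python
def model3_moments(alpha1, e, f):
    """Noise-free F1 and F2 under the improved error model (Equation 17)."""
    p1 = e / 4.0        # case 1: one in four errors hits the minor allele
    p2 = e + f / 2.0    # case 2: error contribution plus fetal allele
    F1 = alpha1 * p1 + (1 - alpha1) * p2
    F2 = alpha1 * p1 ** 2 + (1 - alpha1) * p2 ** 2
    return F1, F2


def fetal_fraction_model3(F1, F2, e):
    """Fetal fraction per Equation 18, with known error rate e."""
    if e == 4.0 * F1:
        raise ValueError("e equals 4*F1: fetal fraction cannot be resolved")
    return -2.0 * (e * e - 5.0 * e * F1 + 4.0 * F2) / (e - 4.0 * F1)


F1, F2 = model3_moments(alpha1=0.6, e=0.01, f=0.1)
f_estimate = fetal_fraction_model3(F1, F2, e=0.01)
```

With these illustrative parameters the round trip returns 0.1, the fetal fraction used to generate the moments, to within floating-point error.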
Using fetal fraction to classify affected samples
In certain embodiments, the fetal fraction estimate is employed to further characterize an affected sample. In some cases, the fetal fraction estimate allows an affected sample to be classified as mosaic, a complete aneuploidy or a partial aneuploidy. A computer-implemented method for obtaining this information is described with reference to the flow chart of Figure 18. This and related methods allow the estimation of fetal fraction, the determination of a CNV and the classification of the CNV to be performed at the same time. In other words, the same tags can be employed to perform any of these three functions.
To use the method, two modes of assessing fetal fraction are employed. One mode produces an NCNFF value and the other produces a CNFF value. As explained above, the CNFF value is obtained using a technique that relies on a chromosome or chromosome segment that has been determined to have a copy number variation. It does not need to rely on polymorphisms to calculate fetal fraction. An example of a non-polymorphism technique for calculating fetal fraction is described in Example 17, which assumes a duplication or deletion of a whole chromosome and employs the following expression:
$$ff_{(i)} = 2 \cdot NCV_{jA} \cdot CV_{jU} \qquad \text{(Equation 28)}$$
where j represents the identity of the aneuploid chromosome, and CV represents the coefficient of variation obtained from the qualified samples used to determine the mean and standard deviation in the expression for NCV.
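The expression of Equation 28 amounts to a single multiplication, sketched below in Python. The function name, the argument names and the example numbers are hypothetical. The sketch assumes that the NCV is the deviation of the chromosome dose from the qualified-sample mean in units of the qualified-sample standard deviation, so that NCV times CV is the relative change in dose.

```python
def cn_fetal_fraction(ncv_affected, cv_qualified):
    """CN fetal fraction per Equation 28: ff = 2 * NCV_jA * CV_jU.

    ncv_affected: NCV of chromosome j in the affected sample.
    cv_qualified: coefficient of variation (std / mean) of the chromosome
                  j dose in the qualified (unaffected) samples.
    Note: no absolute value is taken here, so a negative NCV (e.g. a
    deletion) yields a negative value under this literal reading.
    """
    return 2.0 * ncv_affected * cv_qualified


# Illustrative numbers only: NCV of 10 with a CV of 0.5% gives about 0.1.
example_ff = cn_fetal_fraction(10.0, 0.005)
```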
The NCNFF value is obtained using a technique that relies on a chromosome or chromosome segment that does not have a copy number variation. In other words, the NCN fetal fraction is determined by a technique that reliably determines fetal fraction under the assumption of normal ploidy for the portion of the genome used to calculate the fetal fraction. The CN fetal fraction is determined by a technique that assumes the sample under consideration has some form of aneuploidy. The CNV of the affected chromosome or chromosome segment is used to calculate the CN fetal fraction. Techniques for calculating it are presented below.
By comparing the estimated value of the estimated value contrast CN fetus mark of NCN fetus mark, a kind of method can determine the type of the aneuploidy that may exist in sample.Substantially, if NCN fetus mark and CN fetus fractional value coupling, the ploidy hypothesis so in the technology for assessment of CN fetus mark can be considered to be real.Such as, if the method assumes samples calculating CN fetus mark has complete chromosomal aneuploidy, this aneuploidy represents a chromosomal single additional copies or a chromosomal single disappearance, and NCN fetus fractional value coupling CN fetus fractional value, so the method can draw to draw a conclusion: this sample represents complete chromosomal aneuploidy.The basis of making this hypothesis is described in greater detail in hereinafter.
NCN fetus mark is determined by different technology.In some embodiments, the selected polymorphism in canonical sequence genome is used to estimate NCN fetus mark.The example of these technology is described in above.In other embodiments, NCN fetus mark uses and is not knownly aneuploid or has determined that the euploid chromosomal relative quantity of right and wrong is not determined.For example, in sample, the euploid chromosome of known not right and wrong may be the chromosome x in male fetus.Therefore, in other embodiments, the relative quantity (such as, so chromosomal chromosome dosage) comprised from the X chromosome in the sample of the DNA of the pregnant woman nourishing son or Y chromosome is used to determine NCN fetus mark.The genome of son should not comprise the second copy of X chromosome.Known this point, the relative quantity of X chromosome DNA can be used for the NCN value providing fetus mark.In the sample comprising female child DNA, the euploid chromosome of known not right and wrong can be known not compatible with life chromosome.Alternately, for the sample of the DNA comprised from sex fetus, sequence label determination chromosome dosage (with NCV or NSV) can be used to confirm that chromosome can be used for determining NCN fetus mark, determine the existence that can be used for the chromosomal normal ploidy determining NCN fetus mark.
Forward the process flow diagram 1800 of Figure 18 to, compare NCN fetus mark estimated value 1802 and CN fetus mark estimated value 1804.If their couplings, as institute of square frame 1806 place indicates, so this process is reached a conclusion, and determines estimating containedly in the technology of CN fetus mark suppose it is real.In different embodiments, this is assumed to be: there is trisomy or monosomy in one of chromosome of fetus.
On the other hand, point out if this compares, the value of two fetus marks does not mate (condition 1808) and in fact the estimated value of CN fetus mark is less than NCN fetus mark, so by as square frame 1810 place indicate and perform the subordinate phase of the method.
In this second phase, the method determines whether the sample contains a partial aneuploidy or is mosaic. In addition, if the sample contains a partial aneuploidy, the method determines where on the aneuploid chromosome the aneuploidy resides. In certain embodiments, this is accomplished by first dividing the affected chromosome into a plurality of bins. In one example, each bin is about 1 million base pairs long. Other bin lengths can of course be used, such as about 1 kilobase, about 10 kilobases, about 100 kilobases, and so on. The bins do not overlap and span most or all of the length of the chromosome. The bins are compared with one another, and this comparison provides insight into the condition. In one approach, the tags mapped to each bin are counted and optionally converted into a bin dose. If any of the bins is aneuploid, the counts or bin doses will indicate it. As part of the analysis of the individual bins, it may be appropriate to normalize the information from each bin to account for bin-to-bin variation, such as G-C content. The resulting normalized bin value can be referred to as an NBV; an NBV is an example of a chromosome segment value normalized to the tags mapped to normalizing segments having GC content similar to that of the segment (as in Example 19 below). In some embodiments, a fetal fraction is calculated for each bin and the individual fetal fraction values are compared. This sequential analysis of each bin is depicted at block 1812 of Figure 18. If any bin is identified as harboring an aneuploidy (by considering tag density, fetal fraction, or other information), the method determines that the sample contains a partial aneuploidy and additionally locates the aneuploidy using the bins in which the tag counts depart sufficiently from the expected value. See block 1814.
If, however, the method does not identify any region of the chromosome exhibiting aneuploidy when the individual bins of the chromosome under consideration are analyzed, the method determines that the sample is mosaic. See block 1816.
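The decision flow of Figure 18 can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function names, the 10% matching tolerance, and the 20% bin-deviation tolerance are assumptions introduced here, and GC normalization of the bins is omitted.

```python
from collections import Counter

def bin_tag_counts(tag_positions, chrom_length, bin_size=1_000_000):
    """Count mapped tags in non-overlapping, fixed-width bins spanning a chromosome.

    tag_positions are assumed to be 0-based start coordinates of mapped tags.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in tag_positions)
    return [counts.get(i, 0) for i in range(n_bins)]

def aneuploid_bins(counts, expected, rel_tol=0.2):
    """Indices of bins whose tag count departs from the expected value by more than rel_tol (assumed cutoff)."""
    return [i for i, c in enumerate(counts) if abs(c - expected) > rel_tol * expected]

def classify_affected_sample(ncn_ff, cn_ff, flagged_bins, match_tol=0.1):
    """Classify an affected sample from its NCN and CN fetal fraction estimates."""
    # Blocks 1802-1806: matching estimates support the whole-chromosome assumption.
    if abs(ncn_ff - cn_ff) <= match_tol * ncn_ff:
        return "complete aneuploidy"
    # Blocks 1808-1816: CN estimate smaller than NCN estimate triggers the bin analysis.
    if cn_ff < ncn_ff:
        return "partial aneuploidy" if flagged_bins else "mosaicism"
    # A CN estimate larger than the NCN estimate is not covered by the flowchart.
    return "undetermined"
```

In this sketch the indices returned by `aneuploid_bins` also localize the partial aneuploidy, since each index corresponds to a fixed coordinate range on the chromosome.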
Using polymorphisms, such as SNPs, on the chromosome of interest in an affected sample and on a chromosome known not to be aneuploid (e.g., chromosome X) to calculate and compare true fetal fractions, in order to determine the presence or absence of a complete or partial aneuploidy in a male fetus
As explained above, the fetal fraction (FF) determined using informative polymorphic sequences, such as informative SNPs, can be used to distinguish a complete chromosomal aneuploidy from a partial aneuploidy.
The presence or absence of an aneuploidy, whether partial or complete, can be determined by comparing the fetal fraction value determined using polymorphic target sequences present on the chromosome of interest with the fetal fraction value determined using polymorphic target sequences present on a different chromosome in the same sample. In a sample in which the fetus is male, the FF on the chromosome of interest can be determined and compared with the FF determined for chromosome X in the same sample. For example, given a maternal sample from a mother carrying a male fetus with trisomy 21, polymorphic sequences, e.g., sequences comprising at least one informative SNP, are selected so that they are present on chromosome 21 and on chromosome X; the polymorphic target sequences are amplified and sequenced, and the fetal fraction is determined as described elsewhere in this application.
Given that fetal fraction is proportional to the amount of fetal chromosome in the sample, the fetal fraction determined using polymorphic sequences present on a trisomic chromosome in a maternal sample will be 1 + 1/2 times the fetal fraction determined using polymorphic sequences on a chromosome known not to be aneuploid (e.g., chromosome X) of the male fetus in the same maternal sample. For example, in a normal sample, when a fetal fraction ($FF_{21}$) is determined using a set of polymorphisms on chromosome 21 and a fetal fraction ($FF_X$) is determined using a set of polymorphisms on chromosome X, and chromosome X is known to be unaffected in the male fetus, then $FF_{21} = FF_X$. If, however, the fetus is trisomic for chromosome 21, the fetal fraction for the trisomic chromosome 21 ($FF_{21}$) will equal one and a half times the fetal fraction for chromosome X in the same sample ($FF_{21} = 1.5 \times FF_X$). Accordingly, if $FF_{21} < FF_X$, the analysis logic can conclude that there is a partial deletion of chromosome 21 and/or mosaicism. If $FF_{21} > FF_X$, the analysis logic can conclude that part of chromosome 21 is increased, for example a duplication or multiplication of part of chromosome 21, or a complete duplication of chromosome 21, that is not accounted for in the technique used to calculate the fetal fraction from chromosome 21. The two outcomes can be told apart because a partial duplication will produce an FF of less than $1.5 \times FF_X$. Alternatively, the presence of a partial duplication, partial deletion, or mosaicism can be determined by, for example, increasing the number of polymorphic sequences on chromosome 21 so as to obtain multiple FF values along the length of the chromosome; a local doubling or multiplication of the FF value indicates that a portion of the chromosome is increased. Alternatively, as would be the case for a mosaic sample, an FF determined from polymorphic sequences that remains constant over the entire length of the chromosome indicates an overall increase in the amount of the complete chromosome, but one that is smaller than the increase relative to $FF_X$ described above. When a whole chromosome is lost, e.g., in a monosomy, $FF_{monosomy} = \tfrac{1}{2} FF_X$. Fetal fraction values obtained from informative polymorphic sequences can be combined with sequence doses and their normalized dose values, e.g., NCV and NSV, to confirm the presence of a complete aneuploidy.
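The comparison of the two fetal fractions can be expressed as a ratio test. The sketch below is only an illustration of the reasoning in the preceding paragraph: the function name and the tolerance of 0.1 around the expected ratios (1.0, 1.5, 0.5) are assumptions, not values taken from this document.

```python
def interpret_ff_ratio(ff_interest, ff_x, tol=0.1):
    """Interpret the fetal fraction on a chromosome of interest relative to FF on chromosome X (male fetus).

    Expected ratios: 1.0 for a euploid chromosome, 1.5 for a complete trisomy,
    0.5 for loss of a whole chromosome. tol is a hypothetical matching tolerance.
    """
    ratio = ff_interest / ff_x
    if abs(ratio - 1.0) <= tol:
        return "consistent with normal ploidy"
    if abs(ratio - 1.5) <= tol:
        return "consistent with complete trisomy"
    if abs(ratio - 0.5) <= tol:
        return "consistent with complete monosomy"
    if ratio > 1.0:
        # Gain that does not match a whole extra copy.
        return "gain: partial duplication or mosaicism"
    # Loss that does not match a whole missing copy.
    return "loss: partial deletion or mosaicism"
```

Distinguishing a partial duplication from mosaicism would still require FF values along the length of the chromosome, as the text describes; a single ratio cannot separate the two.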
Calculating fetal fraction from the chromosome dose of an aneuploid sequence
The NCV for a chromosome of interest is calculated according to the following equation:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j} \qquad \text{(Equation 19)}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
Generally, the chromosome dose for a trisomy increases in proportion to the fetal fraction (ff). Thus, for a sample with fetal fraction ff that contains a trisomic chromosome, the chromosome dose increases as:
$$R_{jA} = \left(1 + \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 20)}$$
The chromosome dose for a monosomy decreases in proportion to the fetal fraction (ff). Thus, for a sample with fetal fraction ff that contains a monosomic chromosome, the chromosome dose decreases as:
$$R_{jA} = \left(1 - \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 21)}$$
In Equations 20 and 21, $R_{jA}$ is the chromosome dose for chromosome j in the affected sample (e.g., the maternal sample to be tested; $x_{ij}$ for test sample i); ff is the expected fetal fraction in the unaffected (qualified) sample U; and $R_{jU}$ is the chromosome dose in the unaffected sample. The factor 2 rests on the following assumptions: the plus sign in Equation 20 assumes that one extra copy of the chromosome of interest is present; the minus sign in Equation 21 assumes that one complete copy of the chromosome of interest is missing. If a different assumption is made (e.g., a duplication of part of the chromosome of interest), the factor 2 has no practical meaning.
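Substituting the chromosome dose $R_{jA}$ into Equation 19:
$$NCV_{jA} = \frac{R_{jA} - \overline{R_{jU}}}{\sigma_{jU}} \qquad \text{(Equation 22)}$$
where $\overline{R_{jU}}$ is equivalent to $\hat{\mu}_j$ and $\sigma_{jU}$ is equivalent to $\hat{\sigma}_j$. Solving for ff proceeds as follows:
$$NCV_{jA} = \frac{\left(1 + \frac{ff}{2}\right)\overline{R_{jU}} - \overline{R_{jU}}}{\sigma_{jU}} \quad \text{or} \quad NCV_{jA} = \frac{\left(1 - \frac{ff}{2}\right)\overline{R_{jU}} - \overline{R_{jU}}}{\sigma_{jU}} \qquad \text{(Equation 23)}$$
$$NCV_{jA} = \frac{\left(\frac{ff}{2}\right)\overline{R_{jU}}}{\sigma_{jU}} \quad \text{or} \quad NCV_{jA} = -\frac{\left(\frac{ff}{2}\right)\overline{R_{jU}}}{\sigma_{jU}} \qquad \text{(Equation 24)}$$
$$NCV_{jA} = \frac{ff}{2\,CV_{jU}} \quad \text{or} \quad NCV_{jA} = -\frac{ff}{2\,CV_{jU}} \qquad \text{(Equation 25)}$$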
Thus, the fraction $ff_{(i)}$ for any chromosome assumed to be trisomic can be defined as:
$$ff_{(i)} = 2 \times NCV_{jA} \, CV_{jU} \qquad \text{(Equation 26)}$$
The fraction $ff_{(i)}$ for any chromosome assumed to be monosomic can be defined as:
$$ff_{(i)} = -2 \times NCV_{jA} \, CV_{jU} \qquad \text{(Equation 27)}$$
Equation 27 assumes the loss of one complete copy of the chromosome. The corresponding $NCV_{jA}$ for that chromosome is necessarily negative. Therefore, although Equation 27 contains a minus sign, the calculated fetal fraction is still positive.
Because fetal fraction cannot be negative, $ff_{(i)}$ for any chromosome can be calculated by the following equation:
$$ff_{(i)} = 2 \times \left| NCV_{jA} \, CV_{jU} \right| \qquad \text{(Equation 28)}$$
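Equations 19 and 28 can be checked numerically with a short sketch. The function names and the example doses are illustrative assumptions; the use of the sample standard deviation (rather than the population form) is also an assumption, although it does not affect the resulting fetal fraction because the standard deviation cancels when NCV is multiplied by CV.

```python
from statistics import mean, stdev

def ncv(dose, qualified_doses):
    """Equation 19: normalized chromosome value of a test dose against qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumed estimator)
    return (dose - mu) / sigma

def fetal_fraction_from_ncv(dose, qualified_doses):
    """Equation 28: ff = 2 * |NCV * CV|, with CV = sigma / mu from the qualified samples."""
    mu = mean(qualified_doses)
    cv = stdev(qualified_doses) / mu
    return 2 * abs(ncv(dose, qualified_doses) * cv)

# Hypothetical qualified doses with mean 1.00. A trisomy at ff = 10% gives a dose of
# (1 + 0.10/2) * 1.00 = 1.05 by Equation 20; a monosomy gives 0.95 by Equation 21.
qualified = [0.98, 1.00, 1.02]
ff_trisomy = fetal_fraction_from_ncv(1.05, qualified)
ff_monosomy = fetal_fraction_from_ncv(0.95, qualified)
```

Both calls recover a fetal fraction of about 0.10, which illustrates why the absolute value in Equation 28 lets one expression serve for both a gained and a lost chromosome.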
Using fetal fraction to resolve no-calls
The ability to call significant differences in the representation of one or more sequences present in a mixture of two genomes depends on the relative sequence contribution of the first genome with respect to that of the second genome. For example, noninvasive prenatal diagnosis using cfDNA in a maternal sample is challenging because only a small fraction of the DNA sample derives from the fetus. For prenatal diagnostic assays, the background of maternal DNA imposes a practical limit on sensitivity, and the fraction of fetal DNA present in the maternal sample is therefore an important parameter. The sensitivity of fetal aneuploidy detection by counting DNA molecules depends on the fetal DNA fraction and on the number of molecules counted.
Typically, about 1% of maternal test samples analyzed for fetal aneuploidy by massively parallel sequencing are "no-call" samples, for which insufficient sequencing information, e.g., the number of fetal sequence tags, prevents a confident determination of the presence or absence of one or more fetal aneuploidies in the maternal sample. A "no-call" determination may arise when the fetal cfDNA content is too low relative to the maternal contribution for the sequencing information to distinguish the sample from aneuploid samples on the basis of the information determined in qualified samples. To determine whether a "no-call" sample is or is not an aneuploid sample, the fetal fraction is determined empirically and/or obtained, for example, from NCV values, and is used to confirm or rule out the presence of a chromosomal aneuploidy. As described elsewhere herein, ff can be used to characterize the type of aneuploidy present in a test sample. For example, for thresholds that define a "no-call" zone between NCV values of 2.5 and 4, a test sample that has an NCV close to the threshold of 4 and is shown to have a low fetal fraction (e.g., less than 3%) is likely to be an affected sample. Conversely, a test sample that has an NCV close to the threshold of 2.5 and is shown to have a high fetal fraction (e.g., greater than 40%) is likely to be an unaffected sample. Resolving a "no-call" sample may rely on a single determination of fetal fraction. Preferably, the fetal fraction is determined by two or more different methods, or by using NCVs determined by the same method from two or more different chromosomes of the sample. Similarly, fetal fraction can be used to evaluate whether a sample with an NCV slightly greater than 4 or slightly less than 2.5 may be a false-positive or false-negative call, respectively.
Apparatus and systems for determining CNV
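As a rough illustration of the no-call zone, the sketch below makes a three-way call from an NCV using the example thresholds of 2.5 and 4, and computes the NCV that a complete trisomy would be expected to produce at a given fetal fraction (Equation 25). The function names are hypothetical, only positive NCVs (gains) are handled, and the CV value in the usage lines is an assumed example rather than a value from this document.

```python
def call_from_ncv(ncv_value, lower=2.5, upper=4.0):
    """Three-way call for a chromosome gain using the example NCV thresholds."""
    if ncv_value > upper:
        return "affected"
    if ncv_value < lower:
        return "unaffected"
    return "no call"

def expected_trisomy_ncv(ff, cv):
    """Equation 25 (trisomy branch): NCV expected for a complete trisomy at fetal fraction ff."""
    return ff / (2 * cv)

# With an assumed CV of 1%, a 3% fetal fraction predicts a trisomy NCV of about 1.5,
# while a 40% fetal fraction predicts about 20.
low_ff_ncv = expected_trisomy_ncv(0.03, 0.01)
high_ff_ncv = expected_trisomy_ncv(0.40, 0.01)
```

The comparison shows the logic of the paragraph above: at a high fetal fraction a true trisomy should score far beyond the no-call zone, so a borderline NCV in such a sample points toward an unaffected fetus, whereas at a low fetal fraction even a true trisomy yields only a modest NCV.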
Analysis of the sequencing data and the diagnoses derived from it are typically performed using various computer-executed algorithms and programs. Accordingly, certain embodiments employ processes involving data stored in or transferred through one or more computer systems or other processing systems. Embodiments of the invention also relate to apparatus for performing these operations. Such apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer. In some embodiments, a group of processors performs some or all of the recited analytical operations collaboratively (e.g., via a network or cloud computing) and/or in parallel. A processor or group of processors for performing the methods described herein may be of various types, including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and non-programmable devices such as gate array ASICs or general-purpose microprocessors.
In addition, certain embodiments relate to tangible and/or non-transitory computer-readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices; magnetic media such as disk drives and magnetic tape; optical media such as CDs; magneto-optical media; and hardware devices specially configured to store and execute program instructions, such as read-only memory (ROM) and random access memory (RAM). The computer-readable media may be directly controlled by an end user, or the media may be indirectly controlled by the end user. Examples of directly controlled media include media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that the user can access indirectly through an external network and/or through a service providing shared resources, such as the "cloud". Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that can be executed by the computer using an interpreter.
In various embodiments, the data or information employed in the disclosed methods and apparatus is provided in an electronic format. Such data or information may include reads and tags derived from a nucleic acid sample, counts or densities of the tags that align with particular regions of a reference sequence (e.g., that align to a chromosome or chromosome segment), reference sequences (including reference sequences providing solely or primarily polymorphisms), chromosome and segment doses, calls (such as aneuploidy calls), normalized chromosome and segment values, pairs of chromosomes or segments and corresponding normalizing chromosomes or segments, counseling recommendations, diagnoses, and the like. As used herein, data or other information provided in an electronic format is available for storage on a machine and transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, and databases. The data may be embodied electronically, optically, and so on.
In one embodiment, the invention provides a computer program product for generating an output indicating the presence or absence of an aneuploidy (e.g., a fetal aneuploidy) or cancer in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining a chromosomal anomaly. As explained, the computer product may include a non-transitory and/or tangible computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine chromosome doses and, in some cases, whether a fetal aneuploidy is present or absent. In one example, the computer product comprises a computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a fetal aneuploidy, comprising: a receiving procedure for receiving sequencing data from at least a portion of the nucleic acid molecules from a maternal biological sample, wherein the sequencing data comprises calculated chromosome and/or segment doses; computer-assisted logic for analyzing a fetal aneuploidy from the received data; and an output procedure for generating an output indicating the presence, absence, or kind of the fetal aneuploidy.
The sequence information from the sample under consideration may be mapped to chromosome reference sequences to identify a number of sequence tags for each of any one or more chromosomes of interest and to identify a number of sequence tags for a normalizing segment sequence for each of said one or more chromosomes of interest. In various embodiments, the reference sequences are stored in a database, such as a relational or object database.
It should be understood that it is in most cases impractical, or even impossible, for an unaided human being to perform the computational operations of the methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. The problem is of course compounded because reliable aneuploidy calls generally require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes.
The methods herein can be performed using a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying any CNV, e.g., a chromosomal or partial aneuploidy. Thus, in one embodiment, the invention provides a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying complete and partial chromosomal aneuploidies, e.g., fetal aneuploidies. The instructions may include, for example, instructions for: (a) obtaining sequence information for the fetal and maternal nucleic acids in a sample and/or at least temporarily storing this information in a computer-readable medium; (b) using the stored sequence information to computationally identify, from the mixture of fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence for each of the one or more chromosomes of interest; and (c) computationally calculating a single chromosome dose for each chromosome of interest, using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence. The instructions may be executed using one or more appropriately designed or configured processors. The instructions may additionally include comparing each chromosome dose with an associated threshold and thereby determining the presence or absence of any four or more different partial or complete fetal chromosomal aneuploidies in the sample. As noted above, there are many variations of this process. All such variations can be implemented using the processing and storage features described herein.
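Steps (b) and (c) can be sketched in a few lines. This is an illustrative outline only: the function names are hypothetical, mapped tags are represented simply by the name of the chromosome each tag aligned to, and the pairing of chromosome 21 with chromosome 9 as its normalizing chromosome is an arbitrary example rather than a pairing stated in this section.

```python
from collections import Counter

def count_tags(mapped_chromosomes):
    """Step (b): count sequence tags per chromosome from an iterable of chromosome names."""
    return Counter(mapped_chromosomes)

def chromosome_doses(tag_counts, normalizing):
    """Step (c): dose = tags on the chromosome of interest / tags on its normalizing chromosome.

    normalizing maps each chromosome of interest to its (assumed) normalizing chromosome.
    """
    return {
        chrom: tag_counts[chrom] / tag_counts[norm]
        for chrom, norm in normalizing.items()
    }

# Hypothetical tag counts and normalizing pair.
counts = {"chr21": 1500, "chr9": 60000}
doses = chromosome_doses(counts, {"chr21": "chr9"})
```

Each resulting dose would then be compared with its associated threshold, typically after conversion to a normalized value such as an NCV.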
In some embodiments, the instructions may further include automatically recording information pertinent to the method, such as chromosome doses and the presence or absence of a fetal chromosomal aneuploidy, in a patient medical record for the human subject who provided the maternal test sample. The patient medical record may be maintained by, for example, a laboratory, a physician's office, a hospital, a health maintenance organization, an insurance company, or a personal medical record website. Further, based on the results of the processor-implemented analysis, the method may further involve prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. This may involve performing one or more additional tests or analyses on additional samples taken from the subject.
The disclosed methods can also be performed using a computer processing system that is adapted or configured to carry out a method for identifying any CNV, e.g., a chromosomal or partial aneuploidy. Thus, in one embodiment, the invention provides a computer processing system that is adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured to sequence at least a portion of the nucleic acid molecules in a sample to obtain the type of sequence information described elsewhere herein. The apparatus may also include components for processing the sample. Such components are described elsewhere herein.
Sequence or other data can be input into a computer or stored on a computer-readable medium either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes the nucleic acid sequences from samples. Sequences or other information from such instruments are provided to the computer system through an interface. Alternatively, the sequences processed by the system are provided from a sequence storage source, such as a database or other repository. Once available to the processing apparatus, a memory device or mass storage device buffers or stores the nucleic acid sequences at least temporarily. In addition, the memory device may store tag counts for various chromosomes or genomes, and so on. The memory may also store various routines and/or programs for analyzing the presented sequence or mapped data. Such programs/routines may include programs for performing statistical analyses and the like.
In one example, a user provides a sample to a sequencing device. Data is collected and/or analyzed by the sequencing device, which is connected to a computer. Software on the computer allows for data collection and/or analysis. The data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the Internet, which is used to transmit data to a handheld device used by a remote user (e.g., a physician, scientist, or analyst). It should be understood that the data can be stored and/or analyzed before it is transmitted. In some embodiments, raw data is collected and sent to a remote user or device that will analyze and/or store the data. Transmission can occur over the Internet, but can also occur via satellite or another connection. Alternatively, the data can be stored on a computer-readable medium, and the medium can be shipped to an end user (e.g., by mail). The remote user can be in the same or a different geographical location, including but not limited to a building, city, state, country, or continent.
In some embodiments, the methods also include collecting data regarding a plurality of polynucleotide sequences (e.g., reads, tags, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment, e.g., a sample collection device, a nucleotide amplification device, a nucleotide sequencing device, or a hybridization device. The computer can then collect the applicable data gathered by the laboratory device. The data can be stored on the computer at any step, e.g., while collected in real time, prior to sending, during or in conjunction with sending, or after sending. The data can be stored on a computer-readable medium that can be extracted from the computer. The collected or stored data can be transmitted from the computer to a remote location, e.g., via a local area network or a wide area network such as the Internet. At the remote location, various operations can be performed on the transmitted data, as described below.
The types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in the systems, apparatus, and methods disclosed herein include the following:
Reads obtained by sequencing the nucleic acids in a test sample
Tags obtained by aligning the reads to a reference genome or other reference sequence
The reference genome or sequence
Sequence tag density: the count or number of tags for each of two or more regions (typically chromosomes or chromosome segments) of a reference genome or other reference sequence
The identities of the normalizing chromosomes or chromosome segments for particular chromosomes or chromosome segments of interest
Doses for chromosomes or chromosome segments (or other regions) obtained from the chromosomes or segments of interest and the corresponding normalizing chromosomes or segments
Thresholds for calling chromosome doses as affected, unaffected, or no call
The actual calls of the chromosome doses
Diagnoses (clinical conditions associated with the calls)
Recommendations for further tests derived from the calls and/or diagnoses
Treatment and/or monitoring plans derived from the calls and/or diagnoses
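One way to picture how these data types travel together between devices is a single per-sample record. The sketch below is purely illustrative: the class name and every field name are assumptions made for this example and do not correspond to any data format defined in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SampleRecord:
    """Hypothetical container for the electronically formatted data listed above."""
    sample_id: str
    reference_name: str = ""                                       # reference genome or sequence used
    tag_counts: Dict[str, int] = field(default_factory=dict)       # tag density per region
    normalizing: Dict[str, str] = field(default_factory=dict)      # region -> normalizing region
    doses: Dict[str, float] = field(default_factory=dict)          # dose per region of interest
    thresholds: Dict[str, float] = field(default_factory=dict)     # calling thresholds
    calls: Dict[str, str] = field(default_factory=dict)            # affected / unaffected / no call
    diagnosis: Optional[str] = None
    recommendations: List[str] = field(default_factory=list)       # further tests, treatment, monitoring

record = SampleRecord(sample_id="S1")
record.tag_counts["chr21"] = 1500
record.calls["chr21"] = "no call"
```

Keeping calls, thresholds, and doses in the same record makes it straightforward to store, transmit, or re-analyze a sample at a different location from where it was sequenced.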
These various types of data may be obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or much of this information is stored and used at the location where the test sample is processed, e.g., a doctor's office or other clinical setting. At the other extreme, the sample is obtained at one location, processed and optionally sequenced at a different location, the reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be the location where the sample was obtained).
In various embodiments, the reads are generated with the sequencing device and then transmitted to a remote site, where they are processed to produce aneuploidy calls. At this remote location, for example, the reads are aligned to a reference sequence to produce tags, which are counted and assigned to chromosomes or segments of interest. Also at the remote location, the counts are converted to doses using the associated normalizing chromosomes or segments. Still further at the remote location, the doses are used to generate aneuploidy calls.
The processing operations that may be employed at distinct locations include the following:
Sample collection
Sample processing preliminary to sequencing
Sequencing
Analyzing sequence data and deriving aneuploidy calls
Diagnosis
Reporting a diagnosis and/or a call to the patient or health care provider
Developing a plan for further treatment, testing, and/or monitoring
Executing the plan
Counseling
Any one or more of these operations may be automated as described elsewhere herein. Typically, the sequencing, the analysis of sequence data, and the derivation of aneuploidy calls will be performed computationally. The other operations may be performed manually or automatically.
The example that can carry out the position of sample collection comprises health worker office, clinic, patient family's (wherein sampling collection kit or kit) and Mobile nursing vehicle.The example of position of front sample preparation of can carrying out checking order comprises the facility that health worker office, clinic, patient family's (wherein sampling treating apparatus or kit), Mobile nursing vehicle and aneuploidy analyze supplier.The example that can carry out the position of checking order comprises the facility that health worker office, clinic, health worker office, clinic, patient family's (wherein sampling sequencing device and/or kit), Mobile nursing vehicle and aneuploidy analyze supplier.The position of carrying out checking order can provide dedicated Internet access for the sequencing data (typically be reading) of transmission in electronic format.This connection can be wired or wireless, and and can may process and/or the website of combined data data are sent to before being transferred to process points through configuration.Data summarization device can be safeguarded by health care organization, as HMO (HMO).
Analysis and/or derivation operation in any above-mentioned position, or as an alternative, can be carried out at another remote site being devoted to calculating and/or nucleic acid sequence data Analysis Service.These positions comprise such as cluster, as generic server district, aneuploidy Analysis Service industry facility etc.In some embodiments, the calculation element for execution analysis is leased or is rented.Computational resource can be a part for processor accessible set in the Internet, as being commonly called as the process resource for cloud.In some cases, calculate and performed by associated with each other or not associated parallel or Massively Parallel Processor group.Process can use distributed treatment to realize, as PC cluster, grid computing etc.In these embodiments, the cluster of computational resource or grid are concentrated to be formed and are worked to perform by one the super virtual machine that multiple processor of analysis as herein described and/or derivation or computing machine form.These technology and how conventional supercomputer can be used for processing sequence data as described herein.Separately for depending on the parallel computing form of processor computer.When grid computing, these processors (being often complete computing machine) are connected by conventional network protocol (as Ethernet) by network (private, public or the Internet).On the contrary, supercomputer has many processors connected by local high-speed computer bus.
In certain embodiments, the diagnosis (e.g., the fetus has Down syndrome or the patient has a particular type of cancer) is generated at the same location as the analysis operation. In other embodiments, it is generated at a different location. In some examples, reporting the diagnosis is performed at the location where the sample was taken, although this need not be the case. Examples of locations where the diagnosis can be generated or reported and/or where a plan can be developed include health practitioners' offices, clinics, internet sites accessible by computers, and handheld devices having a wired or wireless connection to a network, such as cell phones, tablets, and smart phones. Examples of locations where counseling is performed include health practitioners' offices, clinics, internet sites accessible by computers, handheld devices, etc.
In some embodiments, the sample collection, sample processing, and sequencing operations are performed at a first location, and the derivation operation is performed at a second location. In some cases, however, the sample is collected at one location (e.g., a health practitioner's office or clinic), and sample processing and sequencing are performed at a different location, which is optionally the same location where the analysis and derivation take place.
In various embodiments, the sequence of operations listed above may be triggered by a user or entity initiating sample collection, sample processing, and/or sequencing. After one or more of these operations have begun, the other operations may naturally follow. For example, the sequencing operation may cause reads to be automatically collected and sent to a processing apparatus, which then conducts, often automatically and possibly without further user intervention, the sequence analysis and aneuploidy derivation operations. In some implementations, the result of this processing operation is then automatically delivered, possibly after reformatting as a diagnosis, to a system component or entity that processes the information and reports it to a health professional and/or patient. As explained, such information can also be automatically processed, possibly together with counseling information, to produce a treatment, testing, and/or monitoring plan. Thus, initiating an early-stage operation can trigger an end-to-end sequence in which the health professional, patient, or other concerned party is provided with a diagnosis, a plan, counseling, and/or other information useful for acting on a physical condition. This is accomplished even though parts of the overall system are physically separated and possibly remote from the location of, e.g., the sample and the sequencing apparatus.
Figure 19 shows one implementation of a distributed system for producing a call or diagnosis from a test sample. A sample collection location 01 is used for obtaining a test sample from a patient such as a pregnant female or a putative cancer patient. The sample is then provided to a processing and sequencing location 03, where the test sample may be processed and sequenced as described above. Location 03 includes apparatus for processing the sample as well as apparatus for sequencing the processed sample. The result of the sequencing, as described elsewhere herein, is a collection of reads, which are typically provided in an electronic format and supplied to a network such as the Internet, indicated by reference number 05 in Figure 19.
The sequence data is provided to a remote location 07, where analysis and call generation are performed. This location may include one or more powerful computational devices such as computers or processors. After the computational resources at location 07 have completed their analysis and generated a call from the sequence information received, the call is relayed back to the network 05. In some implementations, not only a call but also an associated diagnosis is generated at location 07. The call and/or diagnosis are then transmitted across the network and back to the sample collection location 01, as illustrated in Figure 19. As explained, this is simply one of many variations on how the various operations associated with generating a call or diagnosis may be divided among locations. One common variant involves providing sample collection, processing, and sequencing in a single location. Another variant involves providing processing and sequencing at the same location as analysis and call generation.
Figure 20 elaborates on the options for performing the various operations at distinct locations. In the most granular sense depicted in Figure 20, each of the following operations is performed at a separate location: sample collection, sample processing, sequencing, read alignment, calling, diagnosis, and reporting and/or plan development.
In one embodiment that aggregates some of these operations, sample processing and sequencing are performed at one location, and read alignment, calling, and diagnosis are performed at a separate location. See the portion of Figure 20 identified by reference letter A. In another implementation, identified by letter B in Figure 20, sample collection, sample processing, and sequencing are all performed at the same location. In this implementation, read alignment and calling are performed at a second location. Finally, diagnosis and reporting and/or plan development are performed at a third location. In the implementation depicted by letter C in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, calling, and diagnosis are all performed together at a second location; and reporting and/or plan development are performed at a third location. Finally, in the implementation labeled D in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, and calling are all performed at a second location; and diagnosis and reporting and/or plan development are performed at a third location.
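By way of non-limiting illustration, the four groupings A to D described for Figure 20 can be written down as a simple mapping from operation to site. This is only a sketch: the site labels (`site_1`, `site_2`, `site_3`), the helper names, and the operation strings are hypothetical, and option A lists only the operations that the passage actually places.

```python
# Operation names follow the text; the site labels ("site_1", ...) are hypothetical.
def assign(groups):
    """Flatten {site: [operations]} into {operation: site}."""
    return {op: site for site, ops in groups.items() for op in ops}

FIGURE_20_OPTIONS = {
    # A: the passage places only these operations, so sample collection and
    # reporting are deliberately left unassigned.
    "A": assign({
        "site_1": ["sample processing", "sequencing"],
        "site_2": ["read alignment", "calling", "diagnosis"],
    }),
    "B": assign({
        "site_1": ["sample collection", "sample processing", "sequencing"],
        "site_2": ["read alignment", "calling"],
        "site_3": ["diagnosis", "reporting/plan development"],
    }),
    "C": assign({
        "site_1": ["sample collection"],
        "site_2": ["sample processing", "sequencing", "read alignment",
                   "calling", "diagnosis"],
        "site_3": ["reporting/plan development"],
    }),
    "D": assign({
        "site_1": ["sample collection"],
        "site_2": ["sample processing", "sequencing", "read alignment", "calling"],
        "site_3": ["diagnosis", "reporting/plan development"],
    }),
}

def same_site(option, op_a, op_b):
    """True if both operations are placed at the same site in the given option."""
    placement = FIGURE_20_OPTIONS[option]
    return placement[op_a] == placement[op_b]
```

For example, `same_site("C", "sequencing", "diagnosis")` holds, whereas in option D calling and diagnosis are placed at different sites.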
In one embodiment, the invention provides a system for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify a number of sequence tags from the fetal and maternal nucleic acids for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the any one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and
(d) code for comparing each single chromosome dose for each of the any one or more chromosomes of interest to a corresponding threshold value for each of the any one or more chromosomes of interest, and thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome dose for each of the any one or more chromosomes of interest comprises code for calculating the chromosome dose for a selected chromosome of interest as the ratio of the number of sequence tags identified for the selected chromosome of interest to the number of sequence tags identified for at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome of interest.
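As a minimal sketch of the ratio just described, the dose can be computed from per-chromosome tag counts. The function name, the dictionary layout, and the tag counts are illustrative assumptions; summing tags over a group of normalizing chromosomes is likewise an assumption about how a group would be combined.

```python
def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Ratio of tags on the chromosome of interest to tags on the
    normalizing chromosome(s); tags of a group are summed (assumption)."""
    normalizing_tags = sum(tag_counts[c] for c in normalizing_chromosomes)
    if normalizing_tags == 0:
        raise ValueError("no sequence tags on the normalizing chromosome(s)")
    return tag_counts[chromosome_of_interest] / normalizing_tags

# Made-up tag counts, for illustration only.
tag_counts = {"chr21": 1_500, "chr9": 5_000, "chr1": 10_000}

# Single normalizing chromosome.
dose_single = chromosome_dose(tag_counts, "chr21", ["chr9"])
# Group of normalizing chromosomes.
dose_group = chromosome_dose(tag_counts, "chr21", ["chr9", "chr1"])
```

The same function applies unchanged to chromosome segments if the keys of `tag_counts` identify segments rather than whole chromosomes.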
In some embodiments, the system further comprises code for repeating the calculation of a chromosome dose for each of any remaining chromosome segments of the any one or more segments of the any one or more chromosomes of interest.
In some embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the instructions comprise instructions for determining the presence or absence of at least twenty different complete fetal chromosomal aneuploidies.
In some embodiments, the at least one normalizing chromosome sequence is a group of chromosomes selected from chromosomes 1-22, X, and Y. In other embodiments, the at least one normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y.
In another embodiment, the invention provides a system for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify a number of sequence tags from the fetal and maternal nucleic acids for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing segment sequence for each of the any one or more segments of the any one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of the any one or more segments of the any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome segment dose for each of the any one or more segments of the any one or more chromosomes of interest; and
(d) code for comparing each of the single chromosome segment doses for each of the any one or more segments of the any one or more chromosomes of interest to a corresponding threshold value for each of the any one or more chromosome segments of the any one or more chromosomes of interest, and thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome segment dose comprises code for calculating the chromosome segment dose for a selected chromosome segment as the ratio of the number of sequence tags identified for the selected chromosome segment to the number of sequence tags identified for a corresponding normalizing segment sequence for the selected chromosome segment.
In some embodiments, the system further comprises code for repeating the calculation of a chromosome segment dose for each of any remaining chromosome segments of the any one or more segments of the any one or more chromosomes of interest.
In some embodiments, the system further comprises (i) code for repeating (a)-(d) for test samples from different maternal subjects, and (ii) code for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in each of the samples.
In other embodiments of any of the systems provided herein, the code further comprises code for automatically recording the presence or absence of a fetal chromosomal aneuploidy, as determined in (d), in a patient medical record for the human subject providing the maternal test sample, wherein the recording is performed using the processor.
In some embodiments of any of the systems provided herein, the sequencer is configured to perform next generation sequencing (NGS). In some embodiments, the sequencer is configured to perform massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencer is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencer is configured to perform single-molecule sequencing.
Equipment for determining fetal fraction
Medical analysis equipment can be used to provide information about the fraction contributed to a mixture of nucleic acids by one of two genomes, by analyzing sequence tags obtained from sequencing a sample (e.g., a maternal sample). For example, equipment is provided for determining, from the analysis of sequence tags obtained by sequencing a maternal sample, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids present in the maternal sample. The medical equipment provided comprises a series of devices for carrying out the steps of the methods for determining fetal fraction described elsewhere in this application.
Figure 65 shows one embodiment of medical analysis equipment for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids. The equipment comprises:
a device (a) for receiving a plurality of sequence reads from the fetal and maternal nucleic acids in the maternal test sample;
a device (b) for aligning the plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
a device (c) for identifying a number of the sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X, and Y and segments thereof, and for identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of the sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, in order to determine a chromosome dose or chromosome segment dose, wherein the chromosome of interest or chromosome segment of interest harbors a copy number variation; and
a device (d) for determining the fetal fraction using the dose of the chromosome of interest or the dose of the chromosome segment of interest.
Preferably, the signal output of device (a) is connected to device (b), the signal output of device (b) is connected to device (c), and the signal output of device (c) is connected to device (d).
In certain embodiments, the copy number variation is determined by comparing the chromosome dose or chromosome segment dose of each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for that chromosome or chromosome segment.
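The threshold comparison can be sketched as follows. The passage speaks only of "a corresponding threshold value"; the use of separate lower and upper thresholds, the function name, and the returned labels are assumptions made for illustration, and no particular threshold values are implied.

```python
def classify_dose(dose, lower_threshold, upper_threshold):
    """Compare a chromosome (or segment) dose with per-chromosome thresholds.

    The two-sided scheme and the threshold values passed in are hypothetical.
    """
    if lower_threshold > upper_threshold:
        raise ValueError("lower_threshold must not exceed upper_threshold")
    if dose > upper_threshold:
        return "gain"   # over-representation, e.g. a duplication
    if dose < lower_threshold:
        return "loss"   # under-representation, e.g. a deletion
    return "no variation detected"
```

In practice each chromosome or segment of interest would carry its own pair of thresholds, so the function would be called once per chromosome or segment.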
Copy number variations that a fetus may carry include complete chromosome duplications, complete chromosome deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome dose or segment dose determined by device (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In certain embodiments, the chromosome dose or segment dose determined by device (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for that chromosome or segment of interest.
In certain embodiments, the equipment further comprises a device (e) for calculating a normalized chromosome value (NCV) or a normalized segment value (NSV). The NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest. The NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome segment in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, the i-th chromosome segment being the chromosome segment of interest. Preferably, the signal output of device (c) is connected to device (e).
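The NCV and NSV share one form: the test dose minus the mean dose of the qualified samples, divided by their standard deviation. A minimal sketch is given below; the function name and the example doses are illustrative, and the use of the sample (n-1) standard deviation is an assumption, since the text refers only to an estimated standard deviation.

```python
from statistics import mean, stdev

def normalized_value(test_dose, qualified_doses):
    """NCV (or NSV): (test dose - mean of qualified doses) / standard deviation.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption about how the standard deviation is estimated.
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)   # requires at least two qualified samples
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma

# Made-up doses, for illustration only.
qualified_doses = [0.29, 0.30, 0.31]
ncv = normalized_value(0.33, qualified_doses)   # approximately 3.0
```

Passing segment doses instead of chromosome doses yields the NSV.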
In certain embodiments, device (d) of the equipment then determines the fetal fraction according to the formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (e.g., the maternal sample being tested), and $CV_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples; or determines the fetal fraction according to the formula:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample (e.g., the maternal sample being tested), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, the i-th chromosome segment being the chromosome segment of interest. Preferably, the signal output of device (e) is connected to device (d).
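The following short derivation indicates why the product of the normalized value and the coefficient of variation yields the fetal fraction. It rests on two assumptions that are not stated in this passage: that the coefficient of variation has its usual definition, $CV_{iU} = \sigma_{iU} / \bar{R}_{iU}$, and that a fetus carrying one extra or one missing copy of the i-th chromosome shifts the expected dose by a factor of $1 \pm ff/2$ relative to the qualified samples.

$$NCV_{iA} \, CV_{iU} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} \cdot \frac{\sigma_{iU}}{\bar{R}_{iU}} = \frac{R_{iA} - \bar{R}_{iU}}{\bar{R}_{iU}}$$

$$R_{iA} = \bar{R}_{iU} \left( 1 \pm \frac{ff}{2} \right) \;\Rightarrow\; \left| NCV_{iA} \, CV_{iU} \right| = \frac{ff}{2} \;\Rightarrow\; ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$

Under these assumptions the standard deviation cancels, so the estimate depends only on the relative deviation of the test dose from the mean dose of the qualified samples. The same argument applies to a chromosome segment with $NSV_{iA}$ in place of $NCV_{iA}$.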
In certain embodiments, the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or chromosome segment using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. In certain embodiments, the normalizing chromosome sequence is a single chromosome of any one or more of chromosomes 1-22, X, and Y; alternatively, the normalizing sequence is a group of chromosomes of any of chromosomes 1-22, X, and Y. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1-22, X, and Y; alternatively, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1-22, X, and Y.
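Steps (i) to (iii) can be sketched for the "smallest variability" criterion as follows. The sketch considers only single candidate chromosomes, not combinations, and uses the coefficient of variation of the dose across qualified samples as the measure of variability; both choices, together with the names and the tag counts, are assumptions made for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def select_normalizing_chromosome(qualified_samples, chromosome_of_interest, candidates):
    """Return the candidate whose doses vary least across the qualified samples."""
    def dose_cv(candidate):
        # Step (ii): recompute the dose in every qualified sample for this candidate.
        doses = [s[chromosome_of_interest] / s[candidate] for s in qualified_samples]
        return coefficient_of_variation(doses)
    # Step (iii): keep the candidate with the smallest variability.
    return min(candidates, key=dose_cv)

# Step (i): made-up tag counts for three qualified samples.
qualified_samples = [
    {"chr21": 100, "chr9": 400, "chr14": 300},
    {"chr21": 110, "chr9": 440, "chr14": 300},
    {"chr21": 90, "chr9": 360, "chr14": 310},
]
best = select_normalizing_chromosome(qualified_samples, "chr21", ["chr9", "chr14"])
```

Here chr9 tracks chr21 exactly across the three samples, so it is selected. A criterion based on greatest differentiability would additionally require affected samples, which this sketch does not model.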
In certain embodiments, the equipment for determining fetal fraction further comprises a device for comparing the fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the chromosome of interest.
In certain embodiments, the equipment further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample and obtain the sequence reads. Preferably, the signal output of the sequencing device (10) is connected to device (a).
In certain embodiments, the sequencing device (10) is configured to perform sequencing-by-synthesis. Sequencing-by-synthesis may be performed using reversible dye terminators. In other embodiments, the sequencing device (10) is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencing device (10) is configured to perform single-molecule sequencing.
In certain embodiments, the sequencing device (10) is located at a site separate from devices (a)-(d), and the signal output of the sequencing device (10) is connected to device (a) through a network.
In certain embodiments, the equipment comprising the sequencing device further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. The device (11) for obtaining the maternal test sample may be located at a site separate from devices (a)-(d) and (10). In addition to devices (a)-(d) and (10), the equipment may further comprise a device (12) for extracting cell-free DNA from the maternal test sample. In certain embodiments, the device (12) for extracting cell-free DNA is located at the same site as the sequencing device (10), and the device (11) for obtaining the maternal test sample is located at a remote site.
In certain embodiments, the equipment for determining fetal fraction also comprises a storage device for at least temporarily storing the sequence reads received by device (a). Preferably, the signal output of device (a) is connected to the storage device, and the signal output of the storage device is connected to device (b).
Additional equipment for determining fetal fraction: classifying copy number variations
Also provided is additional medical analysis equipment for classifying a copy number variation in a fetal genome in a maternal sample comprising fetal and maternal nucleic acids (e.g., cell-free DNA). This additional equipment comprises devices for determining fetal fraction and a device for comparing fetal fraction values determined by different methods. The additional equipment uses the two calculated fetal fractions to classify the copy number variation in the fetal genome. The maternal sample analyzed by the equipment may be selected from blood, plasma, serum, and urine samples. In certain embodiments, the maternal sample is a plasma sample. Figure 66 shows one embodiment of such medical analysis equipment.
In one embodiment, medical analysis equipment for classifying a copy number variation in a fetal genome is provided, the equipment comprising:
a device (1) for receiving sequence reads from fetal and maternal nucleic acids in a test sample;
a device (2) for aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads;
a device (3) for identifying a number of the sequence tags that are from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus harbors a copy number variation;
a device (4) for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
a device (5) for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome of interest; and
a device (6) for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest.
Preferably, the signal output of device (1) is connected to device (2), the signal output of device (2) is connected to device (3), the signal outputs of devices (2) and (3) are connected to device (4), the signal outputs of devices (2) and (3) are connected to device (5), and the signal outputs of devices (4) and (5) are connected to device (6). The first chromosome of interest may be selected from any of chromosomes 1-22, X, and Y.
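The preferred connections among devices (1) to (6) form a directed acyclic graph, which can be sketched with the Python standard library to obtain a valid processing order. The identifiers `device_1` to `device_6` are hypothetical labels for the devices named in the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each device maps to the set of devices whose signal output feeds it.
DEVICE_INPUTS = {
    "device_1": set(),                       # receives sequence reads
    "device_2": {"device_1"},                # aligns reads into sequence tags
    "device_3": {"device_2"},                # counts tags, detects the variation
    "device_4": {"device_2", "device_3"},    # first fetal fraction value
    "device_5": {"device_2", "device_3"},    # second fetal fraction value
    "device_6": {"device_4", "device_5"},    # compares values and classifies
}

# Predecessors always appear before the devices that depend on them.
execution_order = list(TopologicalSorter(DEVICE_INPUTS).static_order())
```

Devices (4) and (5) depend on the same inputs and not on each other, so their relative order is not fixed and they could in principle run in parallel.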
In certain embodiments, the additional equipment also comprises a storage device for at least temporarily storing the sequence reads received by device (1). Preferably, the signal output of device (1) is connected to the storage device, and the signal output of the storage device is connected to device (2).
In certain embodiments, the device (4) implementing the first method for calculating the first fetal fraction value comprises a component that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and the device (5) implementing the second method for calculating the second fetal fraction value comprises:
(a) a component (5-1) for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component (5-2) for calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, the signal outputs of devices (2) and (3) are connected to component (5-1), the signal output of component (5-1) is connected to component (5-2), and the signal output of component (5-2) is connected to device (6).
In certain embodiments, the information used by the device (4) implementing the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. Alternatively, the information used by the device (4) implementing the first method may be obtained by a method other than sequencing, such as qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the device (4) implementing the first method comprises a component that calculates the first fetal fraction value using tags from a chromosome or chromosome segment that does not harbor a copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known not to occur in an aneuploid state, or that is determined not to be aneuploid in the test sample by any of the methods described herein (e.g., by calculating its NCV or NSV), can be used by device (4) to determine the fetal fraction.
In certain embodiments, the device (5) implementing the second method for calculating the fetal fraction value further comprises a component (5-3) for calculating a normalized chromosome value (NCV), wherein the component (5-3) relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest.
Preferably, the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2).
In certain embodiments, the component (5-2) for calculating the fetal fraction value from the chromosome dose by the second method uses the normalized chromosome value. The component (5-2) of the device (5) implementing the second method evaluates the fetal fraction according to the formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (e.g., the maternal sample being tested), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, the i-th chromosome being the chromosome of interest.
In certain embodiments, the device (4) implementing the first method for calculating the first fetal fraction value comprises: (a) a component (4-1) for calculating the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for that other chromosome; and (b) a component (4-2) for calculating the first fetal fraction value from that chromosome dose by the first method; and the device (5) implementing the second method for calculating the second fetal fraction value comprises: (a) a component (5-1) for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) a component (5-2) for calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, the device (4) implementing the first method further comprises a component (4-3), and the device (5) implementing the second method further comprises a component (5-3). Components (4-3) and (5-3) each calculate a normalized chromosome value (NCV) that relates the chromosome dose determined by component (4-1) or component (5-1), respectively, to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose calculated for the i-th chromosome in the test sample,
wherein, for the device (4) implementing the first method, the i-th chromosome is the chromosome other than the first chromosome of interest, and for the device (5) implementing the second method, the i-th chromosome is the first chromosome of interest.
Preferably, the signal output of component (4-1) is connected to component (4-3), and the signal output of component (4-3) is connected to component (4-2), wherein component (4-2) calculates the first fetal fraction value from the corresponding chromosome dose by the first method using the corresponding normalized chromosome value; and the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2), wherein component (5-2) calculates the second fetal fraction value from the corresponding chromosome dose by the second method using the corresponding normalized chromosome value.
In certain embodiments, assembly (4-2) of the device (4) of the first method and assembly (5-2) of the device (5) of the second method evaluate the following expression:
$$ff = 2 \times \left|\mathrm{NCV}_{iA} \times \mathrm{CV}_{iU}\right|$$
where $ff$ is the fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (for example, the maternal sample to be tested), and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples;
wherein, for the device (4) of the first method, the i-th chromosome is the chromosome other than the first chromosome of interest; for the device (5) of the second method, the i-th chromosome is the first chromosome of interest. Preferably, when the fetus is male, the chromosome other than the first chromosome of interest is the X chromosome.
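The two formulas above can be illustrated with a minimal Python sketch. The function names, the example dose values, and the use of the sample standard deviation as the "estimated" standard deviation are assumptions made for illustration only; the specification does not prescribe an implementation.

```python
from statistics import mean, stdev


def chromosome_dose(tags_chromosome: int, tags_normalizing: int) -> float:
    """Dose = tags on the chromosome / tags on the normalizing chromosome sequence."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing tag count must be positive")
    return tags_chromosome / tags_normalizing


def normalized_chromosome_value(test_dose: float, qualified_doses: list[float]) -> float:
    """NCV_iA = (R_iA - mean(R_iU)) / sigma_iU over the qualified samples."""
    # Assumption: sample standard deviation is used as the estimate of sigma_iU.
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def fetal_fraction(ncv: float, qualified_doses: list[float]) -> float:
    """ff = 2 * |NCV_iA * CV_iU|, with CV_iU = sigma_iU / mean(R_iU)."""
    cv = stdev(qualified_doses) / mean(qualified_doses)
    return 2.0 * abs(ncv * cv)


# Hypothetical doses for five qualified samples and one test sample.
qualified = [0.100, 0.102, 0.098, 0.101, 0.099]
test = 0.105
ncv = normalized_chromosome_value(test, qualified)
ff = fetal_fraction(ncv, qualified)
```

Because the standard deviation cancels in the product NCV × CV, the result reduces to twice the relative deviation of the test dose from the qualified mean, which is a convenient sanity check on any implementation.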
In certain embodiments, the device (6) for comparing the first fetal fraction value with the second fetal fraction value determines whether the two fetal fraction values are approximately equal. In certain embodiments, device (6) further comprises an assembly that determines, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy; for example, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the additional equipment further comprises a device (7) for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein device (7) is configured to execute when the device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the two fetal fraction values are not approximately equal. Preferably, the signal outputs of devices (2), (3) and (6) are connected to device (7).
In certain embodiments, in the additional equipment, the device (4) of the first method comprises an assembly that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on the chromosome other than the first chromosome of interest. The device (5) of the second method comprises an assembly that calculates the second fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on the first chromosome of interest. The information used by the device (4) of the first method can comprise sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. Alternatively, the information used by the device (4) of the first method can be obtained other than by sequencing, for example by qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the device (6) for comparing comprises: an assembly that determines that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; an assembly that determines that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and an assembly that determines that the first chromosome of interest is haploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5.
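The ratio-based comparison described above might be sketched as follows. The tolerance used to decide what counts as "approximately" 1, 1.5 or 0.5 is a hypothetical parameter, as is the function name; the specification gives no numerical tolerance.

```python
def classify_by_ratio(ff_first: float, ff_second: float, tol: float = 0.1) -> str:
    """Classify the first chromosome of interest from the ratio ff_second / ff_first.

    tol is an assumed tolerance for "approximately"; it is not taken from the source.
    """
    if ff_first <= 0:
        raise ValueError("first fetal fraction value must be positive")
    ratio = ff_second / ff_first
    for target, call in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - target) <= tol:
            return call
    # None of the three ratios matched: further analysis (partial aneuploidy
    # or mosaicism) would be triggered in the embodiments described here.
    return "indeterminate"
```

A ratio that matches none of the three targets is returned as "indeterminate", which corresponds to the case in which the further tag analysis for partial aneuploidy or mosaicism is performed.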
More preferably, this additional equipment for classifying a copy number variation further comprises a device (7') for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein device (7') is configured to execute when the device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5 or 0.5. Preferably, the signal outputs of devices (2), (3) and (6) are connected to device (7').
In certain embodiments, the device (7) or (7') for analyzing the tag information of the first chromosome of interest comprises: (a) assembly (7-1), for binning the sequence of the first chromosome of interest into a plurality of portions; (b) assembly (7-2), for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) assembly (7-3), for determining that the first chromosome of interest harbors a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, or that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions. Preferably, the signal outputs of devices (2), (3) and (6) are connected to assembly (7-1), the signal output of assembly (7-1) is connected to assembly (7-2), and the signal output of assembly (7-2) is connected to assembly (7-3). In certain embodiments, assembly (7-3) further determines that the portion of the first chromosome of interest containing significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In certain embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X and Y.
In certain embodiments, device (6) comprises an assembly for classifying the copy number variation into a category selected from the group consisting of: complete chromosomal insertions or duplications, complete chromosomal deletions, partial chromosomal duplications, partial chromosomal deletions, and mosaics.
In certain embodiments, this additional medical analysis equipment further comprises:
(i) device (8), for determining whether the copy number variation is caused by a partial aneuploidy or by a mosaic; and
(ii) device (9), for determining the locus of the partial aneuploidy on the first chromosome of interest if the copy number variation is caused by a partial aneuploidy,
wherein devices (8) and (9) are configured to execute when the device (6) for comparing the first fetal fraction value with the second fetal fraction value determines that the two values are not approximately equal. Preferably, the signal output of device (6) is connected to device (8), and the signal output of device (8) is connected to device (9). In certain embodiments, the device (9) for determining the locus of the partial aneuploidy on the first chromosome of interest comprises an assembly for sorting the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids within the first chromosome of interest, and an assembly for counting the mapped tags in each bin.
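The binning, counting and comparison steps described above can be sketched in Python as follows. The fixed bin size, the leave-one-out z-score test, and the threshold of 3 are all assumptions chosen for illustration; the specification does not state how "significantly more or significantly less" is to be assessed.

```python
from statistics import mean, stdev


def count_tags_per_bin(positions: list[int], bin_size: int, chrom_length: int) -> list[int]:
    """Count mapped tags (0-based start positions) in consecutive fixed-size bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def outlier_bins(counts: list[int], z_threshold: float = 3.0) -> list[int]:
    """Return indices of bins whose count deviates strongly from the other bins.

    Assumed test: z-score of each bin against the mean and sample standard
    deviation of all remaining bins.
    """
    if len(counts) < 3:
        raise ValueError("need at least three bins")
    flagged = []
    for i, count in enumerate(counts):
        others = counts[:i] + counts[i + 1:]
        mu, sd = mean(others), stdev(others)
        if sd == 0:
            if count != mu:
                flagged.append(i)
        elif abs(count - mu) / sd > z_threshold:
            flagged.append(i)
    return flagged


def interpret(counts: list[int], z_threshold: float = 3.0) -> str:
    """Partial aneuploidy if any bin stands out; otherwise mosaic."""
    return "partial aneuploidy" if outlier_bins(counts, z_threshold) else "mosaic"
```

The indices returned by `outlier_bins` indicate which portions of the chromosome would be reported as carrying the partial aneuploidy. In practice, bin counts would likely need normalization (for example, for GC content or mappability) before such a comparison; that step is omitted here.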
In certain embodiments, this additional equipment further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in the maternal test sample (for example, a blood, plasma, serum or urine sample) and obtain the sequence reads. Preferably, the fetal and maternal nucleic acids are cell-free DNA (cfDNA). Preferably, the signal output of sequencing device (10) is connected to device (1).
In certain embodiments, sequencing device (10) is configured to perform sequencing-by-synthesis. Sequencing-by-synthesis can be performed using reversible dye terminators. Alternatively, sequencing device (10) is configured to perform sequencing-by-ligation, or to perform single-molecule sequencing. In certain embodiments, sequencing device (10) and devices (1)-(6) of the additional equipment for classifying are located at separate sites. Preferably, the signal output of sequencing device (10) is connected to device (1) through a network.
In certain embodiments, the additional equipment for classifying further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) and devices (1)-(6) can be located at separate sites. In addition, the additional equipment can further comprise a device (12) for extracting cell-free DNA from the maternal test sample. The device (12) for extracting cell-free DNA can be located at the same site as sequencing device (10), with the device (11) for obtaining the maternal test sample located at a remote site.
In certain embodiments, device (2) aligns at least about 1 million reads.
Kits
In various embodiments, kits are provided for practicing the methods described herein. In certain embodiments, the kits comprise one or more positive internal controls for a complete aneuploidy and/or a partial aneuploidy. Typically, although not necessarily, the controls comprise internal positive controls (IPCs) that comprise nucleic acid sequences of the type to be screened for. For example, a control for a test to determine the presence or absence of a fetal trisomy (e.g., trisomy 21) in a maternal sample can comprise DNA characterized by trisomy 21 (e.g., DNA obtained from an individual with trisomy 21). In some embodiments, the control comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21 and monosomy X, the control can comprise a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the aneuploidies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
In certain embodiments, the positive control(s) comprise one or more nucleic acids comprising a trisomy 21 (T21), and/or a trisomy 18 (T18), and/or a trisomy 13 (T13). In certain embodiments, the nucleic acids comprising each of the trisomies present (e.g., T21) are provided in separate containers. In certain embodiments, the nucleic acids comprising two or more trisomies are provided in a single container. Thus, for example, in certain embodiments, a container can contain T21 and T18, T21 and T13, or T18 and T13. In certain embodiments, a container can contain T18, T21 and T13. In these various embodiments, the trisomies can be provided in equal quantity/concentration. In other embodiments, the trisomies can be provided in particular predetermined ratios. In various embodiments, the controls can be provided as "stock" solutions of known concentration.
In certain embodiments, the control for detecting an aneuploidy comprises a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, as explained above, an internal positive control (IPC) created as a control for a test to determine a fetal trisomy (e.g., trisomy 21) can comprise a combination of genomic DNA from a male or female subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. In certain embodiments, the genomic DNA is sheared to provide fragments of between about 100-400 bp, between about 150-350 bp, or between about 200-300 bp, to simulate the circulating cfDNA fragments in maternal samples.
In certain embodiments, the proportion of fragmented DNA from the subject carrying the aneuploidy (e.g., trisomy 21) in the control is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, to provide an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of the DNA is from the subject carrying the aneuploidy. In certain embodiments, the control comprises DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects each carrying a trisomic chromosome 21, a trisomic chromosome 13, or a trisomic chromosome 18.
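As a rough illustration of the mixing proportions described above, the following Python sketch computes how much DNA each contributor would supply for a given total mass, together with the relative chromosome dose expected for a chromosome that is trisomic in a given fraction of the mixture. The function names and the equal split among the affected subjects are assumptions; the specification only states the overall proportions.

```python
def ipc_masses(total_ng: float, affected_fraction: float,
               n_affected_subjects: int = 1) -> tuple[float, float]:
    """Return (ng of unaffected DNA, ng of DNA per affected subject).

    Assumption: the affected share is divided equally among the affected subjects.
    """
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")
    if n_affected_subjects < 1:
        raise ValueError("need at least one affected subject")
    unaffected = total_ng * (1.0 - affected_fraction)
    per_affected = total_ng * affected_fraction / n_affected_subjects
    return unaffected, per_affected


def expected_relative_dose(fraction_with_trisomy: float) -> float:
    """Mean copy number relative to disomy when a fraction f of the DNA is trisomic.

    (2 * (1 - f) + 3 * f) / 2 = 1 + f / 2
    """
    return 1.0 + fraction_with_trisomy / 2.0


# 100 ng IPC: 80% unaffected DNA, 20% split across three trisomic subjects.
unaffected_ng, per_subject_ng = ipc_masses(100.0, 0.20, 3)
```

The `1 + f/2` relation is the same one that underlies the factor of 2 in the fetal fraction formula: a trisomic contributor making up a fraction f of the mixture raises the dose of the affected chromosome by f/2.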
In certain embodiments, the control(s) comprise cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, the controls can comprise cfDNA obtained from a pregnant woman carrying a fetus with trisomy 21 and/or trisomy 18 and/or trisomy 13. The cfDNA can be extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR.
Although the controls present in the kits are described above with respect to trisomies, they need not be so limited. It will be appreciated that the positive controls present in the kits can be created to reflect other partial aneuploidies, including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications or deletions of substantially complete chromosomal arms, the positive control(s) can comprise a p arm or a q arm of any one or more of chromosomes 1-22, X and Y. In certain embodiments, the control comprises an amplification of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and/or 22q (see, e.g., Table 2).
In certain embodiments, the controls comprise aneuploidies for any regions known to be associated with particular amplifications or deletions (e.g., breast cancer associated with an amplification at 20q13). Illustrative regions include, but are not limited to, 17q23 (associated with breast cancer), 19q12 (associated with ovarian cancer), 1q21-1q23 (associated with sarcomas and various solid tumors), 8p11-p12 (associated with breast cancer), the ErbB2 amplicon, and so forth. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region as shown in any one of Tables 3-6. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region comprising a gene as shown in any one of Tables 3-6. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more oncogenes. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more genes selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The foregoing controls are intended to be illustrative and not limiting. Using the teachings provided herein, one of ordinary skill in the art can identify numerous other controls suitable for incorporation into a kit.
In various embodiments, in addition to the controls or as an alternative to the controls, the kits comprise one or more nucleic acids and/or nucleic acid mimics that provide marker sequence(s) suitable for tracking and determining sample integrity. In certain embodiments, the markers comprise an antigenomic sequence. In certain embodiments, the marker sequences range in length from about 30 bp up to about 600 bp, or from about 100 bp to about 400 bp. In certain embodiments, the marker sequence(s) are at least 30 bp (or nt) in length. In certain embodiments, the marker is ligated to an adaptor, and the length of the adaptor-ligated marker molecule is between about 200 bp (or nt) and about 600 bp (or nt), between about 250 bp (or nt) and 550 bp (or nt), between about 300 bp (or nt) and 500 bp (or nt), or between about 350 and 450. In certain embodiments, the length of the adaptor-ligated marker molecule is about 200 bp (or nt). In certain embodiments, the length of a marker molecule can be about 150 bp (or nt), about 160 bp (or nt), 170 bp (or nt), about 180 bp (or nt), about 190 bp (or nt) or about 200 bp (or nt). In certain embodiments, the length of the marker ranges up to about 600 bp (or nt).
In certain embodiments, the kit provides at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 different sequences. The different nucleic acids and/or nucleic acid mimics that provide the marker sequence(s) can be stored in separate containers/bottles. Alternatively, different marker molecules can be kept in the same container/bottle.
In various embodiments, the markers comprise one or more DNAs, or the markers comprise one or more DNA mimics. Suitable mimics include, but are not limited to, morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA. In various embodiments, the markers are incorporated into the controls. In certain embodiments, the markers are incorporated into adaptor(s) and/or are provided ligated to adaptor(s).
In certain embodiments, the kit further comprises one or more sequencing adaptors. Such adaptors include, but are not limited to, indexed sequencing adaptors. In certain embodiments, the adaptors comprise a single-stranded arm that comprises an index sequence and one or more PCR priming sites.
In certain embodiments, the kit further comprises a sample collection device for collecting a biological sample. In certain embodiments, the sample collection device comprises a device for collecting blood and, optionally, a container for holding blood. In certain embodiments, the kit comprises a container for holding blood, and the container comprises an anticoagulant and/or a cell fixative, and/or one or more antigenomic marker sequences.
In certain embodiments, the kit further comprises DNA extraction reagents (e.g., a separation matrix and/or an elution solution). The kit can also comprise reagents for sequencing library preparation. Such reagents include, but are not limited to, a solution for end-repairing DNA, and/or a solution for dA-tailing DNA, and/or a solution for adaptor-ligating DNA.
In certain embodiments, the kit further comprises a composition comprising one or more primer sets for amplifying at least one preselected polymorphic nucleic acid in the maternal sample, wherein each preselected polymorphic nucleic acid comprises at least one polymorphic site, and wherein the forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for the site to be included in a sequence read generated by massively parallel sequencing of the amplified preselected polymorphic nucleic acid. Sequencing of the amplified preselected polymorphic sequences can be used, as described elsewhere in this application, to determine the fetal fraction in the maternal sample. The preselected polymorphic nucleic acids can comprise a SNP or an STR. In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp. In certain embodiments, the primer sets hybridize to the DNA sequences and generate amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome, or to DNA sequences present on different chromosomes. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X or Y.
Embodiments of kits provided for practicing the methods and for use in combination with the devices described herein are illustrated in Figures 67 and 68. In one embodiment, a kit is provided for determining fetal fraction. As shown in Figure 67, the kit comprises a kit body (1); a clamping slot arranged in the kit body for holding bottles; a bottle (2) comprising an internal positive control; a bottle (3) comprising a marker nucleic acid suitable for tracking and determining sample integrity; and a bottle (4) comprising a buffer solution.
The kit can comprise a plurality of additional bottles, wherein each of the plurality of bottles comprises a different internal positive control or a different marker nucleic acid.
In certain embodiments, bottle (2) comprises two or more internal positive controls. The internal positive control comprises a trisomy selected from the group consisting of: trisomy 21, trisomy 18, trisomy 13, trisomy 16, trisomy 9, trisomy 8, trisomy 22, XXX, XXY and XYY. In certain embodiments, the internal positive control comprises a trisomy selected from the group consisting of: trisomy 21 (T21), trisomy 18 (T18) and trisomy 13 (T13). In other embodiments, the internal positive control contained in bottle (2) comprises trisomy 21 (T21), trisomy 18 (T18) and trisomy 13 (T13). Alternatively, the positive control included in the kit can comprise an amplification or a deletion of one or more portions of chromosomes 1 to 22, X and Y. In certain embodiments, the positive control comprises an amplification or a deletion of a p arm or a q arm of any one or more of chromosomes 1 to 22, X and Y. In certain embodiments, bottle (2) comprises an amplification or a deletion of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and 22q. In other embodiments, bottle (2) comprises an amplification of a region selected from the group consisting of: 20q13, 19q12, 1q21-1q23, 8p11-p12 and ErbB2. Alternatively, the positive control contained in bottle (2) comprises an amplification of a region or a gene shown in Table 3, Table 4, Table 5 and Table 6. In certain embodiments, the positive control contained in bottle (2) comprises an amplification of a region or a gene selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The marker nucleic acids (also called marker molecules (MM)) included in various embodiments of the kit are antigenomic marker sequences. The marker sequences can range in length from about 30 bp to about 600 bp. In other embodiments, the marker sequences range in length from about 100 bp to about 400 bp. In certain embodiments, the kit comprises at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 bottles for different marker sequences.
In certain embodiments, the markers included in the kit comprise one or more DNAs. In other embodiments, the markers comprise one or more mimics selected from the group consisting of: morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA.
In certain embodiments, the markers are incorporated into the controls. In other embodiments, the markers are incorporated into adaptors. In certain embodiments, bottle (3) of the kit can further contain one or more sequencing adaptors. The adaptors include indexed sequencing adaptors. The adaptors can further comprise a single-stranded arm that comprises an index sequence and one or more PCR priming sites.
Figure 68 shows a schematic of a kit that can further comprise a sample collection device for collecting a biological sample. The sample collection device comprises a device (5) for collecting blood and a container (6) for holding blood. In certain embodiments, the device for collecting blood and the container for holding blood comprise an anticoagulant and a cell fixative.
In certain embodiments, the kit can further comprise a bottle (7) containing DNA extraction reagents. The DNA extraction reagent(s) can comprise a separation matrix and/or an elution solution.
In certain embodiments, the kit further comprises a bottle (8) containing reagents for sequencing library preparation. These reagents can comprise a solution for end-repairing DNA, a solution for dA-tailing DNA, and a solution for adaptor-ligating DNA.
In other embodiments, the kit further comprises a bottle (9) comprising a composition of primers for amplifying a predetermined target nucleic acid.
In certain embodiments, the kit further comprises instructional materials teaching the use of the reagents to determine the fetal fraction in a biological sample. The instructional materials teach the use of the materials to detect a trisomy or a monosomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
In addition, the kits optionally include labeling and/or instructional materials providing directions (i.e., protocols) for the use of the reagents and/or devices provided in the kit. For example, the instructional materials can teach the use of the reagents to prepare samples and/or to determine copy number variation in a biological sample. In certain embodiments, the instructional materials teach the use of the materials to detect a trisomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
While the instructional materials in the various kits typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated herein. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD-ROM), and the like. Such media may include addresses of internet sites that provide such instructional materials.
The various methods, devices, systems and uses are described in further detail in the following Examples, which are not in any way intended to limit the scope of the invention as claimed. The accompanying drawings are to be considered integral parts of the specification and description of the invention. The following examples are offered to illustrate, but not to limit, the claimed invention.
Experimental
Example 1
Sample preparation and cfDNA extraction
Peripheral blood samples were collected from pregnant women in their first or second trimester of pregnancy who were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to confirm the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6-9 mL/tube) was transferred into one 15-mL low-speed centrifuge tube. The blood was centrifuged at 2640 rpm and 4°C for 10 minutes using a Beckman Allegra 6R centrifuge and a model GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15-mL high-speed centrifuge tube and centrifuged at 16000 × g and 4°C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. The two centrifugation steps were performed within 72 hours after blood collection. Cell-free plasma comprising cfDNA was stored at -80°C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from the cell-free plasma using the QIAamp Blood DNA Mini kit (Qiagen), essentially according to the manufacturer's instructions. One milliliter of Buffer AL and 100 μl of protease solution were added to 1 ml of plasma. The mixture was incubated for 15 minutes at 56°C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of Buffer AW1, followed by a second wash with 750 μl of Buffer AW24. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with Buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 2
Preparation and sequencing of primary and enriched sequencing libraries
a. Preparation of sequencing libraries: abbreviated protocol (ABB)
All sequencing libraries, i.e., primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Massachusetts), as follows. Because cell-free plasma DNA is fragmented in nature, no further fragmentation of the plasma DNA samples by nebulization or sonication was performed. The overhangs of the approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube for 15 minutes at 20°C with 5 μl of 10× phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The enzymes were then heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. The mixture was cooled to 4°C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37°C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, California) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA, using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 and incubating the reaction mixture for 15 minutes at 25°C. The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, Massachusetts). Eighteen cycles of PCR were performed to selectively enrich the adaptor-ligated cfDNA (25 μl) using High-Fidelity Master Mix (25 μl; Finnzymes, Woburn, Massachusetts) and Illumina PCR primers (0.5 μM each) complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; and hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California).
b. Preparation of sequencing libraries: full-length protocol
The full-length protocol described here is essentially the standard protocol provided by Illumina, and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be purified using gel electrophoresis, whereas the protocol described herein uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Massachusetts), essentially according to the manufacturer's instructions. All steps except the final purification of the adaptor-ligated products, which was performed using Agencourt magnetic beads and reagents instead of the purification column, were performed according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library that is sequenced using the GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcfjhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of the approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the 40 μl of cfDNA with 5 μl 10× phosphorylation buffer, 2 μl deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA polymerase I, 1 μl T4 DNA polymerase and 1 μl T4 polynucleotide kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, in a 200 μl microcentrifuge tube in a thermal cycler for 30 minutes at 20 °C. The sample was cooled to 4 °C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (QIAGEN Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB were added. The resulting 300 μl were transferred to a QIAquick column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation. dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 30 minutes at 37 °C according to the manufacturer's dA-Tailing Module. The sample was cooled to 4 °C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB were added. The 300 μl were transferred to the MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation. Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521), 15 μl of 2× Quick Ligation Reaction Buffer and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25 °C according to the Quick Ligation Module. The sample was cooled to 4 °C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation. Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes, and hold at 4 °C) using Illumina Genomic PCR Primers (Part No. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California).
c. Analysis of sequencing libraries prepared according to the abbreviated (a) and full-length (b) protocols
The electropherograms generated by the Bioanalyzer are shown in Figures 21A and 21B. Figure 21A shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the full-length protocol described in (b), and Figure 21B shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the abbreviated protocol described in (a). In both figures, peaks 1 and 4 represent the 15 bp lower marker and the 1,500 bp upper marker, respectively; the numbers above the peaks indicate the migration times of the library fragments; and the horizontal lines indicate the set threshold for integration. The electropherogram in Figure 21A shows a minor peak of 187 bp fragments and a major peak of 263 bp fragments, whereas the electropherogram in Figure 21B shows only one peak, at 265 bp. Integration of the peak areas gave a calculated DNA concentration of 0.40 ng/μl for the 187 bp peak in Figure 21A, 7.34 ng/μl for the 263 bp peak in Figure 21A, and 14.72 ng/μl for the 265 bp peak in Figure 21B. The Illumina adaptors ligated to the cfDNA are known to total 92 bp, which, when subtracted from 265 bp, indicates that the peak size of the cfDNA is 173 bp. The minor peak at 187 bp may represent fragments of two primers ligated end to end. The linear two-primer fragments are eliminated from the final library product when the abbreviated protocol is used. The abbreviated protocol also eliminates other smaller fragments of less than 187 bp. In this example, the concentration of purified adaptor-ligated cfDNA is double that of the adaptor-ligated cfDNA produced using the full-length protocol. It was noted that the concentration of the adaptor-ligated cfDNA fragments was always greater than that obtained using the full-length protocol (data not shown).
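The arithmetic behind the insert-size estimate and the concentration comparison can be written out as a short sketch. The values are the ones reported above for the 265 bp peak, the 92 bp adaptors and the two peak concentrations; the function names are illustrative only and are not part of any instrument software.

```python
# Minimal sketch of the peak arithmetic described in the text.
# Function names are hypothetical; the numbers come from the passage above.

ADAPTOR_BP = 92  # combined adaptor length stated in the text


def insert_size(library_peak_bp, adaptor_bp=ADAPTOR_BP):
    """Estimate the cfDNA insert length by subtracting the adaptor length."""
    return library_peak_bp - adaptor_bp


def fold_change(conc_a_ng_per_ul, conc_b_ng_per_ul):
    """Ratio of two library concentrations given in the same units."""
    return conc_a_ng_per_ul / conc_b_ng_per_ul


cfdna_bp = insert_size(265)        # 265 bp peak minus 92 bp of adaptor
ratio = fold_change(14.72, 7.34)   # 265 bp peak versus 263 bp peak

print(cfdna_bp)          # 173
print(round(ratio, 2))   # 2.01
```

The ratio of about 2 matches the statement that the abbreviated protocol yielded roughly twice the concentration of adaptor-ligated cfDNA in this example.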
Thus, an advantage of preparing the sequencing library using the abbreviated protocol is that the library obtained consistently comprises only one major peak in the 262-267 bp range, whereas the quality of the library prepared using the full-length protocol varies, as reflected by the number and mobility of peaks other than that representing the cfDNA. Non-cfDNA products would occupy space on the flow cell and diminish the quality of the cluster amplification and subsequent imaging of the sequencing reactions, which underlie the overall assignment of aneuploidy status. The abbreviated protocol was shown not to affect the sequencing of the library.
Another advantage of preparing the sequencing library using the abbreviated protocol is that the three enzymatic steps of blunt-ending, dA tailing and adaptor ligation take less than one hour to complete, which supports the validation and implementation of a rapid aneuploidy diagnostic service.
Another advantage is that the three enzymatic steps of blunt-ending, dA tailing and adaptor ligation are performed in the same reaction tube, thus avoiding multiple sample transfers that could lead to loss of material and, more importantly, to sample mix-up and sample contamination.
Example 3
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation in solution
To determine whether the abbreviated protocol could be shortened further to expedite sample analysis, sequencing libraries were made from unrepaired cfDNA and sequenced using the Illumina Genome Analyzer II as previously described.
cfDNA was prepared from peripheral blood samples as described herein. Blunt-ending and phosphorylation of the 5' phosphates, as required by the published protocol for the Illumina platform, were not performed, to provide unrepaired cfDNA samples.
It was determined that omitting DNA repair, or DNA repair and phosphorylation, did not affect the quality or yield of the sequencing library (data not shown).
In-solution 2-step method for unrepaired DNA, non-indexed
In a first set of experiments, unrepaired cfDNA was subjected to simultaneous dA tailing and adaptor ligation by combining Klenow Exo- and T4 DNA ligase in the same reaction mixture, as follows: 30 μl of cfDNA at a concentration of 20-150 pg/μl was dA tailed (5 μl of 10× NEB Buffer 2, 2 μl of 10 nM dNTP, 1 μl of 10 nM ATP and 1 μl of 5000 U/ml Klenow Exo-) and ligated to Illumina Y-adaptors (1 μl of a 1:15 dilution of a 3 μM stock) using 1 μl of 400,000 U/ml T4 DNA ligase, in a reaction volume of 50 μl. The non-indexed Y-adaptors were obtained from Illumina. The combined reaction was incubated at 25 °C for 30 minutes. The enzymes were heat inactivated at 75 °C for 5 minutes, and the reaction products were stored at 10 °C.
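The quantities implied by this reaction setup follow from simple unit conversions, sketched below. The helper names are hypothetical; the stock concentrations and volumes are those stated above.

```python
# Minimal sketch: amounts of enzyme, cfDNA and adaptor in the combined
# dA-tailing/ligation reaction. Helper names are illustrative assumptions.


def enzyme_units(volume_ul, stock_units_per_ml):
    """Units delivered by a given volume of enzyme stock."""
    return volume_ul * stock_units_per_ml / 1000.0


def dna_mass_ng(volume_ul, conc_pg_per_ul):
    """Mass of DNA (ng) in a given volume at a given concentration."""
    return volume_ul * conc_pg_per_ul / 1000.0


def adaptor_pmol(volume_ul, stock_um, dilution_factor):
    """Picomoles of adaptor added; uM x ul = pmol."""
    return volume_ul * stock_um / dilution_factor


klenow_u = enzyme_units(1, 5000)        # 1 ul of 5000 U/ml
ligase_u = enzyme_units(1, 400000)      # 1 ul of 400,000 U/ml
input_low = dna_mass_ng(30, 20)         # 30 ul at 20 pg/ul
input_high = dna_mass_ng(30, 150)       # 30 ul at 150 pg/ul
adaptor = adaptor_pmol(1, 3, 15)        # 1 ul of a 1:15 dilution of 3 uM

print(klenow_u, ligase_u)     # 5.0 400.0
print(input_low, input_high)  # 0.6 4.5
print(round(adaptor, 3))      # 0.2
```

Under these conversions the reaction contains 5 units of Klenow Exo-, 400 units of ligase, 0.6 to 4.5 ng of cfDNA and about 0.2 pmol of adaptor.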
The adaptor-ligated products were purified using SPRI beads (Agencourt AMPure XP PCR purification system, Beckman Coulter Genomics) and subjected to 18 cycles of PCR. The PCR-amplified library was purified using SPRI and sequenced using the Illumina Genome Analyzer IIx or HiSeq according to the manufacturer's instructions to obtain single-end reads of 36 bp. A large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of a sample, the Illumina "Sequencer Control Software/Real-time Analysis" transferred base call files in binary format to a network-attached storage device for data analysis. Sequence data were analyzed using software designed to run on a Linux server, which converts the binary format base calls into human-readable text using the Illumina "BCLConverter" and then calls the open source "Bowtie" program to align sequences to the reference human genome, which is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105).
The software reads the sequence data generated by the above procedure that aligned uniquely to the genome from the Bowtie output (bowtieout.txt files). Sequence alignments with at most 2 base mismatches were allowed, and were included in the alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded. Between about 5 and 25 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. The regions extending from base 0 to base 2 × 10^6, from base 10 × 10^6 to base 13 × 10^6, and from base 23 × 10^6 to the end of chromosome Y were specifically excluded from the analysis, because tags derived from fetuses of either sex map to these regions of the Y chromosome.
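The counting rules in the preceding paragraph can be illustrated with a minimal sketch. It is not the software referred to in the text. The record layout (a dictionary with `chrom`, `start`, `end`, `mismatches` and `unique` keys), the decision to keep the first copy of each duplicate, and the use of the tag start coordinate to test the chromosome Y exclusions are all assumptions made for illustration.

```python
from collections import Counter

# Excluded chrY intervals from the text as [start, end) base ranges;
# None stands for "to the end of the chromosome".
CHRY_EXCLUDED = [(0, 2_000_000), (10_000_000, 13_000_000), (23_000_000, None)]


def in_excluded_region(chrom, start):
    """True if a tag starting at `start` falls in an excluded chrY interval."""
    if chrom != "chrY":
        return False
    return any(lo <= start and (hi is None or start < hi)
               for lo, hi in CHRY_EXCLUDED)


def count_tags(alignments, max_mismatches=2):
    """Count uniquely mapped, non-duplicate tags per chromosome."""
    seen = set()
    counts = Counter()
    for a in alignments:
        if not a["unique"] or a["mismatches"] > max_mismatches:
            continue
        key = (a["chrom"], a["start"], a["end"])
        if key in seen:  # identical start and end coordinates: duplicate
            continue
        seen.add(key)
        if in_excluded_region(a["chrom"], a["start"]):
            continue
        counts[a["chrom"]] += 1
    return counts


def percent_per_chromosome(counts):
    """Percent of all counted tags that map to each chromosome."""
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in counts.items()} if total else {}


# Toy records for illustration only; these are not sequencing data.
toy = [
    {"chrom": "chr1", "start": 100, "end": 136, "mismatches": 0, "unique": True},
    {"chrom": "chr1", "start": 100, "end": 136, "mismatches": 1, "unique": True},
    {"chrom": "chr2", "start": 500, "end": 536, "mismatches": 3, "unique": True},
    {"chrom": "chr21", "start": 900, "end": 936, "mismatches": 2, "unique": True},
    {"chrom": "chrY", "start": 1_500_000, "end": 1_500_036, "mismatches": 0, "unique": True},
    {"chrom": "chrY", "start": 5_000_000, "end": 5_000_036, "mismatches": 0, "unique": True},
    {"chrom": "chr3", "start": 10, "end": 46, "mismatches": 0, "unique": False},
]
counts = count_tags(toy)
percents = percent_per_chromosome(counts)
```

In the toy input, the second chr1 record is dropped as a duplicate, the chr2 record exceeds the mismatch limit, the chr3 record is not unique, and the first chrY record lies in an excluded interval, leaving one tag each on chr1, chr21 and chrY.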
Figure 22A shows the mean (n=16) of the percent of the total number of sequence tags that mapped to each human chromosome (% chromosome N) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇) and when the sequencing library was prepared according to the repair-free 2-step method (INSOL). These data show that, compared with the percent of tags mapped to the corresponding chromosomes using the abbreviated method, preparing the sequencing library with the repair-free 2-step method resulted in a greater percent of tags mapping to chromosomes with lower GC content and a smaller percent of tags mapping to chromosomes with higher GC content. Figure 22B relates the percent of sequence tags to chromosome size, and shows that the repair-free method decreases the sequencing bias. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; Δ) and the in-solution repair-free protocol (2-step) were R² = 0.9332 and R² = 0.9806, respectively.
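A coefficient of determination of the kind quoted for Figure 22B can be computed from paired values of chromosome size and percent mapped tags. The sketch below assumes a simple linear least-squares fit, for which R² equals the squared Pearson correlation; the text does not state how the reported coefficients were calculated. The chromosome sizes are taken from Table 8. The percent values are a placeholder that is exactly proportional to size, used only to check the function, and are not measured data.

```python
# Minimal sketch, assuming a simple linear regression of percent mapped
# tags on chromosome size. Not the analysis used to produce Figure 22B.


def r_squared(x, y):
    """R^2 of a simple linear least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Sizes (Mbp) of Chr1, Chr13, Chr18 and Chr21 from Table 8.
sizes_mbp = [247, 114, 76, 47]

# Placeholder percentages, exactly proportional to size (not measured).
total = sum(sizes_mbp)
placeholder_pct = [100.0 * s / total for s in sizes_mbp]

r2 = r_squared(sizes_mbp, placeholder_pct)
```

With real data, a library whose tag counts track chromosome size more closely gives an R² nearer to 1, which is the sense in which the repair-free protocol is described as less biased.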
Table 8. Percent GC content per chromosome
Chromosome   Size (Mbp)   GC (%)     Chromosome   Size (Mbp)   GC (%)
Chr1         247          41.37      Chr13        114          38.24
Chr2         243          39.44      Chr14        106          40.85
Chr3         199          38.74      Chr15        100          41.80
Chr4         191          38.60      Chr16        89           44.64
Chr5         181          39.35      Chr17        79           45.01
Chr6         171          39.94      Chr18        76           39.66
Chr7         159          39.78      Chr19        63           48.21
Chr8         146          40.30      Chr20        62           42.05
Chr9         140          40.17      Chr21        47           40.68
Chr10        135          40.43      Chr22        50           47.64
Chr11        134          41.37      ChrX         155          39.26
Chr12        132          40.59      ChrY         58           37.74
The abbreviated method and the repair-free 2-step method were also compared by examining, as a function of the percent GC content of each chromosome, the ratio of the percent of tags mapped to an individual chromosome using the repair-free method to the percent of tags mapped to that chromosome using the abbreviated method. The percent GC content relative to chromosome size was calculated from published information on chromosome sequence and GC content binning (Constantini et al., Genome Res 16:536-541 [2006]) and is given in Table 8. The results are given in Figure 22C, which shows that the ratio was notably decreased for chromosomes with a high GC content and increased for chromosomes with a low GC content. These data clearly show the normalizing effect that the repair-free method has in overcoming GC bias.
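The ratio plotted against GC content can be expressed as a short sketch. The GC values below are taken from Table 8; the percent-tag values are hypothetical placeholders chosen only to show the calculation and its direction, and are not measurements from this example.

```python
# Minimal sketch of the per-chromosome ratio (repair-free / abbreviated)
# ordered by GC content. All percent-tag inputs are placeholders.


def ratio_by_gc(pct_repair_free, pct_abbreviated, gc_percent):
    """Return (GC %, chromosome, ratio) tuples sorted by increasing GC %."""
    rows = []
    for chrom, pct in pct_repair_free.items():
        if chrom in pct_abbreviated and chrom in gc_percent:
            rows.append((gc_percent[chrom], chrom, pct / pct_abbreviated[chrom]))
    return sorted(rows)


# GC content (%) from Table 8.
gc = {"Chr13": 38.24, "Chr4": 38.60, "Chr22": 47.64, "Chr19": 48.21}

# Hypothetical percent mapped tags per method (illustration only).
repair_free = {"Chr13": 3.9, "Chr4": 6.6, "Chr22": 1.5, "Chr19": 1.9}
abbreviated = {"Chr13": 3.7, "Chr4": 6.3, "Chr22": 1.7, "Chr19": 2.2}

rows = ratio_by_gc(repair_free, abbreviated, gc)
for gc_pct, chrom, ratio in rows:
    print(f"{chrom}\tGC {gc_pct:.2f}%\tratio {ratio:.2f}")
```

A ratio above 1 for low-GC chromosomes and below 1 for high-GC chromosomes is the pattern described for Figure 22C.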
These data show that the repair-free method corrects some of the GC bias that is known to be associated with sequencing of amplified DNA.
To determine whether the repair-free method affected the proportion of fetal versus maternal cfDNA that was sequenced, the percent of tags mapping to chromosomes X and Y was determined. Figures 23A and 23B show bar diagrams providing the mean and standard deviation of the percent of tags mapped to chromosome X (Figure 23A; % chromosome X) and chromosome Y (Figure 23B; % chromosome Y), obtained by sequencing 10 cfDNA samples purified from the plasma of 10 pregnant women. Figure 23A shows that a greater number of tags mapped to the X chromosome when using the repair-free method than when using the abbreviated method. Figure 23B shows that the percent of tags that mapped to the Y chromosome when using the repair-free method did not differ from that obtained using the abbreviated method.
These data show that the repair-free method does not introduce any bias for or against sequencing fetal versus maternal DNA, i.e. the proportion of fetal sequences that were sequenced was not altered when the repair-free method was used.
Taken together, these data show that the repair-free method does not adversely affect the quality of the sequencing library or the information obtained by sequencing the library. Excluding the DNA repair step required by the published protocols reduces reagent cost and expedites the preparation of the sequencing library.
In-solution 2-step method for unrepaired DNA, indexed
In a second set of experiments, unrepaired cfDNA was subjected to dA tailing, followed by heat inactivation of the Klenow Exo- and adaptor ligation. When ligation was performed using the non-indexed Illumina adaptors, which carry a single-stranded arm of 21 bases, excluding the heat inactivation of the Klenow Exo- did not affect the yield or quality of the sequencing library.
To determine whether the repair-free method could be applied to multiplexed sequencing, home-made indexed Y-adaptors comprising a 6-base index sequence were used to generate libraries with or without Klenow inactivation. Unlike the non-indexed adaptors, the indexed adaptors comprise a single-stranded arm of 43 bases, which includes the index sequence and the PCR priming site.
Twelve different indexed adaptors, identical to the Illumina TruSeq adaptors, were made starting from oligonucleotides obtained from Integrated DNA Technologies (Coralville, Iowa). The oligonucleotide sequences were taken from the published Illumina TruSeq indexed adaptor sequences. The oligonucleotides were dissolved to a final concentration of 300 μM in annealing buffer (10 mM Tris, 1 mM EDTA, 50 mM NaCl, pH 7.5). Equimolar amounts of the oligonucleotides comprising the two arms of any given indexed adaptor, typically 10 μl each (300 μM each), were mixed and allowed to anneal (95 °C for 6 minutes, followed by a slow, controlled cool-down from 95 °C to 10 °C). The final 150 μM adaptor was diluted to 7.5 μM in 10 mM Tris, 1 mM EDTA (pH 8) and stored at -20 °C until use.
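The concentrations quoted for the annealed adaptor follow from the mixing and dilution steps, as the sketch below shows. It assumes complete annealing of the two strands and uses a hypothetical 100 μl final volume for the working dilution; the function names are illustrative.

```python
# Minimal sketch of the adaptor annealing and dilution arithmetic.
# Assumes complete annealing; the 100 ul working volume is hypothetical.


def annealed_duplex_um(strand_stock_um, vol_a_ul, vol_b_ul):
    """Duplex concentration after mixing two strand stocks of equal molarity."""
    limiting_pmol = strand_stock_um * min(vol_a_ul, vol_b_ul)  # uM x ul = pmol
    return limiting_pmol / (vol_a_ul + vol_b_ul)


def dilution_volumes(stock_um, target_um, final_volume_ul):
    """Volumes of stock and diluent needed to reach the target concentration."""
    stock_ul = final_volume_ul * target_um / stock_um
    return stock_ul, final_volume_ul - stock_ul


duplex_um = annealed_duplex_um(300, 10, 10)           # two 10 ul strands at 300 uM
fold = duplex_um / 7.5                                # dilution factor to 7.5 uM
stock_ul, diluent_ul = dilution_volumes(duplex_um, 7.5, 100)

print(duplex_um, fold)        # 150.0 20.0
print(stock_ul, diluent_ul)   # 5.0 95.0
```

Mixing equal volumes of the two 300 μM strands halves each concentration, giving the 150 μM duplex stated in the text, and reaching 7.5 μM requires a 20-fold dilution.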
The data showed that, when using indexed adaptors, library preparation by the 2-step method was not feasible if active Klenow Exo- and ligase were present in the same reaction together with the indexed adaptors. However, the 2-step method was feasible if the Klenow Exo- was first heat inactivated at 75 °C for 5 minutes, followed by addition of the ligase and the indexed adaptors. It is possible that, when the indexed adaptors are present together with active Klenow Exo-, the strand displacement activity of the Klenow Exo- causes the longer single-stranded DNA arm of the indexed adaptor to be digested, thereby removing the PCR primer site. Electropherograms of sequencing libraries obtained with the same cfDNA and enzymes, with or without the heat inactivation step following the Klenow Exo- reaction, showed that including heat inactivation of the Klenow Exo- before addition of the ligase and the indexed adaptors in the 2-step method yielded a library with the expected profile, with a major peak at 290 bp (data not shown). Therefore, because the repair-free method is to be applied to multiplexed sequencing, all experiments using the indexed Y-adaptors were modified to include heat inactivation of the Klenow Exo-.
Example 4
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation on a solid surface (SS); 1-step solid surface method for non-indexed DNA
To determine whether the repair-free library process could be simplified further, the repair-free sequencing library preparation described in Example 3 was configured to be performed on a solid surface. The prepared libraries were sequenced as described in Example 3.
cfDNA was prepared from peripheral blood samples as described in Example 1. Polypropylene tubes were coated with streptavidin (SA) and washed, and a first set of biotinylated indexed adaptors was bound to the streptavidin-coated tubes, as follows. The tubes of an 8-well PCR tube strip (USA Scientific, Ocala, Florida) were coated with 0.5 nmol of streptavidin (Thermo Scientific, Rockford, Illinois) in 50 μl of PBS by incubating the SA overnight at 4 °C. The tubes were washed four times with 200 μl of 1× TE each. Biotinylated Index 1 adaptor at 7.5 pmol, 3.75 pmol, 1.8 pmol and 0.9 pmol, each in 50 μl of TE, was added in duplicate to the SA-coated tubes and incubated at room temperature for 25 minutes. Unbound adaptor was removed, and the tubes were washed four times with 200 μl of TE. The biotin-labeled Index 1 adaptor was made as described in Example 3, using a biotinylated universal adaptor oligonucleotide purchased from IDT.
1-step SS method using cfDNA from non-pregnant subjects
In a second strip of PCR tubes, control samples (NTC: no template control) or 30 μl of purified cfDNA obtained from non-pregnant women at approximately 120 pg/μl, i.e. approximately 32 fmol, were incubated with 5 units of Klenow Exo- in NEB Buffer 2 containing 20 nmol dNTPs and 10 nmol ATP, in a 50 μl reaction volume, at 37 °C for 15 minutes. Subsequently, the Klenow enzyme was inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The Klenow-DNA mixture was transferred to the corresponding tubes containing the SA-bound biotinylated adaptors, and the cfDNA was ligated to the immobilized adaptors by incubating the mixture with 400 units of T4 DNA ligase in 10 μl of 1× T4 DNA ligase buffer at 25 °C for 15 minutes. Subsequently, 7.5 pmol of non-biotinylated Index 1 adaptor was ligated to the solid phase-bound cfDNA by incubating with 200 units of T4 DNA ligase in 10 μl of buffer at 25 °C for 15 minutes. The reaction mixture was removed, and the tubes were washed 5 times with 200 μl of TE buffer. The adaptor-ligated cfDNA was amplified by PCR using 50 μl of Phusion PCR mix [New England Biolabs] containing P5 and P7 primers (IDT; 1 μM each) and cycled as follows: [30 seconds at 98 °C; (10 seconds at 98 °C; 10 seconds at 50 °C; 10 seconds at 60 °C; 10 seconds at 72 °C) × 18 cycles; 5 minutes at 72 °C; incubation at 10 °C]. The resulting library product was subjected to SPRI cleanup [Beckman Coulter Genomics], and the quality of the library was assessed from the profile obtained by analysis on a High Sensitivity Bioanalyzer chip [Agilent Technologies, Santa Clara, California]. The profiles showed that solid phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries (data not shown).
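The stated equivalence between 30 μl of cfDNA at about 120 pg/μl and about 32 fmol can be checked with a short conversion. The sketch assumes an average mass of 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text, and uses the 173 bp cfDNA peak size reported in Example 2 as the fragment length.

```python
# Minimal sketch: convert a cfDNA mass to femtomoles of fragments.
# The 650 g/mol per bp figure is an assumption, not taken from the text.

AVG_G_PER_MOL_PER_BP = 650.0


def dsdna_fmol(mass_ng, length_bp, g_per_mol_per_bp=AVG_G_PER_MOL_PER_BP):
    """Femtomoles of double-stranded fragments of a given length in mass_ng."""
    mol = mass_ng * 1e-9 / (length_bp * g_per_mol_per_bp)
    return mol * 1e15


mass_ng = 30 * 120 / 1000.0          # 30 ul at 120 pg/ul = 3.6 ng
fmol = dsdna_fmol(mass_ng, 173)      # 173 bp cfDNA peak from Example 2

print(mass_ng)          # 3.6
print(round(fmol, 1))   # 32.0
```

Under these assumptions the input corresponds to about 32 fmol of fragments, which agrees with the figure given in the text and is well below the 7.5 pmol of adaptor used for the second ligation.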
1-step SS method using cfDNA from pregnant subjects
The solid surface (SS) method was tested using cfDNA samples obtained from pregnant women.
cfDNA was prepared from 8 peripheral blood samples obtained from pregnant women as described in Example 1, and sequencing libraries were prepared from the purified cfDNA as described above. The libraries were sequenced, and the sequence information was analyzed.
Figure 24 shows, for each of 5 samples, the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to the non-excluded sites. cfDNA was prepared from these samples and used to construct sequencing libraries according to the abbreviated protocol (ABB) described in Example 2 (filled bars), the in-solution repair-free protocol of Example 18 (2-step; open bars) and the solid surface repair-free protocol described in this example (1-step; grey bars).
The data shown in Figure 24 indicate that the representation of PCR-amplified sequences prepared according to the three protocols is comparable, indicating that the solid surface method does not bias the variety of sequences represented in the library.
Figure 25A shows that the number of sequence tags mapped uniquely to each chromosome, obtained by sequencing libraries prepared according to the repair-free solid surface method, was comparable to that obtained using the in-solution repair-free 2-step method described above. The data show that both repair-free methods reduced the GC bias of the sequencing data.
Figure 25B shows the relationship between the number of mapped tags and the size of the chromosome to which the tags were mapped. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), the in-solution repair-free protocol (2-step) and the solid surface repair-free protocol (1-step) were R² = 0.9332, R² = 0.9802 and R² = 0.9807, respectively.
Figure 25C shows, as a function of the percent GC content of each chromosome, the ratio of the percent mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the repair-free 2-step protocol to the tags per chromosome obtained from sequencing libraries prepared according to the abbreviated protocol (ABB) (◇), and the ratio of the percent mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the repair-free 1-step protocol to the tags per chromosome obtained from sequencing libraries prepared according to the abbreviated protocol (ABB). Taken together, the data in Figures 25B and 25C show that the 1-step and 2-step methods have a similar GC-normalizing effect, because both omit the DNA repair step of the library process.
To determine whether the repair-free methods affected the proportion of fetal versus maternal cfDNA that was sequenced, the percent of tags mapping to chromosomes X and Y was determined. Figures 26A and 26B show a comparison of the means and standard deviations of the percent of tags mapped to chromosome X (Figure 26A) and chromosome Y (Figure 26B), obtained by sequencing 5 cfDNA samples purified from the plasma of 5 pregnant women using the ABB, 2-step and 1-step methods. Figure 26A shows that a greater number of tags mapped to the X chromosome when using the repair-free methods (2-step and 1-step) than when using the abbreviated method (filled bar). Figure 26B shows that the percent of tags that mapped to the Y chromosome when using the repair-free 2-step and 1-step methods did not differ from that obtained using the abbreviated method.
These data show that the repair-free solid surface 1-step method does not introduce any bias for or against sequencing fetal versus maternal DNA, i.e. the proportion of fetal sequences that were sequenced was not altered when the repair-free solid surface method was used.
Taken together, the data show that generating sequencing libraries on a solid surface is an easy and viable option for preparing samples for sequencing.
Example 5
Compatibility of the repair-free solid surface 1-step library preparation method with high throughput
To determine whether the repair-free 1-step library preparation method for sequencing by NGS technologies could be applied to high-throughput sample preparation, 96 cfDNA libraries were prepared from 96 peripheral blood samples in a 96-well PCR plate coated with SA-bound indexed adaptors. The prepared libraries were sequenced as described in Example 5.
A first PCR plate was coated with SA as described in Example 4, and biotinylated indexed adaptors were bound. Each column of wells of the 96-well plate was coated with a biotinylated adaptor carrying a unique index. Using a second 96-well PCR plate, 37 different cfDNAs, in 30 μl each, were dA tailed in the presence of 10 μl of Klenow master mix at 37 °C for 15 minutes, followed by Klenow inactivation at 75 °C for 5 minutes. Several cfDNAs were used in multiple wells, for a total of 94 wells containing cfDNA; 2 wells were used as no template controls. The dA-tailed cfDNA mixtures were transferred to the first PCR plate and ligated to the bound biotinylated adaptors in the presence of 10 μl of quick ligase master mix 1 at 25 °C using a PCT-225 Tetrad Gradient thermal cycler (BioRad; Hercules, California). Ten microliters of ligation master mix 2, customized for each indexed adaptor, were added and ligated at 5 °C for 15 minutes. Unbound DNA was removed, and the bound DNA-biotinylated adaptor complexes were washed five times with TE buffer. Fifty microliters of PCR master mix were added to each well, and the adaptor-ligated DNA was amplified and subjected to SPRI cleanup as described in Example 4. The libraries were diluted and analyzed using HiSens BA chips.
The correlation between the amount of purified cfDNA used to prepare the sequencing library and the amount of library product obtained was determined for 61 clinical samples prepared using the ABB method (Figure 27A) and for 35 research samples prepared using the repair-free SS 1-step method (Figure 27B). These data show that the correlation is significantly greater for libraries prepared using the repair-free SS 1-step method (R² = 0.5826; Figure 27A) than for libraries prepared using the abbreviated method described in Example 2 (R² = 0.1534; Figure 27B). Note that the cfDNA samples in this comparison were not the same, because clinical samples are unavailable for research and development. Nevertheless, these results indicate that the repair-free SS 1-step method consistently gives a greater correlation between cfDNA input and library output than the ABB method. Subsequently, the correlation was compared for the three methods, i.e. ABB, repair-free 2-step and repair-free SS 1-step, using serially diluted amounts of the same purified cfDNA for all three methods. As shown in Figure 28, the best correlation was obtained when libraries were prepared according to the SS 1-step method (R² = 0.9457; Δ), followed by the 2-step method (R² = 0.7666) and the ABB method, which had a significantly lower correlation (R² = 0.0386; ◇). These data show that the repair-free methods, whether in solution or on a solid surface, provide consistent and predictable yields compared with methods that end-modify the cfDNA [DNA repair and phosphorylation], whether or not those methods include purification of the repaired DNA and the dA-tailed products.
Preparing libraries by the solid-surface method described in this example takes several-fold less time than preparing sequencing libraries by the abbreviated method. For example, 10 to 14 samples can be prepared manually in about 4 hours using the ABB method, whereas 96 or 192 libraries can be prepared manually in 4 and 5 hours, respectively, using the SS 1-step method. Furthermore, the SS 1-step method can readily be automated to prepare libraries in multiples of 96 for multiplexed sequencing using NGS technology. The SS method is therefore suitable for commercial, automated, high-throughput sample analysis.
Analysis of the DNA libraries shows that solid-phase sequencing library preparation from unrepaired cfDNA provides high yields of high-quality sequencing libraries, and that the preparation can be configured for automated processes to further expedite sample analyses that require massively parallel sequencing using NGS technologies. The solid-surface method is also applicable to repaired DNA.
example 6
multiplexed sequencing of libraries prepared by the 1-step SS method
Library samples prepared on a 96-well plate by the SS 1-step method (Example 20) were sequenced in multiplex format, with six differently indexed samples per lane of an Illumina HiSeq sequencer flow cell. The prepared libraries were sequenced as described in Example 2. The data in Figure 29 compare indexing efficiency, as assessed by multiplexed sequencing, between the 2-step method (filled bars) and the SS 1-step method (open bars). These data show that preparing libraries on a solid surface does not compromise indexing efficiency. Figure 30A shows the percentage of the total sequence tags that mapped to each human chromosome (% ChrN) when sequencing libraries were prepared by the 1-step solid-surface method, and Figure 30B (R² = 0.9807) shows the percentage of sequence tags as a function of chromosome size. Figures 30A and 30B show that the GC bias of the SS 1-step method is the same as that of the 2-step method, because both processes use the same DNA repair-free sample preparation enzymology.
Figure 31 shows the percentage of sequence tags mapped to the Y chromosome relative to the percentage of tags mapped to the X chromosome for 42 libraries that were prepared with indexed adaptors by the SS 1-step method and sequenced in multiplex format using Illumina sequencing-by-synthesis with reversible terminator technology. The data clearly distinguish samples obtained from women carrying male fetuses from samples obtained from women carrying female fetuses.
example 7
sample preparation and DNA extraction
Peripheral blood samples were collected from pregnant women in the first or second trimester of pregnancy who were deemed to be at risk of fetal aneuploidy. Informed consent was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotyping was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (about 6 to 9 ml per tube) was transferred to a 15 ml low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4°C, for 10 minutes using a Beckman Allegra 6R centrifuge with a GA 3.8 rotor.
To obtain cell-free plasma, the upper plasma layer was transferred to a 15 ml high-speed centrifuge tube and centrifuged at 16000 x g, 4°C, for 10 minutes using a Beckman Coulter Avanti J-E centrifuge with a JA-14 rotor. Both centrifugation steps were performed within 72 hours after blood collection. The cell-free plasma was stored at -80°C and thawed only once before DNA extraction.
Cell-free DNA was extracted from the cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Five milliliters of Buffer AL and 500 μl of Qiagen Protease were added to 4.5 ml to 5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate-buffered saline (PBS), and the mixture was incubated at 56°C for 12 minutes. The precipitated cfDNA was isolated from the solution by centrifugation at 8,000 RPM in a Beckman microcentrifuge using multiple columns. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 μl of nuclease-free water. Approximately 3.5 to 7 ng of cfDNA was extracted from the plasma samples.
All sequencing libraries were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA), as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEB End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20°C. The enzymes were then heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. The mixture was cooled to 4°C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37°C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA, using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 and incubating the reaction mixture for 15 minutes at 25°C. The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part No. 1000537 and 1000537). The adaptor-ligated DNA was subjected to PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes, and hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can identify more particular targets uniquely. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of a sample, the Illumina "Sequencer Control Software" transferred image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The Illumina "Gerald" program was run to align sequences to the reference human genome derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105). The sequence data generated by this procedure that aligned uniquely to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer with the Linux operating system. Sequence alignments with base mismatches were allowed, and were included in the alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
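The filtering rule in the preceding paragraph (keep only uniquely aligned reads, exclude alignments with identical start and end coordinates) can be sketched as follows. This is an illustrative sketch, not the c2c.pl program itself: the `Alignment` record and its field names are hypothetical, and keeping the first copy of each duplicated alignment is an assumption, since the text does not say whether one copy is retained.

```python
from collections import namedtuple

# Hypothetical record for one aligned read; the field names are illustrative
# and do not reproduce the layout of the Gerald export.txt file.
Alignment = namedtuple("Alignment", "chrom start end n_hits")


def filter_alignments(alignments):
    """Keep uniquely aligned reads and drop repeats of the same coordinates."""
    seen = set()
    kept = []
    for aln in alignments:
        if aln.n_hits != 1:
            # The read aligns to more than one genomic location: not unique.
            continue
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:
            # Identical start and end coordinates: treated as a duplicate.
            # Assumption: the first copy is kept and later copies are dropped.
            continue
        seen.add(key)
        kept.append(aln)
    return kept
```

The mismatch limit itself would be enforced by the aligner, so it is not repeated in this filter.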
Between about 5 million and 15 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. Regions of chromosome Y extending from base 0 to base 2 × 10⁶, from base 10 × 10⁶ to base 13 × 10⁶, and from base 23 × 10⁶ to the end of the chromosome were specifically excluded from the analysis, because tags derived from both male and female fetuses map to these regions of the Y chromosome.
Some variation was noted in the total number of sequence tags mapped to individual chromosomes across samples sequenced in the same run (inter-chromosomal variation), but substantially greater variation was noted among different sequencing runs (inter-sequencing run variation).
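A minimal sketch of the tag counting with the chromosome Y exclusions described above. The chromosome label "chrY", the 0-based positions and the half-open interval boundaries are assumptions made for illustration; the text gives only the approximate base ranges.

```python
# Excluded chromosome Y intervals taken from the text, stored as half-open
# [start, end) ranges in bases. None means "to the end of the chromosome".
EXCLUDED_CHRY = [(0, 2_000_000), (10_000_000, 13_000_000), (23_000_000, None)]


def is_excluded(chrom, position):
    """Return True if a tag falls in an excluded region of chromosome Y."""
    if chrom != "chrY":
        return False
    for start, end in EXCLUDED_CHRY:
        if position >= start and (end is None or position < end):
            return True
    return False


def count_tags(tags):
    """Count mapped tags per chromosome, skipping the excluded chrY regions.

    `tags` is an iterable of (chromosome, position) pairs.
    """
    counts = {}
    for chrom, position in tags:
        if is_excluded(chrom, position):
            continue
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts
```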
example 8
dose and variance for chromosomes 13, 18, 21, X and Y
To examine the extent of inter-chromosomal and inter-sequencing run variation in the number of mapped sequence tags for all chromosomes, plasma cfDNA obtained from the peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 7, and analyzed as follows.
The total number of sequence tags mapped to each chromosome (sequence tag density) was determined. Alternatively, the number of mapped sequence tags may be normalized to the length of the chromosome to generate a sequence tag density ratio. Normalization to chromosome length is an optional step, and can be done solely to reduce the number of digits in a number and so simplify it for human interpretation. Chromosome lengths that can be used to normalize the sequence tag counts are the lengths provided at genome.ucsc.edu/goldenPath/stats.html#hg18.
The sequence tag density obtained for each chromosome was related to the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose, which was calculated as the ratio of the sequence tag density for the chromosome of interest (e.g. chromosome 21) to the sequence tag density of each of the remaining chromosomes (i.e. chromosomes 1-20, 22 and X). Table 9 provides an example of the qualified chromosome doses calculated for chromosomes of interest 13, 18, 21, X and Y, determined in one of the qualified samples. Chromosome doses were determined for all chromosomes in all samples, and the average doses for chromosomes of interest 13, 18, 21, X and Y in the qualified samples are provided in Tables 10 and 11 and depicted in Figures 32-36. Figures 32-36 also depict the chromosome doses for the test samples. The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of sequence tags mapped to that chromosome relative to each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e. the normalizing chromosome, whose variation among samples is closest to the variation of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation. Figures 37 and 38 depict the calculated average chromosome doses determined in a population of qualified samples for chromosomes 13, 18 and 21, and for chromosomes X and Y.
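The dose calculation described above reduces to a set of ratios, one per candidate normalizing chromosome. The sketch below assumes a plain dictionary of tag counts per chromosome; the function name and data layout are illustrative and are not part of the described method.

```python
def chromosome_doses(tag_counts, chrom_of_interest):
    """Ratio of the tag count for the chromosome of interest to the tag
    count of every other chromosome in `tag_counts`.

    Each entry of the returned dict is the dose obtained when that other
    chromosome is used as the denominator (candidate normalizing chromosome).
    """
    numerator = tag_counts[chrom_of_interest]
    return {
        other: numerator / count
        for other, count in tag_counts.items()
        if other != chrom_of_interest and count > 0
    }
```

Restricting the denominators to chromosomes 1-20, 22 and X, as in the chromosome 21 example, only requires passing counts for those chromosomes.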
In some instances, the best normalizing chromosome may not have the least variation, but may have a distribution of qualified doses that best distinguishes a test sample or samples from the qualified samples; that is, the best normalizing chromosome may not have the lowest variation but may have the greatest differentiability. Differentiability thus accounts for both the variation in chromosome dose and the distribution of the doses in the qualified samples.
Tables 10 and 11 provide the coefficient of variation as the measure of variability, and t-test values as the measure of differentiability for chromosomes 18, 21, X and Y, where the smaller the t-test value, the greater the differentiability. The differentiability for chromosome 13 was determined as the ratio of the difference between the mean chromosome dose in the qualified samples and the dose for chromosome 13 in the only T13 test sample, to the standard deviation of the mean of the qualified doses.
The qualified chromosome doses also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described below.
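The two measures above can be sketched with the Python standard library. The sketch makes several assumptions that the text leaves open: the sample standard deviation is used, the coefficient of variation is returned as a fraction rather than a percentage, and the chromosome 13 ratio is taken as an absolute value. The helper that picks the lowest-variability candidate is likewise illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(doses):
    """Sample standard deviation divided by the mean (as a fraction)."""
    return stdev(doses) / mean(doses)


def lowest_cv_chromosome(doses_by_chrom):
    """Return the candidate normalizing chromosome whose qualified doses
    have the smallest coefficient of variation."""
    return min(doses_by_chrom, key=lambda c: coefficient_of_variation(doses_by_chrom[c]))


def differentiability_ratio(qualified_doses, test_dose):
    """Distance of a single test dose from the qualified mean, expressed in
    standard deviations of the qualified doses (the chromosome 13 measure)."""
    return abs(mean(qualified_doses) - test_dose) / stdev(qualified_doses)
```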
table 9. qualified chromosome dose for chromosomes 13, 18, 21, X and Y (n=1; sample #11342, 46 XY)
table 10. qualified chromosome dose, variance and differentiability for chromosomes 21, 18 and 13
table 11. qualified chromosome dose, variance and differentiability for chromosomes 13, X and Y
Examples of diagnoses of T21, T13, T18 and a case of Turner syndrome, obtained using the normalizing chromosomes, chromosome doses and differentiability for each of the chromosomes of interest, are described in Example 9.
example 9
diagnosis of fetal aneuploidy using normalizing chromosomes
To apply chromosome doses to the assessment of aneuploidy in a biological test sample, maternal blood test samples were obtained from pregnant volunteers, and cfDNA was prepared, sequenced and analyzed as described in Examples 1 and 2.
trisomy 21
Table 12 provides the calculated dose for chromosome 21 in an exemplary test sample (#11403). The calculated threshold for the positive diagnosis of T21 aneuploidy was set at > 2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T21 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosomes 14 and 15 were used as normalizing chromosomes in separate calculations to show that either a chromosome having the lowest variability (e.g. chromosome 14) or a chromosome having the greatest differentiability (e.g. chromosome 15) can be used to identify the aneuploidy. Thirteen T21 samples were identified using the calculated chromosome doses, and the aneuploidy samples were confirmed to be T21 by karyotype.
table 12. chromosome dose for a T21 aneuploidy (sample #11403, 47 XY +21)
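The threshold rule used for the trisomy calls can be illustrated as below. This is a sketch under stated assumptions: the sample standard deviation is used, and the function names are hypothetical; only the rule itself (dose greater than the qualified mean plus 2 standard deviations) comes from the text.

```python
from statistics import mean, stdev


def trisomy_threshold(qualified_doses, n_sd=2.0):
    """Threshold set n_sd standard deviations above the qualified mean."""
    return mean(qualified_doses) + n_sd * stdev(qualified_doses)


def exceeds_threshold(test_dose, qualified_doses, n_sd=2.0):
    """True when the test dose is greater than the set threshold."""
    return test_dose > trisomy_threshold(qualified_doses, n_sd)
```

The same function applies to chromosomes 21, 18 and 13; only the normalizing chromosome used to compute the doses changes.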
trisomy 18
Table 13 provides the calculated dose for chromosome 18 in a test sample (#11390). The calculated threshold for the positive diagnosis of T18 aneuploidy was set at > 2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T18 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosome 8 was used as the normalizing chromosome. In this instance, chromosome 8 had both the lowest variability and the greatest differentiability. 18 T18 samples were identified using chromosome doses and were confirmed to be T18 by karyotype.
These data show that a normalizing chromosome can have both the lowest variability and the greatest differentiability.
table 13. chromosome dose for a T18 aneuploidy (sample #11390, 47 XY +18)
trisomy 13
Table 14 provides the calculated dose for chromosome 13 in a test sample (#51236). The calculated threshold for the positive diagnosis of T13 aneuploidy was set at > 2 standard deviations from the mean of the qualified samples. A diagnosis of T13 was given based on the chromosome dose in the test sample being greater than the set threshold. The chromosome dose for chromosome 13 was calculated using either chromosome 5 or the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome. One T13 sample was identified.
table 14. chromosome dose for a T13 aneuploidy (sample #51236, 47 XY +13)
The sequence tag density for chromosomes 3-6 is the average tag count for chromosomes 3-6.
These data show that the combination of chromosomes 3, 4, 5 and 6 provides a variability that is lower than that of chromosome 5, and a differentiability that is greater than that of any of the other chromosomes.
Thus, a group of chromosomes can be used as the normalizing chromosome to determine chromosome doses and identify aneuploidies.
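When a group of chromosomes serves as the normalizing chromosome, the denominator is the average tag count of the group, as stated for chromosomes 3-6. A minimal sketch, assuming a dictionary of tag counts per chromosome (the names and layout are illustrative only):

```python
from statistics import mean


def group_dose(tag_counts, chrom_of_interest, normalizing_group):
    """Dose of the chromosome of interest relative to the average tag count
    of a group of normalizing chromosomes."""
    denominator = mean([tag_counts[c] for c in normalizing_group])
    return tag_counts[chrom_of_interest] / denominator
```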
turner syndrome (monosomy X)
Table 15 provides the calculated doses for chromosomes X and Y in a test sample (#51238). The calculated threshold for the positive diagnosis of Turner syndrome (monosomy X) was set for the X chromosome at < -2 standard deviations from the mean of the qualified (normal) samples, and for the absence of the Y chromosome at < -2 standard deviations from the mean of the qualified (normal) samples.
table 15. chromosome dose for a Turner (XO) aneuploidy (sample #51238, 45 X)
A sample having an X chromosome dose less than the set threshold was identified as having less than one X chromosome. The same sample was determined to have a Y chromosome dose less than the set threshold, indicating that the sample did not have a Y chromosome. Thus, the combination of the chromosome doses for X and Y was used to identify the Turner syndrome (monosomy X) sample.
Thus, the method provided enables the determination of CNV of chromosomes. In particular, the method enables the determination of over- and under-representation chromosomal aneuploidies by massively parallel sequencing of maternal plasma cfDNA and identification of normalizing chromosomes for the statistical analysis of the sequencing data. The sensitivity and reliability of the method allow accurate determination of aneuploidy in the first and second trimesters.
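The combined X and Y rule can be sketched as two z-score tests that must both fall below -2. Which qualified samples form each reference set is an assumption here (for example, unaffected female samples for X and unaffected male samples for Y); the function names are hypothetical and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def z_score(dose, qualified_doses):
    """Number of standard deviations a dose lies from the qualified mean."""
    return (dose - mean(qualified_doses)) / stdev(qualified_doses)


def consistent_with_monosomy_x(x_dose, y_dose, qualified_x, qualified_y, cutoff=-2.0):
    """True when both the X dose and the Y dose fall below the cutoff.

    `qualified_x` and `qualified_y` are the reference dose sets chosen by
    the caller; their composition is an assumption of this sketch.
    """
    return (z_score(x_dose, qualified_x) < cutoff
            and z_score(y_dose, qualified_y) < cutoff)
```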
example 10
determination of partial aneuploidy
Sequence doses were applied to assess partial aneuploidy in a biological test sample of cfDNA that was prepared from blood plasma and sequenced as described in Example 7. The sample was confirmed by karyotyping to have been derived from a subject with a partial deletion of chromosome 11.
The sequencing data for the partial aneuploidy (a partial deletion of chromosome 11, i.e. q21-q23) were analyzed as described for the chromosomal aneuploidies in the previous examples. Mapping of the sequence tags to chromosome 11 in a test sample revealed a noticeable loss of tag counts between base pairs 81000082 and 103000103 in the long arm of the chromosome, relative to the tag counts obtained for the corresponding sequence on chromosome 11 in the qualified samples (data not shown). Sequence tags mapped to the sequence of interest on chromosome 11 (81000082-103000103 bp) in each of the qualified samples, and sequence tags mapped to all 20-megabase segments across the entire genome in the qualified samples (i.e. the qualified sequence tag densities), were used to determine qualified sequence doses as ratios of tag densities in all qualified samples. The mean sequence dose, standard deviation and coefficient of variation were calculated for all 20-megabase segments in the entire genome, and the 20-megabase sequence having the least variability was identified as the normalizing sequence on chromosome 5 (13000014-33000033 bp) (see Table 16), which was used to calculate the dose for the sequence of interest in the test sample (see Table 17). Table 17 provides the sequence dose for the sequence of interest on chromosome 11 (81000082-103000103 bp) in the test sample, calculated as the ratio of the sequence tags mapped to the sequence of interest to the sequence tags mapped to the identified normalizing sequence. Figure 40 shows the sequence doses for the sequence of interest in the 7 qualified samples (O) and the sequence dose for the corresponding sequence in the test sample (◇). The mean is shown by the solid line, and the calculated threshold for the positive diagnosis of partial aneuploidy, set at 5 standard deviations from the mean, is shown by the dashed line. A diagnosis of partial aneuploidy was given based on the sequence dose in the test sample being less than the set threshold. The test sample was verified by karyotyping to have a q21-q23 deletion on chromosome 11.
Thus, in addition to identifying chromosomal aneuploidies, the method of the invention can be used to identify partial aneuploidies.
table 16. qualified normalizing sequence, dose and variance for sequence Chr11: 81000082-103000103 (qualified samples n=7)
table 17. sequence dose for the sequence of interest (81000082-103000103) on chromosome 11 (test sample 11206)
example 11
demonstration of aneuploidy detection
The sequencing data obtained for the samples described in Examples 2 and 3 and shown in Figures 32-36 were analyzed further to demonstrate the sensitivity of the method in successfully identifying aneuploidies in maternal samples. Normalized chromosome doses for chromosomes 21, 18, 13, X and Y were analyzed as a distribution relative to the standard deviation of the mean (Y-axis) and are shown in Figures 41A-41E. The normalizing chromosome used is shown as the denominator (X-axis).
Figure 41(A) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 21 dose in the unaffected samples (o) and the trisomy 21 samples (T21; Δ) when chromosome 14 is used as the normalizing chromosome for chromosome 21. Figure 41(B) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 18 dose in the unaffected samples (o) and the trisomy 18 samples (T18; Δ) when chromosome 8 is used as the normalizing chromosome for chromosome 18. Figure 41(C) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 13 dose in the unaffected samples (o) and the trisomy 13 samples (T13; Δ), using the average sequence tag density of the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome to determine the chromosome dose for chromosome 13. Figure 41(D) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome X dose in the unaffected female samples (o), the unaffected male samples (Δ) and the monosomy X samples (XO; +) when chromosome 4 is used as the normalizing chromosome for chromosome X. Figure 41(E) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome Y dose in the unaffected male samples (o), the unaffected female samples (Δ) and the monosomy X samples (+), when the average sequence tag density of the group of chromosomes 1-22 and X is used as the normalizing chromosome to determine the chromosome dose for chromosome Y.
These data show that trisomy 21, trisomy 18 and trisomy 13 samples were clearly distinguishable from the unaffected (normal) samples. The monosomy X samples were readily identifiable as having a chromosome X dose clearly lower than that of the unaffected female samples (Figure 41(D)) and a chromosome Y dose clearly lower than that of the unaffected male samples (Figure 41(E)).
Thus, the method provided is sensitive and specific for determining the presence or absence of chromosomal aneuploidies in a maternal blood sample.
example 12
determination of fetal chromosomal aneuploidies by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood: test set 1 independent of training set 1
This study was conducted by qualified site clinical research personnel at 13 U.S. clinic locations between April 2009 and October 2010 under a human subject protocol approved by each institution's Institutional Review Board (IRB). Written consent was obtained from each subject before participation in the study. The protocol was designed to provide blood samples and clinical data to support the development of non-invasive PGD methods. Pregnant women aged 18 years or older were eligible to participate. For patients undergoing clinically indicated chorionic villus sampling (CVS) or amniocentesis, blood was collected before the procedure, and the results of the fetal karyotype were also collected. Peripheral blood samples (two tubes, or about 20 mL in total) were drawn from all subjects into acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped overnight to the laboratory in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.
Site research coordinators entered clinical data pertinent to each patient's current pregnancy and history into study case report forms (CRFs) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype was performed at each local laboratory on samples from the invasive prenatal procedures, and the results were also recorded in the study CRFs. All data obtained on the CRFs were entered into the laboratory's clinical database. Cell-free plasma was obtained from individual blood tubes by a two-step centrifugation process within 24 to 48 hours of venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from the cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Because cell-free DNA fragments are known to be approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required before sequencing.
For the samples of the training set, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, CA) for sequencing library preparation (cfDNA blunt-ended and ligated to universal adaptors) and sequencing on the Illumina Genome Analyzer IIx instrument (http://www.illumina.com/) using standard manufacturer protocols. Single-end reads of 36 base pairs were obtained. Upon completion of sequencing, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared and sequenced on the Illumina Genome Analyzer IIx instrument. Sequencing libraries were prepared as follows. The full-length protocol described is essentially the standard protocol provided by Illumina, and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be purified using gel electrophoresis, whereas the protocol described here uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library, essentially according to the manufacturer's instructions, using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA). All steps, except for the final purification of the adaptor-ligated products, which was performed using Agencourt magnetic beads and reagents instead of the purification column, were performed according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library that is sequenced using the GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEB End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20°C. The sample was cooled to 4°C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (QIAGEN Inc., Valencia, CA) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The resulting 300 μl was transferred to a QIAQuick column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation. dA-tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 30 minutes at 37°C according to the manufacturer's NEB dA-Tailing Module. The sample was cooled to 4°C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, CA) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The 300 μl was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation. Ten microliters of the DNA eluate was incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25°C according to the NEB Quick Ligation Module. The sample was cooled to 4°C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE was added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation. Twenty-three microliters of the adaptor-ligated DNA eluate was subjected to 18 cycles of PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes, and hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). For both the training and test sample sets, single-end reads of 36 base pairs were sequenced.
data analysis and sample classification
Sequence reads 36 bases in length were aligned to the human genome assembly hg18 obtained from the UCSC database (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/). Alignments were performed using the Bowtie short read aligner (version 0.12.5), allowing up to two base mismatches during alignment (Langmead et al., Genome Biol 10:R25 [2009]). Only reads that mapped unambiguously to a single genomic location were included. The genomic sites to which reads mapped were counted and included in the calculation of chromosome doses (see below). Regions of the Y chromosome on which sequence tags from male and female fetuses map without any discrimination were excluded from the analysis (specifically, from base 0 to base 2 × 10^6; base 10 × 10^6 to base 13 × 10^6; and base 23 × 10^6 to the end of the Y chromosome).
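The read filtering described above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used: the `(chrom, position, n_hits)` record layout, the function names and the 0-based half-open interval convention are assumptions made for the example, and the sketch counts one tag per retained read because the text does not say whether reads sharing a position are collapsed.

```python
# Excluded chrY intervals from the text; None means "to the end of chrY".
# Treating them as 0-based, half-open intervals is an assumption.
CHR_Y_MASK = [(0, 2_000_000), (10_000_000, 13_000_000), (23_000_000, None)]


def is_masked_chr_y(position):
    """Return True if a chrY position falls in an excluded interval."""
    for start, end in CHR_Y_MASK:
        if position >= start and (end is None or position < end):
            return True
    return False


def count_tags(alignments):
    """Count uniquely mapped reads per chromosome.

    `alignments` is an iterable of (chrom, position, n_hits) tuples, a
    hypothetical simplified record in which n_hits is the number of
    genomic locations the read aligned to.
    """
    counts = {}
    for chrom, position, n_hits in alignments:
        if n_hits != 1:
            continue  # keep only reads mapping to a single location
        if chrom == "chrY" and is_masked_chr_y(position):
            continue  # drop tags in the non-discriminating chrY regions
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts


reads = [
    ("chr21", 100, 1),
    ("chr21", 200, 2),         # multi-mapped, dropped
    ("chrY", 1_500_000, 1),    # masked, dropped
    ("chrY", 5_000_000, 1),    # kept
    ("chrY", 24_000_000, 1),   # masked, dropped
]
tag_counts = count_tags(reads)
```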
Run-to-run sequencing variation in the chromosomal distribution of sequence reads can obscure the effect of fetal aneuploidy on the distribution of mapped sequence sites. To correct for this variation, a chromosome dose was calculated as the count of mapped sites for a given chromosome of interest normalized to the counts observed on a predetermined normalizing chromosome sequence. As described previously, a normalizing chromosome sequence can consist of a single chromosome or of a group of chromosomes. Normalizing chromosome sequences were first identified in a subset of the unaffected (i.e., qualified) training set samples having diploid karyotypes for the chromosomes of interest 21, 18, 13 and X, by considering each autosome as a potential denominator in a ratio of counts with the chromosome of interest. The denominator chromosomes (i.e., the normalizing chromosome sequences) were selected to minimize the variation of the chromosome doses within and between sequencing runs. Each chromosome of interest was determined to have a distinct normalizing chromosome sequence (denominator) (Table 18). No single chromosome could be identified as the normalizing chromosome sequence for chromosome 13, because no single chromosome was found to reduce the variation in chromosome 13 dose across samples; that is, the spread of NCV values for chromosome 13 was not reduced sufficiently to allow correct identification of T13 aneuploidy. Chromosomes 2 to 6 were chosen at random and tested as a group for their ability to mimic the behavior of chromosome 13. The group of chromosomes 2 to 6 was found to substantially reduce the variation in chromosome 13 dose in the training set samples, and was therefore selected as the normalizing chromosome sequence for chromosome 13. As noted above, the variation in chromosome dose for chromosome Y was greater than 30 regardless of which single chromosome was used as the normalizing chromosome sequence when determining the chromosome Y dose. The group of chromosomes 2 to 6 was found to substantially reduce the variation in chromosome Y dose in the training set samples, and was therefore selected as the normalizing chromosome sequence for chromosome Y.
The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for that chromosome of interest relative to that of each of the remaining chromosomes. Qualified chromosome doses can therefore identify the chromosome or group of chromosomes (i.e., the normalizing chromosome sequence) whose variation among samples is closest to that of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation.
The chromosome doses for all samples in the training set (i.e., qualified and affected) also serve as the basis for determining the thresholds used when identifying aneuploidies in test samples, as described below.
Table 18. Normalizing chromosome sequences for determining chromosome doses
For each chromosome of interest in each sample of the test set, a normalizing value was determined and used to determine the presence or absence of aneuploidy. The normalizing value was calculated as a chromosome dose, which was further processed to provide a normalized chromosome value (NCV).
chromosome dosage
For the test set, a chromosome dose was calculated for each chromosome of interest (21, 18, 13, X and Y) in each sample. As provided in Table 18 above, the chromosome dose for chromosome 21 was calculated as the ratio of the number of tags in the test sample that mapped to chromosome 21 to the number of tags in the test sample that mapped to chromosome 9; the chromosome dose for chromosome 18 was calculated as the ratio of the number of tags that mapped to chromosome 18 to the number of tags that mapped to chromosome 8; the chromosome dose for chromosome 13 was calculated as the ratio of the number of tags that mapped to chromosome 13 to the number of tags that mapped to chromosomes 2 to 6; the chromosome dose for chromosome X was calculated as the ratio of the number of tags that mapped to chromosome X to the number of tags that mapped to chromosome 6; and the chromosome dose for chromosome Y was calculated as the ratio of the number of tags that mapped to chromosome Y to the number of tags that mapped to chromosomes 2 to 6.
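The dose calculation described above reduces to one division per chromosome of interest. The following sketch encodes the denominators listed in the text; the dictionary layout, the function name and the tag counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Normalizing chromosome sequences as listed in the text for Example 12.
NORMALIZERS = {
    "21": ["9"],
    "18": ["8"],
    "13": ["2", "3", "4", "5", "6"],
    "X": ["6"],
    "Y": ["2", "3", "4", "5", "6"],
}


def chromosome_dose(tag_counts, chrom, normalizers=NORMALIZERS):
    """Ratio of tags on `chrom` to the summed tags on its normalizing chromosomes."""
    denominator = sum(tag_counts[c] for c in normalizers[chrom])
    return tag_counts[chrom] / denominator


# Hypothetical tag counts for one sample (illustrative numbers only).
sample = {"2": 200, "3": 180, "4": 170, "5": 160, "6": 150,
          "8": 140, "9": 100, "13": 86, "18": 70, "21": 30, "X": 75, "Y": 1}

dose_21 = chromosome_dose(sample, "21")   # 30 / 100
dose_13 = chromosome_dose(sample, "13")   # 86 / (200 + 180 + 170 + 160 + 150)
```

Using a group of chromosomes as the denominator only changes the sum in the denominator; the ratio itself is computed in the same way.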
normalized chromosome value
Using the chromosome dose for each chromosome of interest in each test sample and the corresponding chromosome doses determined in the qualified samples of the training set, a normalized chromosome value (NCV) was calculated using the following equation:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for test sample i. When chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity was observed in a quantile-quantile plot of the NCVs from unaffected samples. In addition, standard tests of normality for the NCVs failed to reject the null hypothesis of normality.
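As a worked illustration of the NCV definition, the sketch below estimates the training set mean and standard deviation for one chromosome dose and then scores a test dose. The dose values are invented for the example, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def fit_training(doses):
    """Estimate the mean and standard deviation of qualified training doses."""
    return mean(doses), stdev(doses)  # stdev uses the n - 1 denominator (assumed)


def ncv(dose, mu, sigma):
    """Normalized chromosome value: z-score of a test dose against the training set."""
    return (dose - mu) / sigma


# Hypothetical chromosome 21 doses from unaffected training samples.
training_doses = [0.312, 0.314, 0.316]
mu_21, sigma_21 = fit_training(training_doses)

# A hypothetical test sample with an elevated chromosome 21 dose.
test_ncv = ncv(0.324, mu_21, sigma_21)
```

In this toy case the training mean is 0.314 and the standard deviation is 0.002, so a test dose of 0.324 lies five standard deviations above the mean.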
For the test set, an NCV was calculated for each chromosome of interest (21, 18, 13, X and Y) in each sample. To ensure a safe and effective classification scheme, conservative boundaries were chosen for aneuploidy classification. For classifying the aneuploidy state of autosomes, an NCV > 4.0 was required to classify a chromosome as affected (i.e., aneuploid for that chromosome), and an NCV < 2.5 to classify a chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were classified as "no call".
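The autosome rule can be written as a three-way decision. This sketch assumes that the "affected" cutoff is NCV > 4.0, which is the upper edge of the no-call zone given in the text; the function name, the parameter names and the handling of values exactly equal to a boundary (treated here as "no call") are illustrative choices.

```python
def classify_autosome(ncv, affected_above=4.0, unaffected_below=2.5):
    """Classify an autosome from its NCV using the conservative boundaries."""
    if ncv > affected_above:
        return "affected"
    if ncv < unaffected_below:
        return "unaffected"
    return "no call"  # 2.5 <= NCV <= 4.0


calls = [classify_autosome(value) for value in (6.0, 3.0, 0.4)]
```

Leaving a gap between the two thresholds trades a small number of unclassified samples for fewer false positives and false negatives near the boundary.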
In the test set, sex chromosome classification was performed by sequential application of the NCVs for both X and Y, as follows:
If NCV Y > -2.0 standard deviations from the mean of male samples, the sample was classified as male (XY).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X > -2.0 standard deviations from the mean of female samples, the sample was classified as female (XX).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X < -3.0 standard deviations from the mean of female samples, the sample was classified as monosomy X, i.e., Turner syndrome.
If the NCVs did not meet any of the above criteria, the sample was classified as a "no call" for sex.
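The sequential sex chromosome rules can be sketched as an ordered series of checks. The sketch assumes that the second condition in the female and monosomy X rules refers to the X chromosome NCV measured against female samples, which is how the introductory sentence ("NCVs for both X and Y") reads; the function and argument names are hypothetical.

```python
def classify_sex(ncv_y_vs_males, ncv_x_vs_females):
    """Apply the sex classification rules in the order given in the text.

    ncv_y_vs_males:   chromosome Y NCV relative to male training samples
    ncv_x_vs_females: chromosome X NCV relative to female training samples
                      (assumed reading of the second condition)
    """
    if ncv_y_vs_males > -2.0:
        return "male (XY)"
    if ncv_y_vs_males < -2.0 and ncv_x_vs_females > -2.0:
        return "female (XX)"
    if ncv_y_vs_males < -2.0 and ncv_x_vs_females < -3.0:
        return "monosomy X"
    return "no call"


examples = [
    classify_sex(0.5, -10.0),   # Y signal present
    classify_sex(-8.0, 0.1),    # no Y signal, normal X dose
    classify_sex(-8.0, -5.0),   # no Y signal, reduced X dose
    classify_sex(-8.0, -2.5),   # falls between the X thresholds
]
```

The gap between -2.0 and -3.0 on the X chromosome NCV plays the same role as the autosomal no-call zone: samples that fall into it are left unclassified rather than forced into a category.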
result
research demography
A total of 1,014 patients were enrolled between April 2009 and July 2010. Patient demographics, invasive procedure type and karyotype results are summarized in Table 19. The mean age of the study population was 35.6 years (range 17 to 47 years), and gestational age ranged from 6 weeks 1 day to 38 weeks 1 day (mean 15 weeks 4 days). The overall incidence of abnormal fetal karyotypes was 6.8%, with a T21 incidence of 2.5%. Of the 946 subjects with singleton pregnancies and karyotypes, 906 (96%) presented with at least one clinically recognized risk factor for fetal aneuploidy prior to the prenatal procedure. Even after removing those subjects whose only indication was advanced maternal age, the data show a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma or other structural congenital abnormality were the strongest predictors of abnormal karyotype in this cohort.
Table 19. Patient demographics
*Includes results from fetuses in multiple gestations; **as assessed and reported by clinicians
Abbreviations: AMA = advanced maternal age, NT = nuchal translucency
The distribution of the ethnic backgrounds represented in this study population is also shown in Table 19. Overall, 63% of the patients in this study were Caucasian, 17% Hispanic, 6% Asian, 5% multi-ethnic and 4% African American. Ethnic diversity varied significantly from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, whereas three clinics located in the same state enrolled no Hispanic subjects. As expected, no discernible differences were observed in the results for different ethnicities.
training dataset 1
For the training set study, 71 samples were selected from the initial sequential accumulation of 435 samples collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotypes) in this first series of subjects were included for sequencing, together with a random selection and number of unaffected subjects with adequate sample and data. The clinical characteristics of the training set patients were consistent with the overall study demographics shown in Table 19. The gestational age of the samples in the training set ranged from 10 weeks 0 days to 23 weeks 1 day. Thirty-eight subjects underwent CVS, 32 underwent amniocentesis, and for 1 patient the type of invasive procedure was not specified (unaffected karyotype 46,XY). 70% of the patients were Caucasian, 8.5% Hispanic, 8.5% Asian and 8.5% multi-ethnic. Six sequenced samples were removed from this set for the purposes of training: 4 samples were from subjects with twin gestations (discussed in detail below), 1 sample with T18 was contaminated during preparation, and 1 sample had a fetal karyotype of 69,XXX, leaving 65 samples for the training set.
The number of unique sequence sites (i.e., tags identified with unique sites in the genome) varied from 2.2M in the early stages of the training set study to 13.7M in the later stages, owing to improvements in sequencing technology over time. To monitor for any potential shifts in chromosome dose over this 6-fold range in unique sites, different unaffected samples were run at the beginning and at the end of the study. For the first 15 unaffected samples run, the average number of unique sites was 3.8M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the last 15 unaffected samples run, the average number of unique sites was 10.7M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistically significant difference in the chromosome doses for chromosome 21 and chromosome 18 over the course of the training set study.
Figure 42 shows the training set NCVs for chromosomes 21, 18 and 13. The results shown in Figure 42 are consistent with an assumption of normality, under which approximately 99% of the diploid NCVs would fall within ± 2.5 standard deviations of the mean. Of the 65 samples in this set, the 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. The four samples with clinical karyotypes indicating fetal T18 had NCVs ranging from 3.3 to 12, and the two samples with clinical karyotypes indicating fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples reflects their dependence on the percentage of fetal cfDNA in the individual samples.
As with the autosomes, the means and standard deviations for the sex chromosomes were determined in the training set. The sex chromosome thresholds allowed 100% discrimination of male and female fetuses in the training set.
test data set 1
After the chromosome dose means and standard deviations had been established from the training set, a test set of 48 samples was selected from a total of 575 samples collected between January 2010 and June 2010. One sample from a twin gestation was removed from the final analysis, leaving 47 samples in the test set. Personnel preparing the samples for sequencing and operating the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 19). 58% of the invasive procedures were CVS, higher than in the overall procedural demographics but similar to the training set. 50% of the subjects were Caucasian, 27% Hispanic, 10.4% Asian and 6.3% African American.
In the test set, the number of unique sequence tags varied from approximately 13M to 26M. For unaffected samples, the chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosome 21, chromosome 18 and chromosome 13 are shown in Figure 43, and the classifications are given in Table 20.
Table 20. Test set classification data
*MX is monosomy of the X chromosome with no evidence of a Y chromosome
In the test set, 13/13 subjects with karyotypes indicating fetal T21 were correctly identified, with NCVs ranging from 5 to 14. 8/8 subjects with karyotypes indicating fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample in this test set with a karyotype classified as T13 was classified as a no call, with an NCV of approximately 3.
For the test data set, all male samples were correctly identified, including a sample with the complex karyotype 46,XY + marker chromosome (not identifiable by cytogenetics) (Table 11). Nineteen of the 20 female samples were correctly identified, and one female sample was classified as a no call. Of the three samples in the test set with a karyotype of 45,X, two were correctly identified as monosomy X and one was classified as a no call (Table 20).
twins
Four of the samples initially selected for the training set and one sample in the test set were from twin gestations. The thresholds used herein may be confounded by the different amounts of cfDNA expected in the setting of a twin gestation. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was fraternal, and amniocentesis was performed on each fetus separately. In this twin gestation, one fetus had a karyotype of 47,XY+21 and the other had a normal karyotype, 46,XX. In both of these cases, the cell-free classification based on the methods discussed above classified the sample as T21. The other two twin gestations in the training set were correctly classified as unaffected for T21 (all twins showed diploid karyotypes for chromosome 21). For the twin gestation in the test set, a karyotype (46,XX) was established only for twin B, and the algorithm correctly classified the sample as unaffected for T21.
conclusion
These data demonstrate that massively parallel sequencing can be used to determine multiple abnormal fetal karyotypes from the blood of pregnant women. They show that 100% correct classification of samples with trisomy 21 and trisomy 18 can be achieved using independent test set data. Even among fetuses with abnormal sex chromosome karyotypes, no sample was misclassified by the algorithm of the method. Importantly, the algorithm performed equally well in determining the presence or absence of T21 in two sets of twin gestations. Furthermore, this study examined a number of sequential samples from multiple centers, representing not only the range of abnormal karyotypes that one is likely to see in a commercial clinical setting, but also showing the importance of accurately classifying pregnancies unaffected by the common trisomies, given the unacceptably high false positive rates that exist in current prenatal screening. These data provide valuable insight into the great potential for using this method in the future. Analysis of subsets of the unique genomic sites showed increases in variance consistent with Poisson counting statistics.
These data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by counting statistics (Fan and Quake, PLoS One 5, e10439 [2010]). Because the sequencing information is collected across the entire genome, this method is capable of determining any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23; when the sequencing data were analyzed in 500 kbase bins, a decrease of approximately 10% in the relative number of tags was observed in a 25 Mb region starting at q21. In addition, three of the samples in the training set had complex sex karyotypes owing to mosaicism in the cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), both of which showed by cytogenetics a mixture of XXX and X cells (consistent with mosaic Turner syndrome), were classified as a no call and as monosomy X, respectively.
While testing the algorithm, another interesting data point was observed: one sample from the test set had an NCV between -5 and -6 for chromosome 21 (Figure 43). Although this sample was diploid for chromosome 21 by cytogenetics, the karyotype showed mosaicism with partial triploidy for chromosome 9: 47,XX+9[9]/46,XX[6]. Because chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 18), the excess of chromosome 9 tags lowered the overall NCV value. The results presented in Example 13 below demonstrate the ability to determine fetal trisomy 9 in this sample using normalizing chromosomes.
The conclusion of Fan et al. regarding the sensitivity of these methods is correct only if the algorithm used can account for any random or systematic bias introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will be inferior to counting statistics. Chiu et al. noted in their recent paper that their measurements of chromosomes 18 and 13 using massively parallel sequencing were imprecise, and concluded that more research was needed to apply the method to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method used in the paper of Chiu et al. simply uses the number of sequence tags on the chromosome of interest, chromosome 21 in their case, normalized by the total number of tags in the sequencing run. The challenge with this approach is that the distribution of tags on each chromosome can vary from sequencing run to sequencing run, which increases the overall variation of the aneuploidy determination metric. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were re-analyzed using the method recommended by Chiu et al., as shown in Figure 44. Overall, a compression in the range of NCVs was observed for each of chromosomes 21 and 18, together with a decrease in the determination rate: using the NCV threshold of 4.0 for aneuploidy classification, 10/13 of the T21 samples and 5/8 of the T18 samples from our test set were correctly identified.
Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205e1-e11 [2011]). In addition, after observing a shift in their test set z-score metric relative to the external reference data (i.e., the training set), they retrained on the test set to establish the classification boundaries. Although this approach is feasible in principle, in practice it would be challenging to determine how many samples are required for training and how often retraining would be needed to ensure that the classification data remain correct. One way to mitigate this problem is to include controls in every sequencing run that measure the baseline and calibrate for quantitative behavior.
The data obtained using this method show that massively parallel sequencing is capable of determining multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm for normalizing the chromosome counting data is optimized. This quantitative method not only minimizes random and systematic variation between sequencing runs, but also allows aneuploidies to be classified across the entire genome, most notably T21 and T18. Larger sample collections are required to test the algorithm for the determination of T13. To this end, a prospective, blinded, multi-site clinical study is being conducted to further demonstrate the diagnostic accuracy of this method.
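The difference between the two normalizations compared above can be illustrated with a toy calculation. The sketch is not a reproduction of the method of Chiu et al.; it simply contrasts a fraction-of-total metric with a ratio to a normalizing chromosome, using invented tag counts and a hypothetical run-specific bias that inflates chromosomes 21 and 9 by the same factor.

```python
def fraction_of_total(counts, chrom):
    """Tags on `chrom` divided by all tags in the run."""
    return counts[chrom] / sum(counts.values())


def dose(counts, chrom, normalizer):
    """Tags on `chrom` divided by tags on a single normalizing chromosome."""
    return counts[chrom] / counts[normalizer]


# Two hypothetical runs of the same unaffected sample. In run_b a
# run-specific bias inflates chromosomes 21 and 9 by 10% (assumed).
run_a = {"21": 1000, "9": 3000, "other": 96000}
run_b = {"21": 1100, "9": 3300, "other": 96000}

fraction_shift = fraction_of_total(run_b, "21") / fraction_of_total(run_a, "21")
dose_shift = dose(run_b, "21", "9") / dose(run_a, "21", "9")
```

In this toy case the fraction-of-total metric moves by roughly 10% between runs although the sample is unchanged, whereas the ratio to a chromosome that shares the bias does not move. This is the motivation for choosing denominators whose run-to-run behavior tracks that of the chromosome of interest.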
example 13
determination of the presence or absence of at least 5 different chromosome aneuploidies in all chromosomes of individual test samples
To demonstrate the ability of the method to determine the presence or absence of any chromosome aneuploidy in each of a set of maternal test samples (test set 1; Example 12), systematically determined normalizing chromosome sequences were identified in unaffected training set samples (training set 1; Example 12) and used to calculate chromosome doses for all chromosomes of each test sample. Determination of the presence or absence of any one or more different complete fetal chromosomal aneuploidies in each of the test and training set samples was accomplished from the sequencing information obtained from a single sequencing run performed on each individual sample.
Using the chromosome densities, i.e., the number of sequence tags identified for each chromosome in each of the training set samples described in Example 12, a systematically determined normalizing chromosome sequence, composed of a single chromosome or of a group of chromosomes, was determined for each of chromosomes 1-22, X and Y by calculating chromosome doses. The systematically determined normalizing chromosome sequence for each of chromosomes 1-22, X and Y was identified by systematically calculating the chromosome doses for each chromosome using every possible combination of chromosomes as the denominator. For example, for chromosome 21 as the chromosome of interest, chromosome doses were calculated as the ratio of (i) the number of sequence tags obtained for chromosome 21 (the chromosome of interest) to (ii) the number of sequence tags obtained for each remaining chromosome, and the sum of the numbers of tags obtained for every possible combination of the remaining chromosomes (excluding chromosome 21), i.e.: 1, 2, 3, 4, 5, etc. up to 20, 22, X and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+X and 1+Y; 1+2+3, 1+2+4, 1+2+5, etc. up to 1+2+20, 1+2+22, 1+2+X and 1+2+Y; 1+3+4, 1+3+5, 1+3+6, etc. up to 1+3+20, 1+3+22, 1+3+X and 1+3+Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, etc. up to 1+2+3+20, 1+2+3+22, 1+2+3+X and 1+2+3+Y; and so on, such that all possible combinations of chromosomes 1-20, 22, X and Y were used as the normalizing chromosome sequence (denominator) to determine all possible chromosome doses for the chromosome of interest in each of the qualified (non-aneuploid) samples in the training set. Chromosome doses were determined in the same manner for chromosome 21 in all training set samples, and the systematically determined normalizing chromosome sequence for chromosome 21 was identified as the single chromosome or group of chromosomes that resulted in the chromosome 21 dose having the smallest variability across all training samples. The same analysis was repeated to determine the single chromosome or combination of chromosomes serving as the systematically determined normalizing chromosome sequence for each of the remaining chromosomes, including chromosomes 13, 18, X and Y; that is, all possible combinations of chromosomes were used to determine the normalizing sequence (single chromosome or group of chromosomes) for all other chromosomes of interest 1-12, 14-17, 19-20, 22, X and Y in all training samples. Thus, all chromosomes were treated as chromosomes of interest, and a systematically determined normalizing sequence was determined for each of all chromosomes in each of the unaffected samples in the training set. Table 21 provides the single chromosomes or groups of chromosomes identified as the systematically determined normalizing sequence for each of the chromosomes of interest 1-22, X and Y. As highlighted by Table 21, for some chromosomes of interest the systematically determined normalizing chromosome sequence was determined to be a single chromosome (e.g., when chromosome 4 is the chromosome of interest), and for other chromosomes of interest it was determined to be a group of chromosomes (e.g., when chromosome 21 is the chromosome of interest).
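The exhaustive search described above can be sketched with `itertools.combinations`. The sketch makes several assumptions that are not stated in the text: variability is measured as the coefficient of variation of the dose across qualified samples, ties are resolved in favor of the first (smallest) group found, and an optional `max_group_size` argument is added because enumerating every subset of 23 candidate chromosomes (about 8.4 million denominators) is slow in pure Python. The sample data and chromosome labels are invented.

```python
from itertools import combinations
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_normalizer(samples, target, max_group_size=None):
    """Find the chromosome group that minimizes dose variability for `target`.

    samples: list of dicts mapping chromosome name -> tag count, one dict
             per qualified (unaffected) training sample.
    Returns (group, cv) for the denominator group with the smallest CV.
    """
    candidates = sorted(c for c in samples[0] if c != target)
    limit = max_group_size or len(candidates)
    best_group, best_cv = None, float("inf")
    for size in range(1, limit + 1):
        for group in combinations(candidates, size):
            doses = [s[target] / sum(s[c] for c in group) for s in samples]
            cv = coefficient_of_variation(doses)
            if cv < best_cv:  # strict comparison keeps the first minimum
                best_group, best_cv = group, cv
    return best_group, best_cv


# Toy training data: tag counts on "T" co-vary exactly with those on "B".
training = [
    {"T": 10, "A": 100, "B": 20, "C": 55},
    {"T": 20, "A": 90, "B": 40, "C": 50},
    {"T": 15, "A": 120, "B": 30, "C": 61},
]
group, cv = best_normalizer(training, "T")
```

Here the search returns the single chromosome "B", because the ratio T/B is identical in every sample. With real data the minimum is rarely zero, and the selected group may contain several chromosomes, as Table 21 indicates for chromosome 21.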
Table 21. Systematically determined normalizing chromosome sequences for all chromosomes
The mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequence determined for each of all chromosomes are given in Table 22.
Table 22. Mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequences
Chromosome of interest Mean SD CV
1 0.36637 0.00266 0.72%
2 0.31580 0.00068 0.22%
3 0.21983 0.00055 0.18%
4 0.98191 0.02509 2.56%
5 0.30109 0.00076 0.25%
6 0.21621 0.00059 0.27%
7 0.21214 0.00044 0.21%
8 0.25562 0.00068 0.27%
9 0.12726 0.00034 0.27%
10 0.24471 0.00098 0.40%
11 0.26907 0.00098 0.36%
12 0.12358 0.00029 0.23%
13 a 0.26023 0.00122 0.47%
14 0.09286 0.00028 0.30%
15 0.21568 0.00147 0.68%
16 0.25181 0.00134 0.53%
17 0.46000 0.00248 0.54%
18 a 0.10100 0.00038 0.38%
19 1.43709 0.02899 2.02%
20 0.19967 0.00123 0.62%
21 a 0.07851 0.00053 0.67%
22 0.69613 0.01391 2.00%
X b 0.46865 0.00279 0.68%
Y b 0.00028 0.00004 14.97%
a Excluding trisomies
b Female fetuses
The variation in chromosome doses across all training samples, as reflected by the CV values, confirms that the systematically determined normalizing chromosome sequences provide a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown below.
To demonstrate the sensitivity and specificity of the method, chromosome doses for all chromosomes of interest 1-22, X and Y were determined in each sample in the training set and in each of all samples in the test set described in Example 11, using the corresponding systematically determined normalizing chromosome sequences provided in Table 21 above.
Using the systematically determined normalizing chromosome sequence for each chromosome of interest, the presence or absence of any fetal aneuploidy was determined in each training set sample and in each test sample; that is, each sample was evaluated for a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X and Y. Sequence information, i.e., the number of sequence tags, was obtained for all chromosomes in each training set sample and in each test sample, and a single chromosome dose was calculated for each chromosome in each training and test sample as described above, using the number of sequence tags obtained and the corresponding systematically determined normalizing chromosome sequences identified in the training set (Table 21). In each training sample, the number of sequence tags obtained for the systematically determined normalizing chromosome sequence in that sample was used to determine the chromosome dose for each chromosome in that sample; likewise, in each test sample, the number of sequence tags obtained for the systematically determined normalizing chromosome sequence in that sample was used to determine the chromosome dose for each chromosome in that sample. To ensure safe and effective classification of aneuploidies, the same conservative boundaries were chosen as described in Example 12.
training group result
Plots of the chromosome doses for chromosomes 21, 18 and 13 in the training set samples, obtained using the systematically determined normalizing chromosome sequences, are provided in Figure 45. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+14+16+20+22) was used, the 8 samples whose clinical karyotype indicated T21 had NCVs between 5.4 and 21.5. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 2+3+5+7) was used, the 4 samples whose clinical karyotype indicated T18 had NCVs between 3.3 and 15.3. The T21 samples of the training set are shown as the last 8 samples of the chromosome 21 data (O); the T18 samples of the training set are shown as the last 4 samples of the chromosome 18 data (Δ); and the T13 samples of the training set are shown as the last 2 samples of the chromosome 13 data ().
These data show that normalizing chromosome sequences can be used to determine different complete fetal chromosomal aneuploidies and to classify them correctly with high confidence. Because all samples with affected karyotypes had NCVs greater than 3, there is only about a 0.1% probability that these samples are part of the unaffected distribution.
As with the autosomes, all female and male fetuses in the training set were correctly identified when the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+8) was used for chromosome X and the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+6) was used for chromosome Y. In addition, all 5 monosomy X samples were identified. Figure 46A shows a plot of the NCV determined for the X chromosome (X axis) against the NCV determined for the Y chromosome (Y axis) for each sample in the training set. All samples that were monosomy X by karyotype had NCV values less than -4.83. Those monosomy X samples with karyotypes consistent with a 45,X karyotype (complete or mosaic) had, as expected, Y NCV values close to zero. Female samples clustered near NCV = 0 for both X and Y.
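The figure of about 0.1% follows from the upper tail of the standard normal distribution, given that NCVs of unaffected samples behave like z-scores. The short check below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name is illustrative, and treating the probability as one-sided is an assumption consistent with trisomies raising the NCV.

```python
from statistics import NormalDist


def upper_tail_probability(ncv):
    """Probability that an unaffected (standard normal) NCV exceeds `ncv`."""
    return 1.0 - NormalDist().cdf(ncv)


p_above_3 = upper_tail_probability(3.0)   # roughly 0.00135, i.e. about 0.1%
p_above_7 = upper_tail_probability(7.0)   # many orders of magnitude smaller
```

The same function shows why an NCV above 7 leaves essentially no room for membership in the unaffected distribution.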
test group results
Figure 47 plots the chromosome doses for chromosomes 21, 18, and 13 in the test samples, calculated using the corresponding systematically determined normalization chromosome sequences. When the systematically determined normalization chromosome sequence (i.e., the group of chromosomes 4+14+16+20+22) was used, 13 of the 13 samples whose clinical karyotype indicated T21 were correctly identified, with NCVs between 7.2 and 16.3. When the systematically determined normalization chromosome sequence (i.e., the group of chromosomes 2+3+5+7) was used, all 8 samples whose clinical karyotype indicated T18 were identified, with NCVs between 12.7 and 30.7. The T21 samples of the test group are shown (O) as the last 13 samples of the chromosome 21 data; the T18 samples of the test group are shown (Δ) as the last 8 samples of the chromosome 18 data; and the T13 sample of the test group is shown () as the last sample of the chromosome 13 data.
These data show that systematically determined normalization chromosome sequences can be used to determine different complete fetal chromosomal aneuploidies and to classify them correctly with high confidence. As in the training group, all samples with an affected karyotype had an NCV greater than 7, which indicates a very small probability that these samples belong to the unaffected distribution (Figure 47).
As with the autosomes, when the systematically determined normalization chromosome sequence (i.e., the group of chromosomes 4+8) was used for chromosome X, and the systematically determined normalization chromosome sequence (i.e., the group of chromosomes 4+6) was used for chromosome Y, all female and male fetuses in the test group were correctly identified. In addition, all 3 monosomy X samples were identified. Figure 46B plots, for each sample in the test group, the NCV determined for chromosome X (X-axis) against the NCV determined for chromosome Y (Y-axis).
As described above, this method allows the presence or absence of a complete or partial chromosomal aneuploidy of each of chromosomes 1-22, X, and Y to be determined in each sample. In addition to the complete chromosomal aneuploidies T13, T18, T21, and monosomy X, the method also detected the presence of trisomy 9 in one of the test samples. When the systematically determined normalization chromosome sequence (i.e., the group of chromosomes 3+4+8+10+17+19+20+22) was used for chromosome 9 as the chromosome of interest, one sample with an NCV of 14.4 was identified (Figure 48). This sample corresponds to the test sample in Example 12 that was suspected of being aneuploid for chromosome 9 because of its aberrantly low dose for chromosome 21 (chromosome 9 was used as the normalization chromosome sequence in Example 12).
These data show that 100% of the samples with clinical karyotypes indicating T21, T13, T18, T9, and monosomy X were correctly identified. Figure 49 plots the NCVs for each of chromosomes 1-22 in each of the 47 test samples. The median NCV was normalized to zero.
These data show that the method of the invention (including the use of systematically determined normalization chromosome sequences) determined the presence of all 5 types of chromosomal aneuploidy present in this test group with 100% sensitivity and 100% specificity, and clearly indicate that the method can identify any chromosomal aneuploidy of any of chromosomes 1-22, X, and Y in any sample.
example 14
determining the presence or absence of a partial fetal chromosomal aneuploidy: determination of cat eye syndrome
DiGeorge syndrome (22q11.2 deletion syndrome), a disorder caused by a defect in chromosome 22, results in poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include heart defects, poor immune system function, cleft palate, parathyroid gland problems, and behavioral disorders. The number and severity of the problems associated with DiGeorge syndrome vary greatly. Almost everyone with DiGeorge syndrome needs treatment from specialists in a variety of fields.
To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample is obtained from the mother by venipuncture, and cfDNA is prepared as described in the examples above. The purified cfDNA is ligated to adaptors and subjected to cluster amplification using the Illumina cBot cluster station. Massively parallel sequencing is performed using reversible dye terminators to generate millions of 36 bp reads. The sequence reads are aligned to the human hg19 reference genome, and the reads that map uniquely to the reference genome are counted as tags.
First, a set of qualified samples, all known to be diploid for chromosome 22 (i.e., chromosome 22 or any portion of it is known to be present only in the diploid state), is sequenced and analyzed to obtain the number of sequence tags for each of 1000 segments of 3 megabases (Mb), excluding the region 22q11.2. Given that the human genome comprises about 3 billion bases (3 Gb), the 1000 segments of 3 Mb each make up approximately the remainder of the genome. Each of the 1000 segments can serve, individually or as part of a group of segment sequences, to determine the normalization segment sequence for the segment of interest, i.e., the 3 Mb region of 22q11.2. The number of sequence tags mapped to each of the 1000 individual segments is used individually to calculate a segment dose for the 3 Mb region of 22q11.2. In addition, all possible combinations of two or more segments are used to determine segment doses for the segment of interest in all qualified samples. The single 3 Mb segment, or the combination of two or more 3 Mb segments, that yields the segment dose with the lowest variability across samples is selected as the normalization segment sequence.
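The selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the example itself: variability is measured here by the coefficient of variation, the search is capped at a small combination size (an exhaustive search over 1000 segments would be impractical), and the segment names and tag counts are made up.

```python
from itertools import combinations
from statistics import mean, stdev


def segment_dose(tags, target, normalizers):
    """Tags on the segment of interest divided by tags on the normalizing segment(s)."""
    return tags[target] / sum(tags[name] for name in normalizers)


def select_normalizing_segments(qualified, target, candidates, max_size=2):
    """Return the candidate combination whose dose varies least across qualified samples.

    Assumption: variability = coefficient of variation (SD / mean) of the dose.
    """
    best, best_cv = None, float("inf")
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            doses = [segment_dose(sample, target, combo) for sample in qualified]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:
                best, best_cv = combo, cv
    return best, best_cv


# Hypothetical tag counts for three qualified (diploid) samples
qualified = [
    {"22q11.2": 100, "seg_a": 1000, "seg_b": 900},
    {"22q11.2": 120, "seg_a": 1200, "seg_b": 950},
    {"22q11.2": 90, "seg_a": 900, "seg_b": 1100},
]
best, best_cv = select_normalizing_segments(qualified, "22q11.2", ["seg_a", "seg_b"])
print(best, best_cv)
```

In this toy data the counts on `seg_a` track the counts on the segment of interest exactly, so `seg_a` alone gives the least variable dose.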
The number of sequence tags mapped to the segment of interest in each qualified sample is used to determine the segment dose in each qualified sample. The mean and standard deviation of the segment doses across all qualified samples are calculated and used to set thresholds against which the segment dose determined in a test sample can be compared. Preferably, a normalized segment value (NSV) is calculated for every segment of interest in all qualified samples, and the NSVs are used to set the thresholds.
Subsequently, the number of tags mapped to the normalization segment sequence in the corresponding test sample is used to determine the dose of the segment of interest in the test sample. As described previously, a normalized segment value (NSV) is calculated for the segment in the test sample, and the NSV of the segment of interest in the test sample is compared with the thresholds determined using the qualified samples to determine the presence or absence of a 22q11.2 deletion in the test sample.
A test NSV < -3 indicates a loss in the segment of interest, i.e., a partial deletion of chromosome 22 (22q11.2) is present in the test sample.
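The comparison against the threshold can be illustrated with a small sketch. It assumes that the normalized value is the usual z-score of the test dose against the mean and standard deviation of the qualified doses; the function names and the dose values are hypothetical.

```python
from statistics import mean, stdev


def normalized_segment_value(test_dose, qualified_doses):
    """z-score of a test segment dose relative to the qualified samples (assumed definition)."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    return (test_dose - mu) / sigma


def indicates_partial_deletion(nsv, threshold=-3.0):
    """A normalized value below the threshold is read as a loss in the segment of interest."""
    return nsv < threshold


# Hypothetical segment doses for the 22q11.2 region in five qualified samples
qualified_doses = [0.098, 0.100, 0.102, 0.101, 0.099]

for test_dose in (0.090, 0.0995):
    nsv = normalized_segment_value(test_dose, qualified_doses)
    print(test_dose, round(nsv, 2), indicates_partial_deletion(nsv))
```

With these made-up numbers, the first test dose falls well below the threshold and is flagged, while the second lies within the spread of the qualified samples.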
example 15
stool DNA testing for predicting the outcome of stage II colorectal cancer patients
About 30% of all stage II colorectal cancer patients will relapse and die of their disease. Stage II colorectal cancer patients who experienced disease recurrence showed significantly more losses on chromosomes 4, 5, 15q, 17q, and 18q. In particular, losses on 4q22.1-4q35.2 in stage II colorectal cancer patients have been shown to be associated with worse outcome. Determining the presence or absence of these genomic alterations can aid in selecting patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33:95-104 [2010]).
To determine the presence or absence of one or more chromosomal deletions in the 4q22.1 to 4q35.2 region in patients with stage II colorectal cancer, stool and/or plasma samples are obtained from the patient or patients. Stool DNA is prepared according to the method described by Chen et al., J Natl Cancer Inst 97:1124-1132 [2005], and plasma DNA is prepared according to the method described in the examples above. The DNA is sequenced according to the NGS methods described herein, and the sequence information for the patient sample or samples is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. The segment doses are determined using normalization segment doses previously determined in a set of qualified stool and/or plasma samples, respectively. The segment doses in the test samples (patient samples) are calculated, and the presence or absence of one or more partial chromosomal deletions is determined by comparing the NSV of each segment of interest in the 4q22.1 to 4q35.2 region with the thresholds set from the NSVs in the set of qualified samples.
example 16
genome-wide fetal aneuploidy detection by sequencing of maternal plasma DNA: diagnostic accuracy in a prospective, blinded, multicenter study
The method for determining the presence or absence of aneuploidy in a maternal test sample was used in a prospective study, and its diagnostic accuracy is shown below. The prospective study further demonstrates the efficacy of the method of the invention for detecting fetal aneuploidy across the entire genome. The blinded study simulated a real-world population of pregnant women in which the fetal karyotype is unknown, and all samples with any abnormal karyotype were selected for sequencing. The classifications made according to the method of the invention were compared with the fetal karyotypes obtained from invasive procedures to determine the diagnostic performance of the method for multiple chromosomal aneuploidies.
summary of this example
In a prospective, blinded study, blood samples were collected from 2,882 women undergoing prenatal diagnostic procedures at 60 U.S. sites (clinicaltrials.gov NCT01122524).
An independent biostatistician selected all singleton pregnancies with any abnormal karyotype and a balanced number of randomly selected pregnancies with euploid karyotypes. Chromosome classifications were made for each sample according to the method of the invention and compared with the fetal karyotype.
In the analysis cohort of 532 samples, 89/89 trisomy 21 cases (sensitivity 100% (95% CI 95.9-100)), 35/36 trisomy 18 cases (sensitivity 97.2% (95% CI 85.5-99.9)), 11/14 trisomy 13 cases (sensitivity 78.6% (95% CI 49.2-99.9)), 232/233 females (sensitivity 99.6% (95% CI 97.6->99.9)), 184/184 males (sensitivity 100% (95% CI 98.0-100)), and 15/16 monosomy X cases (sensitivity 93.8% (95% CI 69.8-99.8)) were correctly classified. There were no false positives for autosomal aneuploidies among the unaffected subjects (100% specificity (95% CI >98.5-100)). In addition, fetuses with mosaicism for trisomy 21 (3/3), trisomy 18 (1/1), and monosomy X (2/7), three cases of translocation trisomy, two cases of other autosomal trisomies (20 and 16), and other sex chromosome aneuploidies (XXX, XXY, and XYY) were correctly classified.
These results further demonstrate the efficacy of this method for detecting fetal aneuploidy across the entire genome using maternal plasma DNA. The high sensitivity and specificity for detecting trisomy 21, 18, and 13 and monosomy X suggest that this method can be incorporated into existing aneuploidy screening algorithms to reduce unnecessary invasive procedures.
materials and methods
The MELISSA (MatErnal BLood IS Source to Accurately diagnose fetal aneuploidy) study was conducted as a prospective, multicenter observational study with a blinded nested case:control analysis. Pregnant women aged 18 years and older who were undergoing an invasive prenatal procedure to determine the fetal karyotype were enrolled (Clinicaltrials.gov NCT01122524). Eligibility criteria included pregnant women at a gestational age between 8 weeks 0 days and 22 weeks 0 days who met at least one of the following additional criteria: age ≥38 years; a positive screening test result (serum analytes and/or nuchal translucency (NT) measurement); the presence of ultrasound markers associated with an increased risk of fetal aneuploidy; or a prior aneuploid fetus. Written informed consent was obtained from all women who agreed to participate.
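The eligibility criteria can be expressed as a simple check. The sketch below is illustrative only: the record type, field names, and the treatment of the gestational age limits as inclusive are assumptions, not part of the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical enrollment record; field names are illustrative."""
    age_years: int
    gestational_age_days: int
    positive_screen: bool = False
    ultrasound_marker: bool = False
    prior_aneuploid_fetus: bool = False


MIN_GA_DAYS = 8 * 7   # 8 weeks 0 days
MAX_GA_DAYS = 22 * 7  # 22 weeks 0 days


def is_eligible(c: Candidate) -> bool:
    """Age 18 or older, gestational age in the window, and at least one additional criterion."""
    in_window = MIN_GA_DAYS <= c.gestational_age_days <= MAX_GA_DAYS  # inclusive bounds assumed
    has_additional_criterion = (
        c.age_years >= 38
        or c.positive_screen
        or c.ultrasound_marker
        or c.prior_aneuploid_fetus
    )
    return c.age_years >= 18 and in_window and has_additional_criterion


print(is_eligible(Candidate(age_years=39, gestational_age_days=84)))
print(is_eligible(Candidate(age_years=30, gestational_age_days=84)))
```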
Enrollment took place at 60 geographically dispersed medical centers in 25 states, under a protocol approved by the Institutional Review Board (IRB) of each institution. Two clinical research organizations (CROs) (Quintiles, Durham, North Carolina; and Emphusion, San Francisco, California) were retained to keep the study blinded and to provide clinical data management, data monitoring, biostatistics, and data analysis services.
Before any invasive procedure, a peripheral venous blood sample (17 mL) was collected in two acid citrate dextrose (ACD) tubes (Becton Dickinson), which were de-identified and labeled with a unique study number. Site research personnel entered the study number, date, and time of the blood draw into a secure electronic case report form (eCRF). Whole blood samples were shipped overnight in temperature-controlled containers from the sites to the laboratory (Verinata Health, Inc., California). Upon receipt and sample inspection, cell-free plasma was prepared according to previously described methods (see Example 13) and stored frozen at -80°C in 2 to 4 aliquots until the time of sequencing. The date and time of sample receipt at the laboratory were recorded. A sample was deemed eligible for analysis if it was received overnight, was cool to the touch, and contained at least 7 mL of blood. Samples eligible at receipt were reported weekly to the CRO and used for selection on the random sampling list (see below and Figure 50). Clinical data from the woman's current pregnancy and the fetal karyotype were entered into the eCRF by site research personnel and verified by the CRO.
The sample size determination was based on the precision of the estimates of the targeted range of performance characteristics (sensitivity and specificity) of the index test. Specifically, the numbers of affected cases (T21, T18, T13, male, female, or monosomy X) and unaffected controls (non-T21, non-T18, non-T13, non-male, non-female, or non-monosomy X) were determined so that sensitivity and specificity, respectively, could be estimated within a prespecified small margin of error based on the normal approximation ($N = \left(1.96\sqrt{p(1-p)}/\text{margin of error}\right)^2$, where p = the estimate of sensitivity or specificity). Assuming a true sensitivity of 95% or greater, a sample size between 73 and 114 cases ensures that the precision of the sensitivity estimate is such that the lower bound of the 95% confidence interval (CI) is 90% or greater (margin of error ≤5%). For smaller sample sizes, a larger margin of error for the 95% CI of the sensitivity estimate was projected (from 6% to 13.5%). To estimate specificity with greater precision, a larger number of unaffected controls (a ratio of about 4:1 to cases) was planned at the sampling stage. This ensured that the precision of the specificity estimate would be at least 3%. Accordingly, as sensitivity and/or specificity increase, the precision of the confidence interval also increases.
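As a worked illustration of the normal-approximation formula, the following sketch evaluates the sample size for a given estimate p and margin of error. The function name is illustrative; the multiplier 1.96 corresponds to a two-sided 95% interval.

```python
import math


def sample_size(p: float, margin: float, z: float = 1.96) -> float:
    """Normal-approximation sample size: (z * sqrt(p * (1 - p)) / margin) ** 2."""
    return (z * math.sqrt(p * (1.0 - p)) / margin) ** 2


# Sensitivity of 95% with margins of error of 5% and 4%
for margin in (0.05, 0.04):
    n = sample_size(0.95, margin)
    print(f"margin {margin:.0%}: N = {n:.2f}")
```

With p = 0.95, margins of error of 5% and 4% give values that round to 73 and 114, the two ends of the sample size range quoted above.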
Based on the sample size determination, the CROs devised a random sampling scheme to generate lists of selected samples for sequencing (a minimum of 110 cases affected by T21, T18, or T13 and 400 unaffected with respect to trisomy, allowing up to half of these to have a karyotype other than 46,XX or 46,XY). Subjects with a singleton pregnancy and an eligible blood sample were eligible for selection. Subjects with ineligible samples, no karyotype record, or a multiple gestation were excluded (Figure 50). Lists were generated regularly throughout the study and sent to the Verinata Health laboratory.
Each eligible blood sample was analyzed for six independent categories: the aneuploidy status of chromosomes 21, 18, and 13, and the sex status male, female, and monosomy X. While still blinded, one of three classifications (affected, unaffected, or unclassified) was prospectively generated for each of the six independent categories for each plasma DNA sample. Under this scheme, the same sample could be classified as affected in one analysis (e.g., aneuploid for chromosome 21) and unaffected in another analysis (e.g., euploid for chromosome 18).
Conventional metaphase cytogenetic analysis of cells obtained by chorionic villus sampling (CVS) or amniocentesis served as the reference standard in this study. Fetal karyotyping was performed in the diagnostic laboratories routinely used by the participating sites. If a patient underwent both CVS and amniocentesis after enrollment, the karyotype from the amniocentesis was used for the study analysis. If a metaphase karyotype was not available, fluorescence in situ hybridization (FISH) results targeting chromosomes 21, 18, 13, X, and Y were allowed (Table 24). All abnormal karyotype reports (i.e., other than 46,XX and 46,XY) were reviewed by a board-certified cytogeneticist and classified as affected or unaffected with respect to chromosomes 21, 18, and 13 and the sex status XX, XY, and monosomy X.
Prespecified protocol conventions defined the following abnormal karyotypes to be assigned a 'censored' status for karyotype by the cytogeneticist: triploidy, tetraploidy, complex karyotypes other than trisomy (e.g., mosaicism) that involved chromosomes 21, 18, or 13, mosaics with mixed sex chromosomes, sex chromosome aneuploidy, or karyotypes that could not be fully interpreted from the source document (e.g., marker chromosomes of unknown origin). Because the cytogenetic diagnosis was not known to the sequencing laboratory, all cytogenetically censored samples were independently analyzed and assigned a classification determined from the sequencing information according to the method of the invention (sequencing classification), but were not included in the statistical analysis. The censored status applied only to the relevant one or more of the six analyses (e.g., a mosaic T18 would be censored from the chromosome 18 analysis but considered 'unaffected' in the other analyses, such as chromosomes 21, 13, X, and Y) (Table 25). Other abnormal and rare complex karyotypes that could not be fully anticipated when the protocol was designed were not censored from the analysis (Table 26).
Data contained in the eCRF and the clinical database were restricted to authorized users (study sites, CROs, and contracted clinical personnel). No Verinata Health employee had access until unblinding.
After a random sample list was received from the CRO, total cell-free DNA (a mixture of maternal and fetal DNA) was extracted from the thawed selected plasma samples as described in Example 13. Sequencing libraries were prepared using the Illumina TruSeq kit v2.5. Sequencing was performed on an Illumina HiSeq 2000 instrument (6-plex, i.e., 6 samples/lane) in the Verinata Health laboratory. Single-end reads of 36 base pairs were obtained. Reads were mapped across the genome, and the sequence tags on each chromosome of interest were counted and used to classify the samples for the independent categories as described above.
The clinical protocol required evidence of the presence of fetal DNA in order to report a classification result. A classification of male or aneuploid was considered sufficient evidence of fetal DNA. In addition, each sample was tested for the presence of fetal DNA using two allele-specific methods. In the first method, the AmpflSTR Minifiler kit (Life Technologies, San Diego, California) was used to examine the presence of a fetal component in the cell-free DNA. Electrophoresis of the short tandem repeat (STR) amplicons was carried out on an ABI 3130 Genetic Analyzer according to the manufacturer's protocol. All nine STR loci in the kit were analyzed by comparing the intensity of each peak, reported as a percentage of the summed intensities of all peaks, and the presence of minor peaks was used to provide evidence of fetal DNA. When no minor STR could be identified, an aliquot of the sample was examined with a panel of 15 single nucleotide polymorphisms (SNPs) with an average heterozygosity ≥0.4, selected from the panel of Kidd et al. (Kidd et al., Forensic Sci Int 164(1):20-32 [2006]). Allele-specific methods that can be used to detect and/or quantify fetal DNA in maternal samples are described in U.S. Patent Publications 20120010085, 20110224087, and 20110201507, which are incorporated herein by reference.
Normalized chromosome values (NCVs) were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes, as described in Example 13; however, because the sequencing in this study was performed on a different instrument from our previous work and with multiple samples per lane, new normalization chromosome denominators had to be determined. The normalization chromosome denominators in the current study were determined by sequencing a training group of 110 independent (i.e., not from MELISSA eligible samples) unaffected samples (i.e., qualified samples) before the study samples were analyzed. The new normalization chromosome denominators were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes so as to minimize the variation in the unaffected training group for every chromosome across the genome (Table 23).
The NCV rules applied to provide the autosome classification of each test sample were those described in Example 12: for the classification of autosomal aneuploidy, an NCV > 4.0 was required to classify a chromosome as affected (i.e., aneuploid for that chromosome), and an NCV < 2.5 was required to classify a chromosome as unaffected. Samples with an autosome NCV between 2.5 and 4.0 were termed "unclassified".
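The autosome rule maps directly onto a three-way decision. The sketch below is illustrative; the function name is hypothetical, and values falling exactly on 2.5 or 4.0 are treated as unclassified because the text does not say how boundary values are handled.

```python
def classify_autosome(ncv: float) -> str:
    """Three-way autosome call from an NCV, using the 4.0 and 2.5 thresholds."""
    if ncv > 4.0:
        return "affected"
    if ncv < 2.5:
        return "unaffected"
    # Boundary values (exactly 2.5 or 4.0) fall here by assumption
    return "unclassified"


for ncv in (7.2, 0.3, 3.6):
    print(ncv, classify_autosome(ncv))
```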
Sex chromosome classification in this test was performed by applying the NCVs for X and Y in sequence, as follows:
1. If NCV X < -4.0 and NCV Y < 2.5, the sample is classified as monosomy X.
2. If NCV X > -2.5 and NCV X < 2.5 and NCV Y < 2.5, the sample is classified as female (XX).
3. If NCV X > 4.0 and NCV Y < 2.5, the sample is classified as XXX.
4. If NCV X > -2.5 and NCV X < 2.5 and NCV Y > 33, the sample is classified as XXY.
5. If NCV X < -4.0 and NCV Y > 4.0, the sample is classified as male (XY).
6. If condition 5 is met but NCV Y is about 2 times the value expected for the measured NCV X, the sample is classified as XYY.
7. If the NCVs for chromosomes X and Y meet none of the above criteria, the sample is classified as unclassified for sex.
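The sequential sex chromosome rules above can be sketched as an ordered chain of conditions. This is an illustrative sketch only: the function name is hypothetical, and rule 6 (XYY) is deliberately left out because the text gives no numeric definition of the expected Y value, so samples that would meet rule 6 are returned as male (XY) here.

```python
def classify_sex(ncv_x: float, ncv_y: float) -> str:
    """Apply rules 1-5 and 7 in order; rule 6 (XYY) is omitted (no numeric criterion given)."""
    if ncv_x < -4.0 and ncv_y < 2.5:
        return "monosomy X"
    if -2.5 < ncv_x < 2.5 and ncv_y < 2.5:
        return "female (XX)"
    if ncv_x > 4.0 and ncv_y < 2.5:
        return "XXX"
    if -2.5 < ncv_x < 2.5 and ncv_y > 33.0:
        return "XXY"
    if ncv_x < -4.0 and ncv_y > 4.0:
        return "male (XY)"
    return "unclassified"


for ncv_x, ncv_y in [(-5.0, 0.0), (0.0, 0.0), (5.0, 0.0), (0.0, 40.0), (-6.0, 50.0), (-3.0, 0.0)]:
    print(ncv_x, ncv_y, classify_sex(ncv_x, ncv_y))
```

Because the thresholds leave gaps (for example, NCV X between -4.0 and -2.5), some samples fall through every rule and are reported as unclassified for sex.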
Because the laboratory was blinded to clinical information, the sequencing results were not adjusted for any of the following demographic variables: maternal body mass index, smoking status, presence of diabetes, type of conception (spontaneous or assisted), prior pregnancies, prior aneuploidy, or gestational age. Neither maternal nor paternal samples were used for classification, and classification according to this method does not depend on measurements of specific loci or alleles.
Sequencing results were returned to an independent contracted biostatistician before unblinding and analysis. Personnel at the study sites, the CROs (including the biostatistician who generated the random sampling lists), and the contracted cytogeneticist were blinded to the sequencing results.
table 23. systematically determined normalization chromosome sequences for all chromosomes
The statistical methods were documented in a detailed statistical analysis plan for the study. For each of the six analysis categories, point estimates of sensitivity and specificity and exact 95% confidence intervals were calculated using the Clopper-Pearson method. For all statistical estimates, samples with no fetal DNA detected, with 'censored' complex karyotypes (as defined by protocol convention), or with an 'unclassified' result from the sequencing test were removed.
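For readers who want to reproduce an exact interval, the following is a minimal sketch of the Clopper-Pearson calculation using only the Python standard library. It is not the software used in the study; it finds the interval bounds by bisection on the binomial distribution, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on [0, 1] for f(p) == target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper


for detected, total in [(89, 89), (35, 36)]:
    low, high = clopper_pearson(detected, total)
    print(f"{detected}/{total}: sensitivity {detected / total:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

For 89/89 and 35/36 this reproduces, to one decimal place, the intervals of 95.9-100 and 85.5-99.9 reported for trisomy 21 and trisomy 18.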
results
Between June 2010 and August 2011, 2,882 pregnant women were enrolled in the study. The characteristics of the eligible subjects and the selected cohort are given in Table 24. Subjects who enrolled and provided blood, but who were later found during data monitoring to have exceeded the inclusion criterion, with an actual gestational age beyond 22 weeks 0 days at enrollment, were allowed to remain in the study (n=22). Three of these samples were in the selected group. Figure 50 shows the flow of samples between enrollment and analysis. There were 2,625 samples eligible for selection.
table 24. patient demographics
*GA at the time of the invasive procedure.
*The penetrance of ultrasound abnormalities was higher in fetuses with abnormal karyotypes.
Abbreviations: BMI, body mass index; IUGR, intrauterine growth restriction
According to the random sampling scheme, all eligible subjects with an abnormal karyotype and a set of subjects carrying euploid fetuses were selected for analysis (Figure 50B), so that the total sequenced study population had a ratio of approximately 4:1 unaffected to affected subjects for trisomy 21. Through this process, 534 subjects were selected. Two samples were subsequently removed from the analysis because of sample tracking issues, in which the full chain of custody between the sample tube and data acquisition did not pass a quality audit (Figure 50). This left 532 subjects for analysis, contributed by 53 of the 60 study sites. The demographics of the selected cohort were similar to those of the overall cohort.
test performance
Figures 51A-51C show the flow diagrams of the aneuploidy analyses for chromosomes 21, 18, and 13, and Figures 51D-51F show the sex analysis flows. Table 27 shows the sensitivity, specificity, and confidence intervals for each of the six analyses, and Figures 52, 53, and 54 show graphical distributions of the samples by NCV after sequencing. In all 6 analysis categories, 16 samples (3.0%) were removed because no fetal DNA was detected. After unblinding, these samples showed no distinguishing clinical features. The number of censored karyotypes in each category depended on the condition being analyzed (fully detailed in Figure 52).
The sensitivity and specificity of the method for detecting T21 in the analysis population (n=493) were 100% (95% CI = 95.9, 100.0) and 100% (95% CI = 99.1, 100.0), respectively (Table 27 and Figure 51A). This included the correct classification of one complex T21 karyotype, 47,XX,inv(7)(p22q32),+21, and two translocation T21 cases arising from Robertsonian translocations, one of which was also mosaic for monosomy X (45,X,+21,der(14;21)(q10;q10)[4]/46,XY,+21,der(14;21)(q10;q10)[17] and 46,XY,+21,der(21;21)(q10;q10)).
The sensitivity and specificity for detecting T18 in the analysis population (n=496) were 97.2% (95% CI = 85.5, 99.9) and 100% (95% CI = 99.2, 100.0) (Table 27 and Figure 51B). Although censored from the primary analysis (per protocol), four samples with mosaic karyotypes for T21 and T18 were all correctly classified by the method of the invention as 'affected' with respect to aneuploidy (Table 25). Because they were correctly detected, they are noted on the left side of Figures 51A and 51B. All remaining censored samples were correctly classified as unaffected with respect to trisomy 21, 18, and 13 (Table 25). The sensitivity and specificity for detecting T13 in the analysis population were 78.6% (95% CI = 49.2, 99.9) and 100% (95% CI = 99.2, 100.0) (Figure 51C). One T13 case arising from a Robertsonian translocation was detected (46,XY,+13,der(13;13)(q10;q10)). There were seven unclassified samples in the chromosome 21 analysis (1.4%), five in the chromosome 18 analysis (1.0%), and two in the chromosome 13 analysis (0.4%) (Figures 51A-51C). Three samples overlapped across all categories; these samples had both a censored karyotype (69,XXX) and no fetal DNA detected. One sample that was unclassified in the chromosome 21 analysis was correctly identified as T13 in the chromosome 13 analysis, and one sample that was unclassified in the chromosome 18 analysis was correctly identified as T21 in the chromosome 21 analysis.
table 25. censored karyotypes
*Subject excluded from all analysis categories because of a marker chromosome in one clone.
*Subject with karyotype 48,XXY,+18 was unclassified in the chromosome 18 analysis, and the sex chromosome aneuploidy was not detected.
table 26. abnormal and complex karyotypes that were not censored
*After unblinding, an increased normalized chromosome value (NCV) of 3.6 was noted from the sequencing tags on chromosome 6.
The sex chromosome analysis population (female, male, or monosomy X) used to determine the performance of the method comprised 433 samples. Our refined algorithm for classifying sex status, which allows sex chromosome aneuploidies to be determined accurately, resulted in a higher number of unclassified results. The sensitivity and specificity for detecting diploid female status (XX) were 99.6% (95% CI = 97.6, >99.9) and 99.5% (95% CI = 97.2, >99.9), respectively; the sensitivity and specificity for detecting male status (XY) were both 100% (95% CI = 98.0, 100.0); and the sensitivity and specificity for detecting monosomy X (45,X) were 93.8% (95% CI = 69.8, 99.8) and 99.8% (95% CI = 98.7, >99.9), respectively (Figure 33D-F). Although censored from the analysis (per protocol), the mosaic monosomy X karyotypes were classified by sequencing as follows (Table 25): 2/7 were classified as monosomy X, 3/7 with a Y chromosome component were classified as XY, and 2/7 with an XX chromosome complement were classified as female. Two samples classified as monosomy X by the method of the invention had the karyotypes 47,XXX and 46,XX. Eight of ten sex chromosome aneuploidies with karyotypes 47,XXX, 47,XXY, and 47,XYY were correctly classified (Table 25). If sex chromosome classification had been limited to monosomy X, XY, and XX, most of the unclassified samples would have been correctly classified as male, but the XXY and XYY aneuploidies would not have been identified.
In addition to accurately classifying trisomy 21, 18, and 13 and sex, the sequencing results also correctly classified aneuploidy of chromosomes 16 and 20 in two samples (47,XX,+16 and 47,XX,+20) (Table 26). Interestingly, one sample with a clinically complex alteration of the long arm of chromosome 6 (6q) and two duplications (one of which was 37.5 megabases in size) showed an increased NCV from the sequencing tags on chromosome 6 (NCV = 3.6). In another sample, the method according to the invention detected aneuploidy of chromosome 2 that was not observed in the fetal karyotype at amniocentesis (46,XX). The other complex karyotype variants shown in Tables 25 and 26 include samples from fetuses with chromosomal inversions, deletions, translocations, triploidy, and other abnormalities that were not detected here but that the method of the invention might classify at higher sequencing density and/or with further algorithm optimization. In these cases, the method of the invention correctly classified the samples as unaffected with respect to trisomy 21, 18, or 13 and correctly classified sex.
In this study, 38/532 analyzed samples were from women who had undergone assisted reproduction. Of these, 17/38 samples had chromosomal abnormalities; no false positives or false negatives were detected in this subgroup.
Table 27. Sensitivity and specificity of the method
Discussion
This prospective study to determine whole-chromosome fetal aneuploidy from maternal plasma was designed to emulate the real-world scenario of sample collection, processing and analysis. Whole blood samples were obtained at the enrollment sites, did not require immediate processing, and were shipped overnight to the sequencing laboratory. In contrast to a previous prospective study that involved only chromosome 21 (Palomaki et al., Genetics in Medicine 2011:1), all eligible samples with any abnormal karyotype were sequenced and analyzed in this study. The sequencing laboratory had no prior knowledge of which fetal chromosomes might be affected, nor of the ratio of aneuploid to euploid samples. The study design recruited a high-risk study population of pregnant women to ensure a statistically significant prevalence of aneuploidy, and Tables 25 and 26 indicate the complexity of the karyotypes analyzed. The results demonstrate that: i) fetal aneuploidies (including those resulting from translocation trisomy, mosaicism and complex variations) can be detected with high sensitivity and specificity; and ii) aneuploidy in one chromosome does not affect the ability of the method to correctly identify the euploid status of other chromosomes. The algorithms used in previous studies appear unable to effectively determine other aneuploidies that would inevitably be present in a general clinical population (Erich et al., Am J Obstet Gynecol 2011 Mar; 204(3):205 e1-11; Chiu et al., BMJ 2011; 342:c7401).
With regard to mosaicism, analysis of the sequencing information in this study correctly classified samples with mosaic karyotypes for chromosomes 21 and 18 in 4/4 affected samples. These results demonstrate the sensitivity of the analysis for detecting specific characteristics of cell-free DNA in a complex mixture. In one case, the sequencing data for chromosome 2 indicated a complete or partial chromosomal aneuploidy, whereas the amniocentesis karyotype result was diploid for chromosome 2. In two other examples, one sample with a 47,XXX karyotype and another with a 46,XX karyotype were classified by the method as monosomy X. It is possible that these are mosaic cases, or that the pregnant woman herself is mosaic. (It is important to remember that sequencing is performed on total DNA, which is a combination of maternal and fetal DNA.) Although cytogenetic analysis of amniocytes or villi obtained by invasive procedures is currently the reference standard for aneuploidy classification, a karyotype performed on a limited number of cells cannot exclude low-level mosaicism. The current clinical study design did not include long-term infant follow-up or access to placental tissue at delivery, so we cannot determine whether these were true or false positive results. We speculate that the specificity of the sequencing process, combined with algorithms optimized for detecting aneuploidy across the whole genome according to the method, may ultimately provide more sensitive identification of fetal DNA abnormalities than standard karyotyping, particularly in cases of mosaicism.
The International Society for Prenatal Diagnosis has issued a rapid response statement commenting on the commercial availability of massively parallel sequencing (MPS) for prenatal detection of Down syndrome (Benn et al., Prenat Diagn 2012 doi:10.1002/pd.2919). They stated that, before routine MPS-based population screening for fetal Down syndrome is introduced, evidence is needed that the test performs in certain subgroups, such as women who conceive by in vitro fertilization. The results reported here show that the method is accurate in this group of pregnant women, many of whom are at higher risk of aneuploidy.
Although these results demonstrate excellent performance of the method, using optimized algorithms, for aneuploidy detection across the whole genome in singleton pregnancies from women at higher risk of aneuploidy, more experience is needed, particularly in low-risk populations, to build confidence in the diagnostic performance of the method when prevalence is low and in multiple gestations. In the early stages of clinical implementation, sequencing information should be used according to the method to classify chromosomes 21, 18 and 13 after a positive first- or second-trimester screening result. This will reduce the unnecessary invasive procedures caused by false-positive screening results, with a concomitant reduction in procedure-related adverse events. Invasive procedures could be limited to confirming positive results obtained by sequencing. However, there are clinical scenarios (for example, advanced maternal age and infertility) in which pregnant women wish to avoid invasive procedures; they may request this test as an alternative to primary screening and/or an invasive procedure. All patients should receive thorough pre-test counseling to ensure that they understand the limitations of the test and the implications of the results. As experience accumulates with more samples, this test may replace current screening protocols and become a primary screen, and ultimately a non-invasive diagnostic test for fetal aneuploidy.
Example 17
Determination of fetal fraction by NCV to discriminate the presence of complete or partial fetal chromosomal aneuploidy in a sample
Assuming that the chromosome dose associated with a fetal chromosome in a maternal sample increases in proportion to increasing fetal fraction, one would expect that, for a complete chromosome of interest, an ff value based on the NCV value would determine the presence or absence of a complete fetal chromosomal aneuploidy. To demonstrate that ff determined from NCV can be used to distinguish the presence of a complete chromosomal aneuploidy from a partial chromosomal aneuploidy or the contribution of a mosaic sample, genomic DNA from mothers and their children was used to create artificial samples simulating the mixture of fetal and maternal cfDNA found in the circulation of pregnant women. The NCV-based value of fetal fraction is a form of the putative fetal fraction described above.
DNA from mothers and children was purchased from the Coriell Institute for Medical Research (Camden, NJ). DNA identifiers and sample karyotypes are provided in Table 27.
Table 27. Example 17
Samples comprising complete or partial chromosomal aneuploidies were analyzed as follows.
In all cases, genomic DNA from the mother and genomic DNA from the child were sheared by sonication to a peak of 200 bp. Artificial samples comprising maternal DNA spiked with 0%, 5% or 10% w/w child DNA were processed to prepare sequencing libraries, which were sequenced in a massively parallel fashion using sequencing by synthesis as described in Example 12. Each artificial DNA sample was sequenced four times on the sequencer using independent flow cells, providing 4 sets of sequence information for each sample containing 0%, 5% and 10% child DNA. The 36 bp reads were aligned to the human reference genome hg19, and uniquely mapped tags were counted. Approximately 125 × 10^6 sequence tags were obtained for each of the 4 flow cell lanes used for each sample.
Normalizing chromosomes (single chromosomes or groups of chromosomes) were identified in a set of qualified samples comprising 20 male and 20 female gDNA libraries, as described elsewhere herein. The normalizing chromosome for chromosome 21 was identified as chromosome 4 + chromosome 16 + chromosome 22; for chromosome 7 as chromosome 4 + chromosome 6 + chromosome 8 + chromosome 12 + chromosome 19 + chromosome 20; for chromosome 15 as chromosome 9 + chromosome 12 + chromosome 14 + chromosome 19 + chromosome 20; for chromosome 22 as chromosome 19; and for chromosome X as chromosome 4 + chromosome 6 + chromosome 7 + chromosome 8. Sequence tags obtained by sequencing the artificial samples were counted for each chromosome of interest and its corresponding normalizing chromosome (single chromosome or group of chromosomes), and the counts were used to calculate chromosome doses and NCVs.
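The dose calculation described above can be sketched as follows. This is a minimal illustration, assuming that a chromosome dose is the ratio of tags mapped to the chromosome of interest to the summed tags mapped to its normalizing chromosome(s); the function name, the `chrN` keys and the tag counts are hypothetical and are not taken from the study data.

```python
from typing import Dict, Iterable

# Normalizing chromosome groups as listed in this example.
NORMALIZERS = {
    "chr21": ("chr4", "chr16", "chr22"),
    "chr7": ("chr4", "chr6", "chr8", "chr12", "chr19", "chr20"),
    "chr15": ("chr9", "chr12", "chr14", "chr19", "chr20"),
    "chr22": ("chr19",),
    "chrX": ("chr4", "chr6", "chr7", "chr8"),
}


def chromosome_dose(tag_counts: Dict[str, int], chrom: str,
                    normalizers: Iterable[str]) -> float:
    """Tags on the chromosome of interest divided by the summed tags
    on its normalizing chromosome(s)."""
    denominator = sum(tag_counts[c] for c in normalizers)
    if denominator == 0:
        raise ValueError("normalizing chromosomes have no mapped tags")
    return tag_counts[chrom] / denominator


# Hypothetical tag counts, for illustration only.
counts = {"chr4": 6000, "chr16": 2500, "chr21": 1100, "chr22": 1500}
dose_21 = chromosome_dose(counts, "chr21", NORMALIZERS["chr21"])
print(f"chromosome 21 dose: {dose_21:.4f}")
```

Keeping the normalizer groups in a single table makes it straightforward to compute doses for several chromosomes of interest from the same set of tag counts.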
In this example, ff in sample mixture (1) was determined using the NCV for chromosome 21 as $ff_{21} = 2 \times |\mathrm{NCV}_{21A} \times \mathrm{CV}_{21U}|$, where $\mathrm{NCV}_{21A}$ is the NCV value determined for chromosome 21 in test sample (1), which comprises a trisomy 21, and $\mathrm{CV}_{21U}$ is the coefficient of variation of the chromosome 21 doses determined in the qualified samples (comprising diploid chromosome 21); and using the NCV for chromosome X as $ff_{X} = 2 \times |\mathrm{NCV}_{XA} \times \mathrm{CV}_{XU}|$, where $\mathrm{NCV}_{XA}$ is the NCV value determined for chromosome X in test sample (1), which comprises a trisomy 21, and $\mathrm{CV}_{XU}$ is the coefficient of variation of the chromosome X doses determined in the qualified samples (comprising unaffected female child chromosomes). Both relations follow the form given as Equation 28 elsewhere in this specification.
Figure 56 shows a plot of the percent "ff" determined using the dose of chromosome 21 (ff21) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (1), which comprises DNA from a child with trisomy 21.
The data show that chromosome doses and the NCVs derived from them increase in proportion to increasing ff, and that there is a 1:1 relationship between the percent ff determined using the dose of the trisomic chromosome (i.e. chromosome 21) and the percent ff determined using the dose of a chromosome known to be present as a single chromosome (i.e. chromosome X).
Figure 57 shows a plot of the percent "ff" determined using the dose of chromosome 7 (ff7) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (2), which comprises DNA from a euploid mother and her child, who carries a partial deletion in chromosome 7.
As shown for samples (1) and (2), the data show that chromosome doses and the NCVs derived from them increase in proportion to increasing ff. However, when the aneuploidy is a partial chromosomal aneuploidy, the percent ff determined using the chromosome dose of the partially aneuploid chromosome (ff7) does not correspond to the percent ff determined using the dose of chromosome X (ffX). A deviation from the 1:1 relationship shown for samples with a complete trisomy therefore indicates the presence of a partial aneuploidy.
Figure 58 shows a plot of the percent "ff" determined using the dose of chromosome 15 (ff15) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (3), which comprises DNA from a euploid mother and her child, who is a 25% mosaic with a partial duplication of chromosome 15.
As shown for samples (1) and (2), the ff determined using doses and the NCVs derived from them increases in proportion to increasing ff. As with sample (2), sample (3) comprises a partial chromosomal aneuploidy, and the percent ff determined using the chromosome dose of the partially aneuploid chromosome (ff15) does not correspond to the percent ff determined using the dose of chromosome X (ffX). The lack of correspondence between the two ff values indicates the presence of a partial rather than a complete chromosomal aneuploidy.
Figure 59 shows a plot of the percent "ff" determined using the dose of chromosome 22 (ff22) and the NCVs derived from it in artificial samples (4) comprising (i) 0% child DNA; (ii) 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22; and (iii) 10% DNA from an affected twin son known to have a partial chromosomal aneuploidy of chromosome 22. The data show that the "ff" determined from the four NCVs calculated from doses of chromosome 22 for the sample comprising DNA from the unaffected twin is close to zero, indicating that no aneuploidy of chromosome 22 is present in the unaffected child; when calculated from doses of chromosome X, the "ff" of the unaffected twin sample is confirmed to be about 10%. The data also show that the "ff" determined from the four NCVs calculated from doses of chromosome 22 (ff22) for the sample comprising DNA from the affected twin is about 3%, indicating that an aneuploidy is present in chromosome 22; when calculated from doses of chromosome X (ffX), the "ff" of that sample is confirmed to be about 10%. The lack of correspondence between ff22 and ffX indicates that the aneuploidy of chromosome 22 in the affected twin is a partial chromosomal aneuploidy.
Thus, the data show that in maternal samples comprising cfDNA of a male fetus, chromosome doses and the NCV values derived from them can be used to distinguish the presence of a complete trisomy from the presence of a partial aneuploidy and/or of a complete or partial aneuploidy in a mosaic sample. A partial aneuploidy can be a gain or a loss of part of a chromosome. Optionally, a partial aneuploidy and/or mosaicism can be resolved by using chromosome doses and the estimated fetal fraction as described in Example 12.
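The comparison underlying this conclusion can be illustrated with a short sketch. It assumes a male fetus, so that ffX is available as the reference value; the relative tolerance and the near-zero floor are hypothetical parameters chosen only for illustration, since the example does not state numerical cut-offs for "corresponds" or "close to zero".

```python
def classify_by_ff_concordance(ff_interest: float, ff_x: float,
                               rel_tol: float = 0.25,
                               floor: float = 0.01) -> str:
    """Compare the fetal fraction estimated from a chromosome of interest
    with the fetal fraction estimated from chromosome X (male fetus).

    rel_tol and floor are illustrative assumptions, not values from the text.
    """
    if ff_interest < floor:
        # ff from the chromosome of interest is close to zero.
        return "no aneuploidy indicated"
    if abs(ff_interest - ff_x) <= rel_tol * ff_x:
        # Approximately 1:1 relationship between the two estimates.
        return "consistent with complete aneuploidy"
    return "consistent with partial aneuploidy or mosaicism"


# Values of the kind reported for Figure 59: ff22 about 3%, ffX about 10%.
affected = classify_by_ff_concordance(0.03, 0.10)
unaffected = classify_by_ff_concordance(0.0, 0.10)
complete = classify_by_ff_concordance(0.10, 0.10)
print(affected, unaffected, complete, sep="\n")
```

In practice any such thresholds would need to be calibrated against replicate measurements, because both estimates carry sequencing noise.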
The fetal fraction method described above can also be used to determine the likelihood that one or more fetuses in a multiple gestation has an aneuploidy. For example, in one case of fraternal twins, the fetal fraction determined from the NCV value for chromosome X was found to be 8.3%, and the fraction measured from the NCV value for chromosome 21 was 5.0%. This indicated that only one of the pair of male fetuses had a T21 aneuploidy, a result that was confirmed by karyotyping. In another example involving twins, the fetal fraction determined from the X chromosome was 7.3%, and the fetal fraction determined from chromosome 18 was 8.9%. In this example, both twins were determined by karyotype to be T18 males.
Example 18
Determination of fetal fraction by NCV to identify the presence of complete fetal chromosomal aneuploidy in clinical samples
To demonstrate that ff determined from NCV (CNff) can be used to distinguish the presence of a complete chromosomal aneuploidy from that of a partial chromosomal aneuploidy in clinical samples, cfDNA obtained from the blood of pregnant women was used to quantify chromosomes of interest 21, 13 and 18 in clinical samples. The presence of trisomy was verified by karyotype.
cfDNA was obtained from the following samples: 46 maternal samples from pregnant women each carrying a male fetus with trisomy 21 (T21); 13 maternal samples from pregnant women each carrying a fetus with trisomy 18 (T18); and 3 maternal samples from pregnant women each carrying a male fetus with trisomy 13 (T13). These clinical samples were from the clinical study described in Example 16. cfDNA was isolated and sequencing libraries were prepared as described in Example 16, except that the new Illumina v3 chemistry was used.
Sequencing libraries prepared from cfDNA obtained from qualified samples known to be unaffected for chromosomes 21, 18 and 13 were also sequenced using the new Illumina v3 chemistry. The sequence reads obtained for the qualified samples were mapped to the human reference genome hg19, and the sequence reads that mapped uniquely to all chromosome sequences of hg19 (repeat sequences not masked) were counted and used to determine systematically which chromosome or group of chromosomes would serve as the normalizing chromosome for each of chromosomes of interest 21, 18 and 13 in the test samples.
Table 28 below shows the normalizing chromosomes (denominator chromosomes) identified for determining the chromosome doses (ratios) for chromosomes 1-22, X and Y in each test sample.
Table 28. Example 18: systematically identified normalizing chromosomes for the T21, T18 and T13 test samples
Once the normalizing chromosomes had been identified in the qualified samples, the test samples were sequenced, and the sequence tags mapped to each of chromosomes 21, 18 and 13 and to the corresponding normalizing chromosomes in the test samples were counted and used to calculate chromosome doses (ratios). NCV values were then calculated as previously described according to the following equation:
$$\mathrm{NCV}_{jA} = \frac{R_{jA} - \overline{R_{jU}}}{\sigma_{jU}}$$ Equation 21.
For each test sample, the fetal fraction for chromosome X and for the chromosome of interest was determined according to the following equation, which is described elsewhere in this specification:
$$ff = 2 \times \left| \mathrm{NCV}_{iA} \, \mathrm{CV}_{iU} \right|$$ Equation 28.
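Equations 21 and 28 can be combined in a short sketch. The function names and dose values below are hypothetical, and the use of the sample standard deviation for σ is an assumption; the fetal fraction itself does not depend on that choice, because σ cancels when NCV is multiplied by CV = σ/mean, leaving ff = 2 × |R − mean| / mean.

```python
from statistics import mean, stdev
from typing import Sequence


def ncv(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Equation 21: (test dose - mean qualified dose) / SD of qualified doses."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def fetal_fraction_from_ncv(ncv_value: float,
                            qualified_doses: Sequence[float]) -> float:
    """Equation 28: ff = 2 * |NCV * CV|, with CV = SD / mean of qualified doses."""
    cv = stdev(qualified_doses) / mean(qualified_doses)
    return 2 * abs(ncv_value * cv)


# Hypothetical chromosome doses, for illustration only.
qualified = [0.098, 0.100, 0.102]
test_dose = 0.105

ncv_21 = ncv(test_dose, qualified)
ff_21 = fetal_fraction_from_ncv(ncv_21, qualified)
print(f"NCV = {ncv_21:.2f}, fetal fraction = {ff_21:.1%}")
```

The absolute value lets the same function serve for chromosome X in a male fetus, where the NCV is negative.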
Figure 60 shows a plot of CNffX versus CNff21 determined in samples comprising a fetal trisomy 21 (T21). As expected for a complete chromosomal aneuploidy, CNffX matched the fetal fraction determined using the NCV of chromosome 21 (CNff21).
Similarly, CNffX matched the fetal fraction determined using the NCV of chromosome 18 (CNff18) in the T18 test samples (Figure 61), and matched the fetal fraction determined using the NCV of chromosome 13 (CNff13) in the T13 test samples (Figure 62).
Figure 60 also shows the fetal fractions obtained for T21-affected samples from female fetuses. As expected, the CNff21 in these "female" samples cannot be verified by comparison with chromosome X. To verify the CNff21 of the female samples, the CNff of a chromosome known not to be aneuploid in the fetus (e.g. chromosome 1) can be determined. Alternatively, the CNff21 of the "female" samples can be verified by comparing it with the NCNff, determined for example by counting tags of polymorphic sequences as described elsewhere herein.
Thus, the NCV values obtained from sequence tag counts to identify complete chromosomal copy number variations can be used to determine the corresponding fetal fraction in aneuploid/affected samples. Correspondence between the CNff of the chromosome of interest and the CNff of a chromosome known not to be aneuploid can be used to confirm the presence of a complete chromosomal trisomy.
Example 19
Determination of fetal fraction by NCV to identify the presence of partial fetal chromosomal aneuploidy in clinical samples
To demonstrate that ff determined from NCV (CNff) can be used to identify and localize partial chromosomal aneuploidies in clinical samples, cfDNA from a clinical sample that had been identified as having an aneuploidy of chromosome 17 was sequenced and analyzed as described in Example 18.
The NCV value for each chromosome in the test sample was calculated using the sequence tags mapped to that chromosome in the test sample and to the normalizing chromosomes identified in the set of qualified samples (Table 28 above); for chromosome 17, the normalizing chromosomes were chromosome 16 + chromosome 20 + chromosome 22.
Figure 63 shows a plot of the NCV values for chromosomes 1-22 and X in the test sample. As shown, the NCV value for chromosome 17 was determined to be NCV > 4, the threshold chosen for identifying aneuploid chromosomes. Also shown is the NCV value for chromosome X, which, as expected, is negative.
The CNff values for chromosome 17 and chromosome X were calculated according to the following equation:
$$ff_{(i)} = 2 \times \mathrm{NCV}_{jA} \, \mathrm{CV}_{jU}$$ Equation 25,
and were determined to be CNff17 = 3.9% and CNffX = 13.5%.
The discrepancy between the two CNff values indicates the presence of a partial aneuploidy or, possibly, of a mosaic.
To distinguish a partial aneuploidy from a possible mosaic, tags were counted for each consecutive 100 kbp block/bin on chromosome 17, and a normalized bin value (NBV) was calculated for each bin. The tag count in each individual bin was normalized by determining the ratio of tags/bin to the sum of the tag counts in the 20 bins of the same size having GC content closest to that of the bin being analyzed. Normalization in this case is therefore with respect to GC content. Optionally, bin normalization can also be with respect to the variability of bin doses, as determined in qualified samples as described for chromosome doses/ratios. In this example, the GCC Z-score equals the NBV value determined as follows:
$$\mathrm{NBV}_{ij} = \frac{x_{ij} - M_j}{\mathrm{MAD}_j}$$ Equation 29,
where $M_j$ and $\mathrm{MAD}_j$ are, respectively, the estimated median and the median adjusted deviation of the j-th chromosome dose in the set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
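Equation 29 can be sketched as follows. The sketch assumes that the "median adjusted deviation" is the unscaled median absolute deviation (the median of absolute deviations from the median); that reading is an assumption, as are the function names and the dose values.

```python
from statistics import median
from typing import Sequence


def median_abs_deviation(values: Sequence[float]) -> float:
    """Unscaled MAD: median of absolute deviations from the median
    (assumed interpretation of 'median adjusted deviation')."""
    m = median(values)
    return median([abs(v - m) for v in values])


def normalized_bin_value(x_ij: float, qualified: Sequence[float]) -> float:
    """Equation 29: (observed dose - median) / MAD over the qualified samples."""
    mad = median_abs_deviation(qualified)
    if mad == 0:
        raise ValueError("MAD is zero; NBV is undefined")
    return (x_ij - median(qualified)) / mad


# Hypothetical doses of one bin across qualified samples, and one test value.
qualified_doses = [1.0, 2.0, 3.0, 4.0, 5.0]
nbv = normalized_bin_value(5.5, qualified_doses)
print(f"NBV = {nbv:.2f}")
```

Median-based statistics are less sensitive to outlying qualified samples than a mean and standard deviation would be, which is a common reason for preferring them in per-bin scoring.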
The normalized bin value (NBV) for each 100 kbp bin along the length of chromosome 17 is plotted on the Y axis of Figure 64 as the GCC Z-score, which indicates GC normalization. The plot in Figure 64 clearly shows a copy number increase in the bins corresponding approximately to the last 200,000 bp of chromosome 17. This finding is consistent with the karyotype provided for the sample, which describes a duplication at the q ter of chromosome 17.
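The GC-based bin normalization used to produce these values can be sketched as below. The sketch assumes that the bin being analyzed is excluded from its own set of nearest-GC bins, which the description does not specify; the function name and all tag counts and GC fractions are hypothetical.

```python
from typing import List, Sequence


def gc_normalized_bins(tags: Sequence[int], gc: Sequence[float],
                       k: int = 20) -> List[float]:
    """For each bin, divide its tag count by the summed tag counts of the
    k other bins whose GC content is closest to its own.

    Excluding the bin itself from its neighbours is an assumption.
    """
    if len(tags) != len(gc):
        raise ValueError("tags and gc must have the same length")
    n = len(tags)
    normalized = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: abs(gc[j] - gc[i]))[:k]
        denominator = sum(tags[j] for j in nearest)
        normalized.append(tags[i] / denominator if denominator else float("nan"))
    return normalized


# Hypothetical 100 kbp bins: the middle bin carries extra tags.
bin_tags = [100, 100, 150, 100, 100]
bin_gc = [0.40, 0.41, 0.42, 0.43, 0.44]
ratios = gc_normalized_bins(bin_tags, bin_gc, k=2)
print(ratios)
```

This direct implementation sorts all bins for every bin and is intended only to show the calculation; a whole-genome analysis would normally sort bins by GC content once and take neighbours from the sorted order.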
Thus, CNff can be used to identify and localize partial aneuploidies within a chromosome.
______________________
Example 20
Verification of sample integrity in multiplexed bioassays of maternal cfDNA
Marker molecules having sequences known not to be contained in any known genome were synthesized and used to verify the integrity of whole blood and plasma maternal source samples, which were processed to extract the mixture of fetal and maternal cfDNA in the maternal samples and to sequence it.
Current and previous experimental data have shown that the average length of cfDNA is about 170 bp. Using BLAST searches against all genome accessions, antigenomic sequences of 170 bp that are not present in any known genome were identified. Six marker molecules (MM1-MM6) were synthesized based on the identified antigenomic sequences (SEQ ID NO: 1-6; Table 29) and used to verify sample integrity as follows.
Table 29
Marker molecules
Peripheral blood was collected from a pregnant woman into 4 blood collection tubes (Cell-Free DNA™ BCT; Streck, Inc., Omaha, NE) and shipped overnight to the laboratory for analysis. Two whole blood source samples were spiked with marker molecules as follows: one blood source sample was spiked with 720 pg of marker molecule 1 (MM1), and the second blood source sample was spiked with 720 pg of marker molecule 2. All 4 tubes were centrifuged at 1600 g for 10 minutes at 4 °C. The plasma supernatant was removed from each of the four tubes, placed in 5 mL high-speed centrifuge tubes and centrifuged at 16000 g for 10 minutes at 4 °C. The plasma fractions of the spiked whole blood were dispensed into separate tubes and stored at -80 °C. The plasma fractions from the two remaining blood tubes (not spiked) were then divided into 1.1 mL aliquots. Plasma source samples were prepared as follows: 100 pg of MM1 was added to one plasma aliquot, 100 pg of MM2 was added to plasma aliquot 2, and so on, to obtain 6 marked plasma source samples, each comprising a different marker molecule (MM1-MM6), which were stored at -80 °C.
One tube of each marked plasma source sample and one tube of each marked source blood sample were thawed, and DNA was extracted using the Qiagen Blood Mini Kit according to the method described in Example 1. Libraries were prepared from 30 μl of each sample DNA using the TruSeq™ DNA Sample Prep Kit (Illumina, San Diego, CA), which includes indexes 1-6. The sequencing libraries were prepared such that the sample comprising MM1 was indexed with index 1, the sample comprising MM2 was indexed with index 2, and so on. The sequencing libraries were quantified using the Agilent Bioanalyzer DNA 1000 kit (Agilent Technologies, Santa Clara, CA) and diluted to 4 nM with Qiagen Buffer EB. The indexed and marked samples were pooled and further diluted to 2 nM, then sequenced in four lanes of an Illumina HiSeq flow cell using the Illumina TruSeq SBS Kit v3, according to Table 30.
Table 30
Layout of the multiplexed sequencing flow cell
Sequence reads were aligned to the human reference genome hg19 and to a synthetic reference genome comprising the antigenomic marker molecule sequences. Sequence reads that mapped uniquely (i.e. only once) to the hg19 reference genome or to the synthetic reference genome of marker molecule sequences were counted (Table 31).
Table 31
Correspondence of MM sequences with source sample cfDNA sequences
* I = index
* L = lane
The data show that, for each sample, the sequence identified for the MM added to a source sample corresponded only to the sequences of cfDNA from the source sample to which that MM had been added. For example, the data for sample 1 show that the reads identified as mapping to MM1 corresponded only to the sequences of cfDNA obtained from the source sample to which MM1 had been added (plasma sample 1). In addition, the absence of a different marker sequence (e.g. MM2) among the reads obtained by sequencing the cfDNA of source sample 1 shows that source sample 1 was not cross-contaminated by another sample (e.g. source sample 2).
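The integrity check described above can be sketched as a comparison of per-marker read counts against the marker each sample is expected to carry. The function name, the `min_reads` detection threshold and the read counts are hypothetical; the example does not state a minimum read count for calling a marker present.

```python
from typing import Dict, List


def check_sample_integrity(marker_reads: Dict[str, int],
                           expected_marker: str,
                           min_reads: int = 1) -> Dict[str, object]:
    """Report whether the expected marker molecule is present and whether
    any other marker (a sign of mix-up or cross-contamination) is detected.

    min_reads is an illustrative threshold, not a value from the text.
    """
    detected = {m for m, n in marker_reads.items() if n >= min_reads}
    unexpected: List[str] = sorted(detected - {expected_marker})
    expected_present = expected_marker in detected
    return {
        "expected_present": expected_present,
        "unexpected": unexpected,
        "ok": expected_present and not unexpected,
    }


# Hypothetical unique read counts against the synthetic marker reference.
clean = check_sample_integrity({"MM1": 5000, "MM2": 0, "MM3": 0}, "MM1")
mixed = check_sample_integrity({"MM1": 5000, "MM2": 12, "MM3": 0}, "MM1")
print(clean)
print(mixed)
```

Pairing each marker molecule with a matching library index, as done in this example, means that a marker found under the wrong index flags a problem that arose before or during library preparation.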
Example 21
Internal positive control
An in-process positive control for massively parallel sequencing of maternal cfDNA was developed to provide qualitative positive chromosome doses and NCV values for trisomy 13, trisomy 18 and trisomy 21.
Fragmented genomic DNA from three male patients with known trisomies of Chr13, Chr18 and Chr21, respectively, was spiked into a background of fragmented female DNA. The fragmented genomic DNA was size-selected by PAGE to include fragments ranging from about 150 bp to about 250 bp in length, thereby mimicking the size of fetal cfDNA. The size-selected DNA of the T13, T18 and T21 controls was purified and end-repaired, and its concentration was measured using a Nanodrop (Wilmington, DE). The prepared DNA was verified on a Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA). The trisomy 13, trisomy 18 and trisomy 21 DNA was obtained from the Coriell Institute for Medical Research (Camden, NJ). The female genomic DNA was obtained from The Biochain Institute (Hayward, CA). Small amounts of the trisomic DNA were spiked into the predominantly female DNA background to mimic a "male fetal" DNA fraction in a female "maternal" DNA background. The composition of this DNA mixture was optimized so that, when used in a sequencing assay to determine copy number variation, the mixture always reports qualitatively positive for trisomy 13, trisomy 18 and trisomy 21, with NCV values for chromosomes 13, 18 and 21 greater than 4.
Maternal cfDNA was extracted from plasma samples obtained from pregnant women, and sequencing libraries of the maternal sample cfDNA and of the T13, T18 and T21 control DNA were prepared for multiplexed sequencing, which was performed on the Illumina platform. Four positive controls and 56 samples were sequenced in each flow cell of the sequencer. As described elsewhere in this application, 36 bp reads were obtained, tags were identified for the chromosomes, and NCV values were calculated.
Figures 69 A, B and C show the NCV values of maternal test samples (◇) and of the internal positive controls (). NCV values greater than 4 were determined to indicate a copy number variation for chromosomes of interest 13 (A), 18 (B) and 21 (C), respectively. The figures show that the NCVs of the positive controls correspond to the NCVs of maternal test samples identified as having a copy number variation, i.e. an additional copy of chromosome 13, 18 or 21.
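A run-level check on the internal positive controls could be sketched as follows, using the NCV > 4 threshold stated in this example. The function name, the `chrN` keys and the NCV values are hypothetical, and treating a failed control as a reason to review the run is an assumption rather than a procedure described here.

```python
from typing import Dict, Iterable

NCV_THRESHOLD = 4.0  # threshold stated in this example


def control_is_positive(ncv_by_chrom: Dict[str, float],
                        chroms: Iterable[str] = ("chr13", "chr18", "chr21"),
                        threshold: float = NCV_THRESHOLD) -> bool:
    """True if the control reports NCV above the threshold for every
    chromosome it is designed to be trisomic for."""
    return all(ncv_by_chrom[c] > threshold for c in chroms)


# Hypothetical NCVs for the four positive controls on one flow cell.
controls = [
    {"chr13": 6.1, "chr18": 7.4, "chr21": 8.0},
    {"chr13": 5.8, "chr18": 6.9, "chr21": 7.7},
    {"chr13": 6.3, "chr18": 7.1, "chr21": 8.2},
    {"chr13": 3.2, "chr18": 7.0, "chr21": 7.9},  # chr13 below threshold
]
results = [control_is_positive(c) for c in controls]
print(results)
```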
Internal positive controls can be designed to mimic complete chromosomal variations and partial chromosomal variations, and can be used in prenatal diagnostic assays and in related assays, such as the determination of fetal fraction by massively parallel sequencing as described throughout this specification.
Example 22
Determination of fetal fraction using massively parallel sequencing: sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women in the first or second trimester of pregnancy who were deemed to be at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6 to 9 mL/tube) was transferred into a 15 mL low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge and a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 mL high-speed centrifuge tube and centrifuged at 16000 × g, 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. The two centrifugation steps were performed within 72 hours of blood collection. Cell-free plasma comprising cfDNA was stored at -80 °C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini Kit (Qiagen), essentially according to the manufacturer's instructions. One milliliter of Buffer AL and 100 μl of protease solution were added to 1 ml of plasma. The mixture was incubated for 15 minutes at 56 °C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of Buffer AW1, followed by a second wash with 750 μl of Buffer AW24. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with Buffer AE by centrifugation at 14,000 RPM, and its concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 23
Determination of fetal fraction using massively parallel sequencing: preparation of sequencing libraries, sequencing, and analysis of sequencing data
A. Preparation of sequencing libraries
All sequencing libraries, i.e. target, primary, and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of the approximately 2 ng of purified cfDNA fragments contained in 40 µl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube for 15 minutes at 20 °C with 5 µl 10X phosphorylation buffer, 2 µl deoxynucleotide solution mix (10 mM each dNTP), 1 µl of a 1:5 dilution of DNA Polymerase I, 1 µl T4 DNA Polymerase, and 1 µl T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The enzymes were then heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA tailing of the blunt-ended DNA was accomplished using 10 µl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes.
Following inactivation of the Klenow fragment, 1 µl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA, using 4 µl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 and incubating the reaction mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using Phusion High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using the Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions.
The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387 v001.pdf. The purified amplified product was eluted in 40 µl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
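The enrichment PCR program described above can be written down as a small data structure, which makes it easy to check the cycle count and the total programmed hold time. This is only a sketch: the class and field names are hypothetical, and ramp times between temperatures are ignored, so a real run takes longer than the figure computed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (temperature in degrees C, hold time in seconds)
Step = Tuple[float, int]


@dataclass
class PcrProgram:
    """Hypothetical container for a thermal cycling program."""
    initial: Step
    cycle: List[Step]
    n_cycles: int
    final_extension: Step

    def hold_seconds(self) -> int:
        """Sum of programmed hold times; ramping between steps is not modelled."""
        per_cycle = sum(seconds for _, seconds in self.cycle)
        return self.initial[1] + self.n_cycles * per_cycle + self.final_extension[1]


# Values taken from the protocol text: 98 C 30 s; 18 x (98 C 10 s, 65 C 30 s,
# 72 C 30 s); final extension 72 C 5 min. The 4 C hold is open-ended and omitted.
ENRICHMENT_PCR = PcrProgram(
    initial=(98, 30),
    cycle=[(98, 10), (65, 30), (72, 30)],
    n_cycles=18,
    final_extension=(72, 300),
)

if __name__ == "__main__":
    print(ENRICHMENT_PCR.hold_seconds() / 60, "minutes of programmed holds")
```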
B. Sequencing
Library DNA was sequenced using the Genome Analyzer II (Illumina Inc., San Diego, CA, USA) according to standard manufacturer protocols. Copies of the protocol for whole-genome sequencing using Illumina/Solexa technology may be found in the BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378.
The DNA library was diluted to 1 nM and denatured. Library DNA (5 pM) was subjected to cluster amplification according to the procedure described in Illumina's Cluster Station User Guide and Cluster Station Operations Guide, available on the World Wide Web at illumina.com/systems/genome analyzer/cluster_station.ilmn. The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome.
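The two figures in the paragraph above (a 1 nM stock loaded at 5 pM, and 36 bp reads covering about 10% of the genome) can be related by simple arithmetic. The sketch below is illustrative only: the genome size is an assumed round value that the text does not state, the function names are hypothetical, and the read count assumes non-overlapping reads.

```python
# Assumption: approximate haploid human genome size; not given in the text.
GENOME_SIZE_BP = 3.1e9
# Single-end read length stated in the text.
READ_LENGTH_BP = 36


def reads_for_coverage(fraction: float,
                       genome_size: float = GENOME_SIZE_BP,
                       read_length: int = READ_LENGTH_BP) -> int:
    """Reads whose combined length equals `fraction` of the genome (no overlap assumed)."""
    return round(fraction * genome_size / read_length)


def dilution_factor(stock_molar: float, target_molar: float) -> float:
    """Fold dilution needed to go from the stock to the target concentration."""
    return stock_molar / target_molar


if __name__ == "__main__":
    print(reads_for_coverage(0.10))        # reads needed for ~10% of the genome
    print(dilution_factor(1e-9, 5e-12))    # 1 nM stock to 5 pM loading concentration
```

Under these assumptions, roughly 8.6 million 36 bp reads correspond to 10% of the genome, and the 1 nM library is diluted about 200-fold before cluster amplification.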
C. Analysis of sequencing data for the determination of fetal fraction
Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software" transferred image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The 36 bp reads were aligned to an artificial reference genome (e.g. a SNP genome) using the BOWTIE program. The artificial reference genome was defined as the grouping of the polymorphic DNA sequences that encompass the alleles comprised in the polymorphic target sequences. For example, the artificial reference genome is a SNP genome comprising SEQ ID NOs: 7-62. Only reads that mapped uniquely to the artificial genome were used for the analysis of fetal fraction. Reads that matched the SNP genome perfectly were counted as tags and filtered. Of the remaining reads, only reads having one or two mismatches were counted as tags and included in the analysis. Tags mapped to each of the polymorphic alleles were counted, and the fetal fraction was determined as the ratio of the number of tags mapped to the minor allele (i.e. the fetal allele) to the number of tags mapped to the major allele (i.e. the maternal allele).
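The tag-counting rule described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the example: it assumes that alignment has already been done and that each uniquely mapped read arrives together with the reference sequence of the allele it mapped to. The function names, the tuple layout, and the toy sequences are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def mismatches(read: str, ref: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(read, ref))


def count_tags(alignments: Iterable[Tuple[str, str, str]],
               max_mismatches: int = 2) -> Counter:
    """Count tags per allele.

    `alignments` yields (allele_id, read_sequence, reference_sequence) for
    uniquely mapped reads. Perfect matches and reads with one or two
    mismatches are counted; reads with more mismatches are dropped.
    """
    counts: Counter = Counter()
    for allele_id, read, ref in alignments:
        if len(read) == len(ref) and mismatches(read, ref) <= max_mismatches:
            counts[allele_id] += 1
    return counts


def fetal_fraction_percent(minor_tags: int, major_tags: int) -> float:
    """Minor-allele (fetal) tags over major-allele (maternal) tags, as a percent."""
    if major_tags <= 0:
        raise ValueError("major-allele tag count must be positive")
    return 100.0 * minor_tags / major_tags


if __name__ == "__main__":
    toy = [
        ("snp1_major", "ACGTACGT", "ACGTACGT"),  # perfect match, counted
        ("snp1_major", "ACGTACGA", "ACGTACGT"),  # one mismatch, counted
        ("snp1_minor", "ACGAACGT", "ACGAACGT"),  # perfect match, counted
        ("snp1_minor", "TTTTACGT", "ACGAACGT"),  # too many mismatches, dropped
    ]
    tags = count_tags(toy)
    print(fetal_fraction_percent(tags["snp1_minor"], tags["snp1_major"]))
```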
Example 24
Selection of autosomal SNPs for the determination of fetal fraction
A set of 28 autosomal SNPs was selected from a list of 92 SNPs (Pakstis et al., Hum Genet 127:315-324 [2010]) and from Applied Biosystems by Life Technologies™ (Carlsbad, CA) at the Web address appliedbiosystems.com. Primers were designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site is included in the 36 bp read generated by massively parallel sequencing on the Illumina Analyzer GII, and to generate amplicons of sufficient length to undergo bridge amplification during cluster formation. Thus, primers were designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, yield DNA molecules of at least 200 bp. Primer sequences were identified, and primer sets (i.e. forward and reverse primers) were synthesized by Integrated DNA Technologies (San Diego, CA) and stored as 1 µM solutions to be used for amplifying polymorphic target sequences as described in Examples 25 to 27. Table 33 provides the RefSNP (rs) accession ID numbers, the primers used for amplifying the target cfDNA sequences, and the sequences of the amplicons comprising the possible SNP alleles that would be generated using the primers. The SNPs given in Table 33 were used for the simultaneous amplification of 13 target sequences in a multiplexed assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs given in Table 34. The SNP alleles are shown in bold and are underlined. Other additional SNPs that can be used to determine fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429 and rs1490413, which have been analyzed for determining fetal fraction by TaqMan PCR and are disclosed in U.S. Provisional Applications 61/296,358 and 61/360,837.
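The primer design constraints stated above (SNP inside the 36 bp read, amplicon of at least 110 bp, adaptor-ligated molecule of at least 200 bp) can be expressed as a simple check. The sketch below is hypothetical throughout: the class, field names, and the adaptor length passed in are assumptions for illustration, since the text gives only the minimum amplicon and template lengths.

```python
from dataclasses import dataclass
from typing import List

READ_LENGTH_BP = 36            # read length stated in the text
MIN_AMPLICON_BP = 110          # minimum amplicon length stated in the text
MIN_CLUSTER_TEMPLATE_BP = 200  # minimum adaptor-ligated length stated in the text


@dataclass
class AmpliconDesign:
    """Hypothetical description of one candidate amplicon."""
    amplicon_length: int       # bp, including both primers
    snp_position: int          # 1-based position of the SNP from the sequenced end
    adaptor_total_length: int  # bp added by both adaptors (caller-supplied assumption)


def design_problems(design: AmpliconDesign) -> List[str]:
    """Return the list of constraints that the design violates (empty if none)."""
    problems = []
    if design.snp_position < 1 or design.snp_position > READ_LENGTH_BP:
        problems.append("SNP falls outside the 36 bp read")
    if design.amplicon_length < MIN_AMPLICON_BP:
        problems.append("amplicon shorter than 110 bp")
    if design.amplicon_length + design.adaptor_total_length < MIN_CLUSTER_TEMPLATE_BP:
        problems.append("adaptor-ligated molecule shorter than 200 bp")
    return problems


if __name__ == "__main__":
    print(design_problems(AmpliconDesign(115, 25, 92)))
    print(design_problems(AmpliconDesign(100, 40, 92)))
```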
Table 33
SNP panel for the determination of fetal fraction
Table 34
Additional SNPs for the determination of fetal fraction
Example 25
Determination of fetal fraction by massively parallel sequencing of a target library
To determine the fraction of fetal cfDNA in a maternal sample, target polymorphic nucleic acid sequences, each comprising a SNP, were amplified and used to prepare a target library for sequencing in a massively parallel fashion.
cfDNA was extracted as described above. A target sequencing library was prepared as follows. cfDNA contained in 5 µl of purified cfDNA was amplified in a reaction volume of 50 µl containing 7.5 µl of a 1 µM primer mix (Table 1), 10 µl of NEB 5X Master Mix and 27 µl water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 1 minute, followed by 20 to 30 cycles of 95 °C for 20 seconds, 68 °C for 1 minute, and 68 °C for 30 seconds, followed by a final incubation at 68 °C for 5 minutes. A final hold at 4 °C was added until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). A final hold at 4 °C was added until the samples were removed for preparing the target library. The amplified product was analyzed with a 2100 Bioanalyzer (Agilent Technologies, Sunnyvale, CA), and the concentration of amplified product was determined. A sequencing library of the amplified target nucleic acids was prepared as described in Example 23, and was sequenced in a massively parallel fashion using sequencing-by-synthesis with reversible dye terminators according to the Illumina protocol (BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378). Tags mapped to the reference genome consisting of 26 sequences comprising a SNP (13 pairs, each pair representing two alleles), i.e. SEQ ID NOs: 7-32, were analyzed and counted as described.
Table 35 provides the tag counts obtained from sequencing the target library, and the calculated fetal fraction derived from the sequencing data.
Table 35
Determination of fetal fraction by massively parallel sequencing of a library of polymorphic nucleic acids
The results show that polymorphic nucleic acid sequences, each comprising at least one SNP, can be amplified from cfDNA derived from a maternal plasma sample to construct a library that can be sequenced in a massively parallel fashion to determine the fraction of fetal nucleic acids in the maternal sample.
Example 26
Determination of fetal fraction following enrichment of fetal and maternal nucleic acids in a cfDNA sequencing library sample
To enrich the fetal and maternal cfDNA contained in a primary sequencing library constructed using purified fetal and maternal cfDNA, a portion of a purified cfDNA sample was used to amplify polymorphic target nucleic acid sequences and to prepare a sequencing library of the amplified polymorphic target nucleic acids, which was then used to enrich the fetal and maternal nucleic acid sequences contained in the primary library.
The method corresponds to the workflow diagrammed in Figure 10. A target sequencing library was prepared from a portion of the purified cfDNA as described in Example 23. A primary sequencing library was prepared using the remaining portion of the purified cfDNA as described in Example 23. The primary library was enriched for the amplified polymorphic nucleic acids contained in the target library by diluting the primary and the target sequencing libraries to 10 nM, and combining the target library with the primary library at a ratio of 1:9 to provide an enriched sequencing library. The enriched library was sequenced and the sequencing data were analyzed as described in Example 23.
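Because both libraries are diluted to the same molarity (10 nM) before mixing, the 1:9 volume ratio is also the molar ratio, so the target library contributes one tenth of the molecules in the enriched library. A short sketch of that arithmetic follows; the function name and the 20 µl total volume are hypothetical choices for illustration.

```python
from typing import Tuple


def mix_volumes(total_ul: float,
                target_parts: int = 1,
                primary_parts: int = 9) -> Tuple[float, float]:
    """Volumes (target, primary) in microliters for a given total and parts ratio."""
    parts = target_parts + primary_parts
    return total_ul * target_parts / parts, total_ul * primary_parts / parts


if __name__ == "__main__":
    target_ul, primary_ul = mix_volumes(20.0)
    # With equal molarity in both libraries, volume share equals molar share.
    print(target_ul, primary_ul, target_ul / (target_ul + primary_ul))
```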
Table 36 provides the number of sequence tags that mapped to the SNP genome for the informative SNPs identified by sequencing the enriched libraries derived from plasma samples of pregnant women carrying a T21, a T13, a T18, or a monosomy X fetus, respectively. Fetal fraction was calculated as follows:
$$\%\ \text{fetal fraction}_{\text{allele}_x} = \frac{\sum \text{fetal sequence tags for allele}_x}{\sum \text{maternal sequence tags for allele}_x} \times 100$$
Table 36 also provides the number of sequence tags mapped to the human reference genome. Tags mapped to the human reference genome were used to determine the presence or absence of aneuploidy, using the same plasma samples that were used for determining the corresponding fetal fraction. Methods for using sequence tag counts to determine aneuploidy are described in U.S. Provisional Applications 61/407,017 and 61/455,849, which are herein incorporated by reference in their entirety.
Table 36. Determination of fetal fraction by massively parallel sequencing of an enriched library of polymorphic nucleic acids
Example 27
Determination of fetal fraction by massively parallel sequencing:
Enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
To enrich the fetal and maternal cfDNA contained in a purified sample of cfDNA extracted from a maternal plasma sample, a portion of the purified cfDNA was used to amplify polymorphic target nucleic acid sequences, each comprising one SNP chosen from the panel of SNPs given in Table 33.
The method corresponds to the workflow diagrammed in Fig. 9. Cell-free plasma was obtained from a maternal blood sample, and cfDNA was purified from the plasma sample as described in Example 22. The final concentration was determined to be 92.8 pg/µl. cfDNA contained in 5 µl of purified cfDNA was amplified in a reaction volume of 50 µl containing 7.5 µl of a 1 µM primer mix (Table 1), 10 µl of NEB 5X Master Mix and 27 µl water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 1 minute, followed by 30 cycles of 95 °C for 20 seconds, 68 °C for 1 minute, and 68 °C for 30 seconds, followed by a final incubation at 68 °C for 5 minutes. A final hold at 4 °C was added until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA), and its concentration was quantified using the Nanodrop 2000 (Thermo Scientific, Wilmington, DE). The purified amplification product was diluted 1:10 in water, and 0.9 µl (371 pg) was added to 40 µl of the purified cfDNA sample to obtain a 10% spike. The enriched fetal and maternal cfDNA present in the purified cfDNA sample was used to prepare a sequencing library, and was sequenced as described in Example 22.
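The 10% spike quoted above can be checked from the stated concentration and volumes. The sketch below assumes that the percentage is expressed relative to the mass of cfDNA already in the 40 µl sample, which is one plausible reading of the text; the function names are hypothetical.

```python
def mass_pg(concentration_pg_per_ul: float, volume_ul: float) -> float:
    """DNA mass in picograms for a given concentration and volume."""
    return concentration_pg_per_ul * volume_ul


def spike_percent(spike_pg: float, sample_pg: float) -> float:
    """Spiked-in mass as a percentage of the mass already in the sample."""
    return 100.0 * spike_pg / sample_pg


if __name__ == "__main__":
    sample = mass_pg(92.8, 40.0)   # cfDNA in the 40 ul sample
    print(sample)                  # about 3712 pg
    print(spike_percent(371.0, sample))  # close to 10%
    print(371.0 / 0.9)             # implied concentration of the 1:10 dilution, pg/ul
```

With these numbers, 371 pg added to roughly 3712 pg of cfDNA is a spike of just under 10%, consistent with the figure given in the example.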
Table 37 provides the tag counts obtained for each of chromosomes 21, 18, 13, X and Y, i.e. the sequence tag density, and the tag counts obtained for the informative polymorphic sequences contained in the SNP reference genome, i.e. the SNP tag density. The data show that sequencing information can be obtained by sequencing a single library constructed from a purified maternal cfDNA sample that has been enriched for sequences comprising SNPs, to determine simultaneously the presence or absence of aneuploidy and the fetal fraction. The presence or absence of aneuploidy was determined using the number of tags mapped to chromosomes, as described in U.S. Provisional Applications 61/407,017 and 61/455,849. In the example given, the data show that the fraction of fetal DNA in plasma sample AFR105 was quantifiable from the sequencing results of five informative SNPs and was determined to be 3.84%.
This example shows that the enrichment protocol provides the tag counts required to determine aneuploidy and fetal fraction from a single sequencing process.
Table 37
Determination of fetal fraction by massively parallel sequencing:
Enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
Example 28
Determination of fetal fraction by capillary electrophoresis of polymorphic sequences comprising STRs
To determine fetal fraction in maternal samples comprising fetal and maternal cfDNA, peripheral blood samples were collected from volunteer pregnant women carrying either male or female fetuses. Peripheral blood samples were obtained and processed to provide purified cfDNA as described in Example 22.
Ten microliters of cfDNA sample were analyzed using the AmpFlSTR MiniFiler™ PCR amplification kit (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. Briefly, cfDNA contained in 10 µl was amplified in a reaction volume of 25 µl containing 5 µl of fluorescently labeled primers (AmpFlSTR MiniFiler™ Primer Set) and the AmpFlSTR MiniFiler™ Master Mix, which includes AmpliTaq DNA polymerase and associated buffer, salt (1.5 mM MgCl2), and 200 µM deoxynucleotide triphosphates (dNTPs: dATP, dCTP, dGTP and dTTP). The fluorescently labeled primers are forward primers labeled with 6FAM™, VIC™, NED™, and PET™ dyes. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 10 minutes, followed by 30 cycles of 94 °C for 20 seconds, 59 °C for 2 minutes, and 72 °C for 1 minute, followed by a final incubation at 60 °C for 45 minutes. A final hold at 4 °C was added until the samples were removed for analysis. The amplified product was prepared by diluting 1 µl of amplified product in 8.7 µl Hi-Di™ Formamide (Applied Biosystems) and 0.3 µl GeneScan™-500 LIZ internal size standard (Applied Biosystems), and was analyzed with an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems) using Data Collection HID_G5_POP4 (Applied Biosystems) and a 36 cm capillary array. All genotyping was performed with GeneMapper_ID v3.2 software (Applied Biosystems) using the manufacturer-provided allelic ladders, bins and panels.
All genotyping measurements were performed on the Applied Biosystems 3130xl Genetic Analyzer, using a ±0.5-nt "window" around the size obtained for each allele to allow for detection and correct assignment of alleles. Any sample allele whose size was outside the ±0.5-nt window was determined to be OL, i.e. "Off Ladder". OL alleles are alleles of a size that is not represented in the AmpFlSTR MiniFiler™ allelic ladder, or alleles that do correspond to an allelic ladder but whose sizes fall just outside a window because of measurement error. The minimum peak height threshold of >50 RFU was set based on validation experiments performed to avoid typing when stochastic effects are likely to interfere with accurate interpretation of mixtures. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of the preset bins for the STRs that are analyzed.
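The ±0.5-nt window rule for assigning a measured fragment size to a ladder allele can be sketched as below. The ladder sizes in the usage example are invented for illustration and are not MiniFiler values; the function name is hypothetical, and the actual assignment in the example was done by GeneMapper, not by code like this.

```python
from typing import Dict

OFF_LADDER = "OL"


def call_allele(size_nt: float,
                ladder: Dict[str, float],
                window_nt: float = 0.5) -> str:
    """Return the ladder allele whose size is within +/- window_nt of size_nt.

    If the nearest ladder allele is further away than the window, the peak is
    reported as off ladder (OL). An empty ladder raises ValueError.
    """
    name, ref_size = min(ladder.items(), key=lambda item: abs(item[1] - size_nt))
    return name if abs(size_nt - ref_size) <= window_nt else OFF_LADDER


if __name__ == "__main__":
    # Hypothetical ladder: allele name -> expected fragment size in nucleotides.
    toy_ladder = {"10": 100.0, "11": 104.0, "12": 108.0}
    print(call_allele(100.3, toy_ladder))  # inside the window of allele 10
    print(call_allele(102.0, toy_ladder))  # between bins, reported as OL
```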
Fetal fraction was calculated using the average peak heights of the major and minor alleles at each STR locus, determined from triplicate injections. The rules applied to the calculation are:
1. off-ladder (OL) allele data are not included in the calculation; and
2. only peak heights of >50 RFU (relative fluorescence units) are included in the calculation; and
3. if only one bin is present, the marker is deemed non-informative; and
4. if a second bin is called, but the peaks of the first and second bins are within 50% to 70% of their relative fluorescence units (RFU) in peak height, the minority fraction is not measured and the marker is deemed not informative.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component(s), and is expressed as a percentage. It was first calculated for each informative locus as
$$\text{fetal fraction} = \frac{\sum \text{peak height of minor allele}}{\sum \text{peak height of major allele(s)}} \times 100$$
The fetal fraction for a sample comprising two or more informative STRs is calculated as the average of the fetal fractions calculated for the two or more informative markers.
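The four rules and the averaging step can be put together in a short sketch. Several simplifications here are assumptions rather than part of the example: rule 4 is read as "minor-to-major peak height ratio between 0.5 and 0.7", loci with more than two retained peaks are skipped, the >50 RFU threshold is applied to each replicate height before averaging, and ratios above 0.7 are not treated specially because the text does not say how they are handled. All names and the toy peak heights are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

MIN_PEAK_RFU = 50.0

# One locus: bin label -> peak heights (RFU) from replicate injections.
Locus = Dict[str, List[float]]


def locus_fraction(peaks: Locus) -> Optional[float]:
    """Percent minor/major peak height for one locus, or None if not informative."""
    kept = []
    for label, heights in peaks.items():
        if label == "OL":                                    # rule 1
            continue
        valid = [h for h in heights if h > MIN_PEAK_RFU]     # rule 2
        if valid:
            kept.append(mean(valid))
    if len(kept) != 2:                                       # rule 3, plus sketch limit
        return None
    minor, major = sorted(kept)
    if 0.5 <= minor / major <= 0.7:                          # rule 4, as interpreted here
        return None
    return 100.0 * minor / major


def sample_fraction(loci: Dict[str, Locus]) -> Optional[float]:
    """Average of the per-locus fractions over all informative loci."""
    values = [f for f in (locus_fraction(p) for p in loci.values()) if f is not None]
    return mean(values) if values else None


if __name__ == "__main__":
    toy = {
        "locus_A": {"12": [1000, 1000, 1000], "14": [100, 100, 100]},
        "locus_B": {"9": [2000.0]},                 # single bin, not informative
        "locus_C": {"7": [800.0], "8": [500.0]},    # ratio 0.625, excluded
        "locus_D": {"10": [1000.0], "OL": [200.0]}, # OL peak ignored
    }
    print(sample_fraction(toy))
```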
Table 38 provides the data obtained from analyzing cfDNA of a subject pregnant with a male fetus.
Table 38
Fetal fraction determined in cfDNA of a pregnant subject by analysis of STRs
The results show that cfDNA can be used to determine the presence or absence of fetal DNA, as indicated by the detection of a minor component at one or more STR alleles; to determine the percent fetal fraction; and to determine fetal gender, as indicated by the presence or absence of the Amelogenin allele.

Claims (23)

1. A kit for determining fetal fraction, said kit comprising a box (1), a plurality of holding slots arranged in the box for receiving a plurality of vials, a vial (2) containing an internal positive control, a vial (3) containing a marker nucleic acid suitable for tracking and determining the integrity of a sample, and a vial (4) containing a buffer solution, wherein said kit further comprises a plurality of vials, each of said plurality of vials containing a different internal positive control and/or a different marker nucleic acid.
2. The kit of claim 1, wherein the vial (2) contains two or more internal positive controls.
3. The kit of claim 1, wherein said internal positive control comprises a trisomy selected from the group consisting of: trisomy 21, trisomy 18, trisomy 21, trisomy 13, trisomy 16, trisomy 13, trisomy 9, trisomy 8, trisomy 22, XXX, XXY and XYY.
4. The kit of claim 1, wherein said internal positive control comprises a trisomy selected from the group consisting of: a trisomy 21 (T21), a trisomy 18 (T18) and a trisomy 13 (T13).
5. The kit of claim 4, wherein the positive control contained in the vial (2) comprises trisomy 21 (T21), trisomy 18 (T18) and trisomy 13 (T13).
6. The kit of claim 1, wherein the positive control contained in the vial (2) comprises an amplification or a deletion of the long arm or the short arm of any one or more of chromosomes 1-22, X and Y.
7. The kit of claim 6, wherein the positive control contained in the vial (2) comprises an amplification or a deletion of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and 22q.
8. The kit of claim 1, wherein the positive control contained in the vial (2) comprises an amplification of a region selected from the group consisting of: 20q13, 19q12, 1q21-1q23, 8p11-p12 and ErbB2.
9. The kit of claim 1, wherein the positive control contained in the vial (2) comprises an amplification of a region or a gene shown in Table 3, Table 4, Table 5 and Table 6 below:
Table 3:
Table 4:
Table 5:
Table 6:
10. The kit of claim 1, wherein the positive control contained in the vial (2) comprises an amplification of a region or a gene selected from the group consisting of: MYC, ERBB2, CCND1, FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
11. The kit of any one of claims 1-10, wherein said marker nucleic acid is an antigenomic marker sequence.
12. The kit of claim 11, wherein the length of said marker sequence ranges from about 30 bp to about 600 bp.
13. The kit of claim 11, wherein the length of said marker sequence ranges from about 100 bp to about 400 bp.
14. The kit of claim 11, further comprising at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least nine, or at least ten, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 vials for different marker sequences.
15. The kit of claim 11, wherein said marker is incorporated into said control.
16. The kit of claim 11, wherein said marker is incorporated into an adaptor.
17. The kit of any one of claims 1-10, wherein one or more sequencing adaptors are further contained in the vial (3).
18. The kit of claim 17, wherein said sequencing adaptors comprise a plurality of indexed sequencing adaptors.
19. The kit of claim 18, wherein said adaptors comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
20. The kit of any one of claims 1-10, wherein said kit further comprises a sample collection device for collecting a biological sample.
21. The kit of claim 20, wherein said sample collection device comprises a device (5) for collecting blood and a receptacle (6) for holding blood.
22. The kit of any one of claims 1-10, wherein said kit further comprises a vial (7) containing a plurality of DNA extraction reagents.
23. The kit of any one of claims 1-10, wherein said kit further comprises a vial (8) containing a plurality of reagents for preparing a sequencing library.
CN201220583608.8U 2012-04-12 2012-11-07 Kit for determining fetal fraction Expired - Lifetime CN204440396U (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US13/445,778 US9447453B2 (en) 2011-04-12 2012-04-12 Resolving genome fractions using polymorphism counts
US13/445,778 2012-04-12
US13/482,964 2012-05-29
US13/482,964 US20120270739A1 (en) 2010-01-19 2012-05-29 Method for sample analysis of aneuploidies in maternal samples
US13/555,037 2012-07-20
US13/555,037 US9260745B2 (en) 2010-01-19 2012-07-20 Detecting and classifying copy number variation

Publications (1)

Publication Number Publication Date
CN204440396U true CN204440396U (en) 2015-07-01

Family

ID=49460351

Family Applications (4)

Application Number Title Priority Date Filing Date
CN201710644858.5A Pending CN107435070A (en) 2012-04-12 2012-11-07 Copy the detection and classification of number variation
CN201810154581.2A Active CN108485940B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201220583608.8U Expired - Lifetime CN204440396U (en) 2012-04-12 2012-11-07 Kit for determining fetal fraction
CN201210441134.8A Active CN103374518B (en) 2012-04-12 2012-11-07 Copy the detection and classification of number variation


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113684277A (en) * 2021-09-06 2021-11-23 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN114507904A (en) * 2022-04-19 2022-05-17 北京迅识科技有限公司 Method for preparing second-generation sequencing library

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2970916A1 (en) * 2014-03-14 2015-09-17 Caredx, Inc. Methods of monitoring immunosuppressive therapies in a transplant recipient
JP6659672B2 (en) * 2014-05-30 2020-03-04 ベリナタ ヘルス インコーポレイテッド Detection of fetal chromosome partial aneuploidy and copy number variation
CN104152553B (en) * 2014-07-21 2016-11-23 上海交通大学 A kind of auxiliary diagnoses the test kit whether fetus to be measured is mongolism patient
CN114181997A (en) * 2014-12-12 2022-03-15 维里纳塔健康股份有限公司 Determination of copy number variation using cell-free DNA fragment size
US10395759B2 (en) * 2015-05-18 2019-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for copy number variant detection
JP2018517421A (en) * 2015-06-15 2018-07-05 マードック チルドレンズ リサーチ インスティチュート How to measure chimerism
CN107922973B (en) * 2015-07-07 2019-06-14 远见基因组系统公司 Method and system for the modification detection based on sequencing
EP3347466B1 (en) 2015-09-08 2024-01-03 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
WO2017044885A1 (en) * 2015-09-09 2017-03-16 uBiome, Inc. Method and system for microbiome-derived diagnostics and therapeutics for conditions associated with cerebro-craniofacial health
CN108026576B (en) * 2015-09-22 2022-06-28 香港中文大学 Accurate quantification of fetal DNA fraction by shallow depth sequencing of maternal plasma DNA
WO2017106768A1 (en) * 2015-12-17 2017-06-22 Guardant Health, Inc. Methods to determine tumor gene copy number by analysis of cell-free dna
US10095831B2 (en) 2016-02-03 2018-10-09 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
WO2018009723A1 (en) * 2016-07-06 2018-01-11 Guardant Health, Inc. Methods for fragmentome profiling of cell-free nucleic acids
RU2674700C2 (en) * 2016-12-30 2018-12-12 Общество с ограниченной ответственностью "Научно-производственная фирма ДНК-Технология" (ООО "НПФ ДНК-Технология") Method of determining the source of aneuploid cells on the blood of a pregnant woman
JP7009518B2 (en) * 2017-06-20 2022-01-25 イルミナ インコーポレイテッド Methods and systems for the degradation and quantification of DNA mixtures from multiple contributors of known or unknown genotypes
CN108427864B (en) * 2018-02-14 2019-01-29 南京世和基因生物技术有限公司 A kind of detection method, device and computer-readable medium copying number variation
CN110656159B (en) * 2018-06-28 2024-01-09 深圳华大生命科学研究院 Copy number variation detection method
EP3827094A1 (en) * 2018-07-24 2021-06-02 Affymetrix, Inc. Array based method and kit for determining copy number and genotype in pseudogenes
CN110880356A (en) * 2018-09-05 2020-03-13 南京格致基因生物科技有限公司 Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer
CN109628579B (en) * 2019-01-13 2022-11-15 清华大学 Detection method for determining whether chromosome number in biological sample is abnormal
JP2022534634A (en) * 2019-06-03 2022-08-03 イルミナ インコーポレイテッド Detection limit-based quality control metrics
CN110373477B (en) * 2019-07-23 2021-05-07 华中农业大学 Molecular marker cloned from CNV fragment and related to porcine ear shape character
CN110317877A (en) * 2019-08-02 2019-10-11 苏州宏元生物科技有限公司 Application of the unstable variation of one group chromosome in preparation diagnosis bladder transitional cell carcinoma, the reagent or kit of assessing prognosis
CN110452985A (en) * 2019-08-02 2019-11-15 苏州宏元生物科技有限公司 Application of the unstable variation of one group chromosome in the reagent or kit for preparing diagnosing liver cancer, assessment prognosis
CN111105844B (en) * 2019-11-22 2023-06-06 广州金域医学检验集团股份有限公司 Somatic cell mutation classification method, apparatus, device, and readable storage medium
CN111394474B (en) * 2020-03-24 2022-08-16 西北农林科技大学 Method for detecting copy number variation of GAL3ST1 gene of cattle and application thereof
CN111476497B (en) * 2020-04-15 2023-06-16 浙江天泓波控电子科技有限公司 Distribution feed network method for miniaturized platform
CN111948394B (en) * 2020-08-10 2023-07-28 山西医科大学 Application of TSTA3 and LAMP2 as targets in esophageal squamous carcinoma cell metastasis detection and drug screening
CN112322722B (en) * 2020-11-13 2021-11-12 上海宝藤生物医药科技股份有限公司 Primer probe composition and kit for detecting 16p11.2 microdeletion and application thereof
CN112614548B (en) * 2020-12-25 2021-08-03 北京吉因加医学检验实验室有限公司 Method for calculating sample database building input amount and database building method thereof
CN113462768B (en) * 2021-07-29 2023-05-30 中国医学科学院整形外科医院 Primer and kit for detecting copy number of ECR region of small ear deformity patient by ddPCR
CN114093417B (en) * 2021-11-23 2022-10-04 深圳吉因加信息科技有限公司 Method and device for identifying chromosomal arm heterozygosity loss

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001274869A1 (en) * 2000-05-20 2001-12-03 The Regents Of The University Of Michigan Method of producing a dna library using positional amplification
AU2001271623A1 (en) * 2000-06-30 2002-01-14 Incyte Genomics, Inc. Human extracellular matrix (ecm)-related tumor marker
EP2261372B1 (en) * 2003-01-29 2012-08-22 454 Life Sciences Corporation Methods of amplifying and sequencing nucleic acids
EP2952589B1 (en) * 2008-09-20 2018-02-14 The Board of Trustees of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
EP2883965B8 (en) * 2010-01-19 2018-06-27 Verinata Health, Inc Method for determining copy number variations
US20130203606A1 (en) * 2010-02-25 2013-08-08 Advanced Liquid Logic Inc Method of Preparing a Nucleic Acid Library
CN102409043B (en) * 2010-09-21 2013-12-04 深圳华大基因科技服务有限公司 Method for constructing a high-throughput, low-cost Fosmid library, and the tag and tag adapter used in the method
CN102127818A (en) * 2010-12-15 2011-07-20 张康 Method for creating a fetal DNA library using peripheral blood of a pregnant woman

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113684277A (en) * 2021-09-06 2021-11-23 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN113684277B (en) * 2021-09-06 2022-05-17 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN114507904A (en) * 2022-04-19 2022-05-17 北京迅识科技有限公司 Method for preparing second-generation sequencing library

Also Published As

Publication number Publication date
CN103374518B (en) 2018-03-27
CN108485940B (en) 2022-01-28
CN103374518A (en) 2013-10-30
CN108485940A (en) 2018-09-04
CN107435070A (en) 2017-12-05

Similar Documents

Publication Publication Date Title
CN204440396U (en) For determining the kit of fetus mark
US11875899B2 (en) Analyzing copy number variation in the detection of cancer
US11697846B2 (en) Detecting and classifying copy number variation
US20200219588A1 (en) Detecting and classifying copy number variation
KR102184868B1 (en) Using cell-free dna fragment size to determine copy number variations
US9411937B2 (en) Detecting and classifying copy number variation
EP2877594B1 (en) Detecting and classifying copy number variation in a fetal genome
AU2015360298B2 (en) Using cell-free DNA fragment size to determine copy number variations
US9323888B2 (en) Detecting and classifying copy number variation
CN103003447B (en) Method for determining the presence or absence of different aneuploidies in a sample
AU2019200163B2 (en) Detecting and classifying copy number variation
AU2019200162B2 (en) Detecting and classifying copy number variation

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20150701