CN103374518A - Detecting and classifying copy number variation - Google Patents


Publication number
CN103374518A
Authority
CN
China
Prior art keywords
chromosome
sequence
interested
fetus
dosage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104411348A
Other languages
Chinese (zh)
Other versions
CN103374518B (en)
Inventor
Richard P. Rava
Anupama Srinivasan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verinata Health Inc
Original Assignee
Verinata Health Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/445,778 (US9447453B2)
Priority claimed from US13/482,964 (US20120270739A1)
Priority claimed from US13/555,037 (US9260745B2)
Application filed by Verinata Health Inc
Priority to CN201810154581.2A (CN108485940B)
Priority to CN201710644858.5A (CN107435070A)
Publication of CN103374518A
Application granted
Publication of CN103374518B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096: Processes for the isolation, preparation or purification of DNA or RNA; cDNA synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The maternal DNA background in a maternal sample limits the sensitivity of any assay that attempts to distinguish fetal DNA from maternal cfDNA in the sample. Fetal fraction is therefore an important parameter for diagnoses and routine assays that rely on quantitative and/or qualitative differences between fetal and maternal DNA. The invention provides methods for determining aneuploidy and/or fetal fraction in maternal samples comprising fetal and maternal cfDNA by massively parallel sequencing. The method comprises a novel protocol for preparing sequencing libraries that unexpectedly improves the quality of library DNA while expediting the analysis of samples for prenatal diagnoses. The invention also provides apparatus and kits employing the method.

Description

Detection and classification of copy number variation
Background
One of the key endeavors in human medical research is the discovery of genetic abnormalities that are central to adverse health outcomes. In many cases, specific genes and/or critical diagnostic markers have been identified in portions of the genome that are present at abnormal copy numbers. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosome segments, and higher-level amplification of specific regions of the genome, are common occurrences.
Most information about copy number variation has been provided by cytogenetic resolution, which permits recognition of structural abnormalities. Conventional procedures for genetic screening and biological dosimetry have relied on invasive procedures (for example, amniocentesis) to obtain cells for karyotype analysis. In recognition of the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescence PCR (QF-PCR) and array comparative genomic hybridization (array-CGH) have been developed as molecular cytogenetic methods for analyzing copy number variation.
The advent of technologies that allow entire genomes to be sequenced in a relatively short time, together with the discovery of circulating cell-free DNA (cfDNA), has provided the opportunity to compare genetic material originating from one chromosome with that of another without the risks associated with invasive sampling. However, the limitations of existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and sequencing bias stemming from the inherent nature of genomic information, underlie the continuing need for noninvasive methods that provide any or all of specificity, sensitivity and applicability, so that copy number changes can be determined reliably in a variety of clinical settings.
The embodiments disclosed herein fulfill some of the above needs and, in particular, offer the advantage of a reliable method that is applicable at least to the practice of noninvasive prenatal diagnostics and to the diagnosis and monitoring of metastatic progression in cancer patients.
Summary
The maternal DNA background in a maternal sample imposes a practical limit on the sensitivity of any assay that attempts to distinguish fetal chromosomes from the maternal genome in the sample. Fetal fraction is therefore an important parameter to consider for diagnoses and routine assays that rely on quantitative and/or qualitative differences between the fetal and maternal genomes. The invention provides a method for determining the fetal fraction of a maternal sample. The method obtains the fetal fraction as a function of a normalized chromosome value or a normalized chromosome segment value. The method for determining fetal fraction can be combined with other methods, for example with a method that obtains the fetal fraction as a function of allelic imbalance information at polymorphisms, to classify copy number variations of fetal chromosomes or chromosome segments in a maternal sample. The invention also provides apparatus and kits for carrying out the methods.
Methods are provided for determining copy number variation (CNV) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. The method comprises a statistical approach that accounts for the accrued variability stemming from process-related, interchromosomal and inter-sequencing variability. The method is applicable to determining the CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from the sequencing information obtained by sequencing the nucleic acids of a test sample only once.
In one embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
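Steps (c) and (d) of this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the tag counts, the pairing of each chromosome of interest with a normalizing chromosome, and the threshold values are all hypothetical, and the comparison only flags doses above the threshold (over-representation), which is an assumption made for the example.

```python
from typing import Dict


def chromosome_doses(tag_counts: Dict[str, int],
                     normalizers: Dict[str, str]) -> Dict[str, float]:
    """Step (c): dose = tags on the chromosome of interest divided by
    tags on its normalizing chromosome sequence."""
    return {chrom: tag_counts[chrom] / tag_counts[norm]
            for chrom, norm in normalizers.items()}


def compare_to_thresholds(doses: Dict[str, float],
                          thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Step (d): flag each chromosome whose dose exceeds its own threshold.
    Only over-representation is checked here (an assumption of this sketch)."""
    return {chrom: dose > thresholds[chrom] for chrom, dose in doses.items()}


# Hypothetical tag counts from step (b); not real sequencing data.
tag_counts = {"chr13": 3600, "chr18": 3000, "chr21": 1500, "chrX": 4800,
              "chr4": 7200, "chr8": 6000, "chr9": 5000, "chr6": 6400}
# Hypothetical choice of one normalizing chromosome per chromosome of interest.
normalizers = {"chr13": "chr4", "chr18": "chr8", "chr21": "chr9", "chrX": "chr6"}
# Hypothetical per-chromosome thresholds.
thresholds = {"chr13": 0.52, "chr18": 0.52, "chr21": 0.29, "chrX": 0.80}

doses = chromosome_doses(tag_counts, normalizers)
calls = compare_to_thresholds(doses, thresholds)
print(doses)
print(calls)
```

Each chromosome of interest gets its own normalizer and its own threshold, mirroring the "for each of said any four or more chromosomes of interest" wording of steps (b) through (d).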
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample, wherein said any four or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the sample, wherein said any four or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of that normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
In any of the above embodiments, the normalizing chromosome sequence can be a single chromosome selected from chromosomes 1-22, X and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes selected from chromosomes 1-22, X and Y.
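The two options for the normalizing chromosome sequence can be handled by one helper, sketched below. Pooling a group by summing the tag counts of its members is an assumption of this sketch (the text only says the normalizer may be a group), and the function name, chromosome choices and counts are hypothetical.

```python
from typing import Dict, Iterable


def dose_with_normalizing_group(tag_counts: Dict[str, int],
                                chrom: str,
                                group: Iterable[str]) -> float:
    """Chromosome dose where the normalizing sequence is one chromosome
    (a group of size one) or several chromosomes.
    Assumption: a group's tag count is the sum over its members."""
    denominator = sum(tag_counts[c] for c in group)
    return tag_counts[chrom] / denominator


# Hypothetical tag counts; not real sequencing data.
tag_counts = {"chr21": 1500, "chr9": 5000, "chr11": 4000, "chr14": 3000}

single = dose_with_normalizing_group(tag_counts, "chr21", ["chr9"])
grouped = dose_with_normalizing_group(tag_counts, "chr21",
                                      ["chr9", "chr11", "chr14"])
print(single, grouped)
```

Treating the single-chromosome case as a group of one keeps the dose calculation identical for both alternatives.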
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
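The density-ratio variant of step (c) can be written out as a short worked equation. The symbols are introduced here for illustration only and do not appear in the source: $N_c$ and $L_c$ are the tag count and length of the chromosome of interest, $N_n$ and $L_n$ are the tag count and length of its normalizing sequence, and "relating" a count to a length is assumed to mean dividing by it.

$$d_c = \frac{N_c}{L_c}, \qquad d_n = \frac{N_n}{L_n}, \qquad \text{dose}_c = \frac{d_c}{d_n} = \frac{N_c \, L_n}{N_n \, L_c}$$

With hypothetical values $N_c = 1500$, $L_c = 50$, $N_n = 6000$ and $L_n = 150$ (lengths in arbitrary but identical units), steps (i) and (ii) give $d_c = 30$ and $d_n = 40$, and step (iii) gives $\text{dose}_c = 30/40 = 0.75$. Under this reading, the density-ratio dose equals the plain count-ratio dose $N_c/N_n = 0.25$ multiplied by the constant length factor $L_n/L_c = 3$.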
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein said any one or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein said any one or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of that chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
In any of the above embodiments, the different complete chromosomal aneuploidies are selected from complete chromosomal trisomies, complete chromosomal monosomies and complete chromosomal polysomies. The different complete chromosomal aneuploidies are selected from complete aneuploidies of any one of chromosomes 1-22, X and Y. For example, the different complete fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXX, 47,XYY and monosomy X.
In any of the above embodiments, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in each of the test samples.
In any of the above embodiments, the method may further comprise calculating a normalized chromosome value (NCV), wherein said NCV relates said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
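The NCV defined above is a z-score of the test sample's chromosome dose against the qualified samples. A minimal sketch, assuming the sample standard deviation is the intended estimator (the text only says "estimated") and using a hypothetical function name:

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (x_ij - mu_j) / sigma_j for one chromosome j and one test sample i.

    qualified_doses holds the doses of the same chromosome in the qualified
    samples. Using the sample standard deviation is an assumption.
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma
```

A dose equal to the qualified mean gives an NCV of zero; doses further from the mean give larger absolute values, which is what makes a single threshold usable across chromosomes with different dose scales.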
In another embodiment, a method is provided for determining the presence or absence of different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using said sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest; (c) using the number of said sequence tags identified for each of said segments of said chromosomes of interest and the number of said sequence tags identified for each said normalizing segment sequence to calculate a single segment dose for each of said segments of said chromosomes of interest; and (d) comparing each said single segment dose for each of said segments of said chromosomes of interest to a threshold value for each of said segments, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of said any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of said segments to the number of sequence tags identified for said normalizing segment sequence for each of said segments. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each said segment of interest by relating the number of sequence tags identified for each said segment of interest in step (b) to the length of each said segment of interest; (ii) calculating a sequence tag density ratio for each said normalizing segment sequence by relating the number of sequence tags identified for each said normalizing segment sequence in step (b) to the length of each said normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single segment dose for each said segment of interest, wherein said segment dose is calculated as the ratio of the sequence tag density ratio for each said segment of interest to the sequence tag density ratio for said normalizing segment sequence for each said segment of interest. The method may further comprise calculating a normalized segment value (NSV), wherein said NSV relates said segment dose to the mean of the corresponding segment dose in a set of qualified samples as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
In embodiments of the described method in which the chromosome dose or segment dose is determined using a normalizing segment sequence, the normalizing segment sequence can be a single segment of any one or more of chromosomes 1-22, X, and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1-22, X, and Y.
Steps (a)-(d) of the method for determining the presence or absence of a partial fetal chromosomal aneuploidy are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of different partial fetal chromosomal aneuploidies in each said sample. Partial fetal chromosomal aneuploidies that can be determined according to the method include partial aneuploidies of any segment of any chromosome. The partial aneuploidies can be selected from partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial aneuploidies that can be determined according to the method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.
In any of the above embodiments, the test sample can be a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In any of these embodiments, the test sample can be a plasma sample. The nucleic acid molecules of the maternal sample are fetal and maternal cell-free DNA molecules. The nucleic acids can be sequenced using next-generation sequencing (NGS). In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In another embodiment, a method is provided for determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The steps of the method comprise: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the sample; (b) using said sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of said sequence tags identified for each of said twenty or more chromosomes of interest and the number of said sequence tags identified for each said normalizing chromosome to calculate a single chromosome dose for each of said twenty or more chromosomes of interest; and (d) comparing each said single chromosome dose for each of said twenty or more chromosomes of interest to a threshold value for each of said twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in said sample.
In another embodiment, the invention provides a method for identifying a copy number variation (CNV) of a sequence of interest (for example, a clinically relevant sequence) in a test sample, the method comprising the steps of: (a) obtaining a test sample and a plurality of qualified samples, said test sample comprising test nucleic acid molecules and said plurality of qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information for said fetal and maternal nucleic acids in said sample; (c) based on said sequencing of said qualified nucleic acid molecules, calculating a qualified sequence dose for said qualified sequence of interest in each of said plurality of qualified samples, wherein said calculating a qualified sequence dose comprises determining a parameter for said qualified sequence of interest and at least one qualified normalizing sequence; (d) based on said qualified sequence dose, identifying at least one qualified normalizing sequence, wherein said at least one qualified normalizing sequence has the smallest variability and/or the greatest differentiability in said plurality of qualified samples; (e) based on said sequencing of said nucleic acid molecules in said test sample, calculating a test sequence dose for said test sequence of interest, wherein said calculating a test sequence dose comprises determining a parameter for said test sequence of interest and at least one normalizing test sequence, said at least one normalizing test sequence corresponding to said at least one qualified normalizing sequence; (f) comparing said test sequence dose to at least one threshold value; and (g) assessing said copy number variation of said sequence of interest in said test sample based on the outcome of step (f). In one embodiment, the parameter for said qualified sequence of interest and at least one qualified normalizing sequence relates the number of sequence tags mapped to said qualified sequence of interest to the number of tags mapped to said qualified normalizing sequence, and the parameter for said test sequence of interest and at least one normalizing test sequence relates the number of sequence tags mapped to said test sequence of interest to the number of tags mapped to said normalizing test sequence. In some embodiments, step (b) comprises sequencing at least a portion of the qualified and test nucleic acid molecules, wherein sequencing comprises providing a plurality of mapped sequence tags for a test and a qualified sequence of interest and for at least one test and at least one qualified normalizing sequence; and sequencing at least a portion of said nucleic acid molecules of the test sample to obtain sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, the sequencing step is performed using a next-generation sequencing method. In some embodiments, the sequencing method can be a massively parallel sequencing method that uses sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing-by-ligation. In some embodiments, sequencing comprises an amplification. In other embodiments, the sequencing is single-molecule sequencing. The CNV of the sequence of interest is an aneuploidy, which can be a chromosomal or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, Klinefelter's syndrome, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosomal deletion or a partial chromosomal insertion. In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer. In some embodiments, the test and qualified samples are biological fluid samples, for example plasma samples obtained from a pregnant subject such as a pregnant human subject. In other embodiments, the test and qualified biological fluid samples (for example, plasma samples) are obtained from a subject known or suspected to have cancer.
Certain methods for determining the presence or absence of a fetal chromosomal aneuploidy in a maternal test sample can comprise the following operations: (a) providing sequence reads from fetal and maternal nucleic acids in the maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying a number of the sequence tags that are from one or more chromosomes of interest or chromosome segments of interest, and computationally identifying a number of the sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest; (d) computationally calculating a single chromosome or segment dose for each of said one or more chromosomes of interest or chromosome segments of interest, using the number of said sequence tags identified for each of said one or more chromosomes of interest or chromosome segments of interest and the number of said sequence tags identified for each said normalizing chromosome sequence or normalizing chromosome segment sequence; and (e) comparing, using said computing apparatus, each said single chromosome dose for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of said one or more chromosomes of interest or chromosome segments of interest, thereby determining the presence or absence of at least one fetal aneuploidy in said test sample. In certain implementations, the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000, or at least about 100,000. The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which are provided program instructions for performing the recited operations and other computational operations described herein.
In certain embodiments, the chromosome reference sequences have excluded regions that are present naturally in the chromosomes but that do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, a method additionally comprises: (i) determining whether a read under consideration aligns to a site on a chromosome reference sequence where another read from the test sample previously aligned; and (ii) determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest. The chromosome reference sequences can be stored on a computer-readable medium.
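The tag-counting rules described above (excluded regions, and a check on whether another read already aligned to the same site) can be sketched as follows. The text does not say how a repeated site should be handled, so dropping later reads at an already-used site is an assumption here, exposed as a switch; the data layout and the function name are also hypothetical.

```python
def count_tags(alignments, excluded_regions, drop_repeated_sites=True):
    """Count sequence tags per chromosome.

    alignments: iterable of (chromosome, position) pairs, one per aligned read.
    excluded_regions: dict mapping chromosome -> list of half-open (start, end)
        intervals that never contribute to the tag count.
    drop_repeated_sites: assumed policy; when True, only the first read seen
        at a given (chromosome, position) site is counted.
    """
    counts = {}
    seen_sites = set()
    for chrom, pos in alignments:
        regions = excluded_regions.get(chrom, [])
        if any(start <= pos < end for start, end in regions):
            continue  # read falls in an excluded region
        if drop_repeated_sites:
            if (chrom, pos) in seen_sites:
                continue  # another read already aligned to this site
            seen_sites.add((chrom, pos))
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts
```

For example, with reads at chr21 positions 100, 100, 250, and 900 and an excluded interval (200, 300), the sketch counts two tags for chr21 under the assumed policy and three when repeated sites are kept.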
In certain embodiments, a method additionally comprises sequencing at least a portion of said nucleic acid molecules of said maternal test sample to obtain said sequence information for said fetal and maternal nucleic acid molecules of said test sample. The sequencing can comprise massively parallel sequencing of maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads.
In certain embodiments, a method further comprises automatically recording, using a processor, the presence or absence of a fetal chromosomal aneuploidy as determined in (d) in a patient medical record for the human subject who provided the maternal test sample. The recording can comprise recording the chromosome doses and/or a diagnosis based on said chromosome doses in a computer-readable medium. In some cases, the patient medical record is maintained by a laboratory, a physician's office, a hospital, a health maintenance organization (HMO), an insurance company, or a personal medical record website. A method can further comprise prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. Additionally or alternatively, the method can comprise ordering and/or performing one or more additional tests.
Certain methods disclosed herein identify a normalizing chromosome sequence or a normalizing chromosome segment sequence for a chromosome or chromosome segment of interest. Some such methods comprise the following operations: (a) providing a plurality of qualified samples for the chromosome or chromosome segment of interest; (b) repeatedly calculating chromosome doses for the chromosome or chromosome segment of interest using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences, wherein the repeated calculation is performed with a computing apparatus; and (c) selecting a normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or a large differentiability in the doses calculated for the chromosome or chromosome segment of interest.
The selected normalizing chromosome sequence or normalizing chromosome segment sequence can be part of a combination of normalizing chromosome sequences or normalizing chromosome segment sequences, or it can be provided alone rather than in combination with other normalizing chromosome sequences or normalizing chromosome segment sequences.
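The selection procedure described above can be sketched by recomputing the dose with each candidate normalizing sequence across the qualified samples and keeping the candidate with the lowest spread. This is a minimal sketch under stated assumptions: variability is measured as the coefficient of variation, only single candidates (not combinations) are searched, and the differentiability criterion is not modelled. The function names and data layout are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizing_sequence(qualified_tag_counts, interest):
    """Pick the single candidate giving the least variable dose.

    qualified_tag_counts: list of dicts, one per qualified sample, mapping
        sequence name -> tag count (all counts assumed positive).
    interest: name of the chromosome or segment of interest.
    Returns (best_candidate_name, its_coefficient_of_variation).
    """
    candidates = [name for name in qualified_tag_counts[0] if name != interest]
    best_name, best_cv = None, float("inf")
    for name in candidates:
        # Repeat the dose calculation with this candidate as the denominator.
        doses = [sample[interest] / sample[name] for sample in qualified_tag_counts]
        cv = coefficient_of_variation(doses)
        if cv < best_cv:
            best_name, best_cv = name, cv
    return best_name, best_cv
```

Extending the search to combinations would mean summing the tag counts of several candidates before dividing, at the cost of a larger search space.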
The disclosed embodiments provide a method for classifying a copy number variation in a fetal genome. The operations of the method comprise: (a) receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying, using the computing apparatus, a number of the sequence tags that are from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and (f) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the method further comprises sequencing cell-free DNA from the maternal test sample to provide the sequence reads. In certain embodiments, the method further comprises obtaining the maternal test sample from a pregnant organism. In certain embodiments, operation (b) comprises using the computing apparatus to align at least about 1 million reads. In certain embodiments, operation (f) can comprise determining whether the two fetal fraction values are approximately equal.
In certain embodiments, operation (f) can further comprise determining that the two fetal fraction values are approximately equal, and thereby determining that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In some of these embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, operation (f) can comprise determining that the two fetal fraction values are not approximately equal, and further comprise analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
In certain embodiments, the operation can also comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of said portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if any of said portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy. In one embodiment, the operation can further comprise determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In one embodiment, operation (f) can also comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of said portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if none of said portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the fetus is a mosaic.
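The binning analysis described above can be sketched as follows. The text does not define how "significantly more or less" is tested, so this sketch substitutes a simple rule: each bin's tag count is divided by a reference count for the same bin (assumed to come from qualified samples), and a bin is flagged when its ratio departs from the median ratio by more than a relative tolerance. The function names, the reference profile, and the tolerance value are all assumptions.

```python
from statistics import median


def bin_tag_counts(positions, chrom_length, n_bins):
    """Count mapped tags in n_bins equal-width bins along one chromosome."""
    counts = [0] * n_bins
    for pos in positions:
        index = min(pos * n_bins // chrom_length, n_bins - 1)
        counts[index] += 1
    return counts


def classify_from_bins(test_counts, reference_counts, tolerance=0.1):
    """Return ("partial aneuploidy", flagged_bins) or ("mosaic", []).

    Intended for use only after the two fetal fraction values were found not
    to be approximately equal. reference_counts must be positive. The
    tolerance is a hypothetical stand-in for a significance test.
    """
    ratios = [t / r for t, r in zip(test_counts, reference_counts)]
    centre = median(ratios)
    flagged = [i for i, ratio in enumerate(ratios)
               if abs(ratio / centre - 1.0) > tolerance]
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []
```

A uniform shift across all bins leaves every ratio equal to the median, so no bin is flagged and the result is "mosaic"; a localized gain or loss flags the affected bins, which also indicates where on the chromosome the partial aneuploidy lies.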
Operation (e) can comprise: (a) calculating the number of sequence tags from the first chromosome of interest and at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, the operation further comprises calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:

$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \overline{R}_{iU}}{\sigma_{iU}}$$

where $\overline{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In another embodiment, operation (d) further comprises the first method calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
In various embodiments, if the first fetal fraction value and the second fetal fraction value are not approximately equal, the method further comprises (i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) if the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest. In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises dividing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and counting the mapped tags in each bin.
Operation (e) can further comprise calculating the fetal fraction value by evaluating the following expression:

$$ff = 2 \times \left| \mathrm{NCV}_{iA} \cdot \mathrm{CV}_{iU} \right|$$

where $ff$ is the second fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples.
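The expression above, together with the comparison of the two fetal fraction values, can be sketched as follows. The text does not quantify "approximately equal", so the relative tolerance used here is a hypothetical placeholder, as are the function names and the returned labels.

```python
import math


def fetal_fraction_from_ncv(ncv, cv):
    """Second fetal fraction value: ff = 2 * |NCV_iA * CV_iU|."""
    return 2.0 * abs(ncv * cv)


def classify_cnv(ff_first, ncv, cv, rel_tol=0.25):
    """Compare the two fetal fraction estimates.

    ff_first: fetal fraction from a method that does not use tags from the
        chromosome of interest (for example, a polymorphism-based estimate).
    rel_tol: assumed relative tolerance for "approximately equal".
    """
    ff_second = fetal_fraction_from_ncv(ncv, cv)
    if math.isclose(ff_first, ff_second, rel_tol=rel_tol):
        # The ploidy assumption implicit in the second method holds.
        return "complete chromosomal aneuploidy"
    return "partial aneuploidy or mosaic"
```

For instance, an NCV of 10 with a coefficient of variation of 0.005 gives a second fetal fraction of about 0.10, which agrees with an independent estimate of 0.10. An NCV of 4 under the same conditions gives about 0.04, which does not, and so points to further analysis of the chromosome.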
In any of the above embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In any of the above embodiments, operation (f) can classify the copy number variation into a classification selected from the group consisting of: complete chromosomal insertions, complete chromosomal deletions, partial chromosomal duplications, partial chromosomal deletions, and mosaics.
The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which are provided program instructions for classifying a copy number variation in a fetal genome. The computer program product can comprise: (a) code for receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) code for aligning, using a computing apparatus, the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying, using the computing apparatus, a number of the sequence tags that are from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) code for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest; (e) code for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and (f) code for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the computer program product comprises code for the various operations and methods in any of the above embodiments of the disclosed methods.
The disclosed embodiments also provide a system for classifying a copy number variation in a fetal genome. The system comprises: (a) an interface for receiving at least about 10,000 sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) memory for storing, at least temporarily, a plurality of said sequence reads; and (c) a processor designed or configured with program instructions for: (i) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (ii) identifying a number of the sequence tags that are from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (iii) calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest; (iv) calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and (v) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. According to various embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, the program instructions for (c)(v) comprise program instructions for classifying the copy number variation into a classification selected from the group consisting of: complete chromosomal insertions, complete chromosomal deletions, partial chromosomal duplications, partial chromosomal deletions, and mosaics. According to various embodiments, the system can comprise program instructions for sequencing cell-free DNA from the maternal test sample to provide the sequence reads. According to some embodiments, the program instructions for operation (c)(i) comprise program instructions for using the computing apparatus to align at least about 1 million reads.
In certain embodiments, the system also comprises a sequencer configured to sequence the fetal and maternal nucleic acids in a maternal test sample and provide the sequence reads in an electronic format. In various embodiments, the sequencer and the processor are located in separate facilities and are linked by a network.
In various embodiments, the system further comprises an apparatus for taking the maternal test sample from a pregnant mother. According to some embodiments, the apparatus for taking the maternal test sample and the processor are located in separate facilities. In various embodiments, the system also comprises an apparatus for extracting cell-free DNA from the maternal test sample. In certain embodiments, the apparatus for extracting cell-free DNA is located in the same facility as the sequencer, and the apparatus for taking the maternal test sample is located in a remote facility.
According to some embodiments, the program instructions for comparing the first fetal fraction value to the second fetal fraction value also comprise program instructions for determining whether the two fetal fraction values are approximately equal.
In certain embodiments, the system also comprises program instructions for determining that a ploidy assumption implicit in the second method is true when the two fetal fraction values are approximately equal. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In certain embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the system also comprises program instructions for analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the program instructions for analyzing are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value indicate that the two fetal fraction values are not approximately equal. In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of said portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the first chromosome of interest harbors a partial aneuploidy if any of said portions contains significantly more or significantly less nucleic acid than one or more other portions. In certain embodiments, the system further comprises program instructions for determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of said portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the fetus is a mosaic if none of said portions contains significantly more or significantly less nucleic acid than one or more other portions.
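The decision logic of the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the relative tolerance standing in for "approximately equal", and the median/MAD outlier rule standing in for "significantly more or significantly less" are all hypothetical choices introduced here.

```python
from statistics import median


def outlier_bins(bin_counts, z_cutoff=3.5):
    """Return indices of bins whose tag count deviates strongly from the rest.

    Uses a robust z-score (median and scaled MAD). This rule and the cutoff
    are assumptions; the text only says "significantly more or less".
    """
    med = median(bin_counts)
    mad = median(abs(c - med) for c in bin_counts)
    scale = 1.4826 * mad or 1.0  # fall back to 1.0 if the MAD is zero
    return [i for i, c in enumerate(bin_counts) if abs(c - med) / scale > z_cutoff]


def classify_cnv(ff_first, ff_second, bin_counts, rel_tol=0.2):
    """Classify a copy number variation on the first chromosome of interest.

    ff_first:   fetal fraction from a method that ignores tags on that chromosome
    ff_second:  fetal fraction derived from tags on that chromosome
    bin_counts: tag counts per bin along that chromosome
    rel_tol:    hypothetical tolerance for "approximately equal"
    """
    # Approximately equal values support the complete-aneuploidy assumption
    if abs(ff_first - ff_second) <= rel_tol * max(ff_first, ff_second):
        return "complete chromosomal aneuploidy", []
    # Otherwise look for portions with significantly more or fewer tags
    flagged = outlier_bins(bin_counts)
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []
```

The second element of the returned tuple lists the bins that deviate, which is where a partial aneuploidy would be located in this sketch.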
According to various embodiments, the system can include program instructions for the second method of calculating the fetal fraction value that comprise: (a) program instructions for calculating the number of sequence tags from the first chromosome of interest and at least one normalizing chromosome sequence to determine a chromosome dose; and (b) program instructions for calculating the fetal fraction value from the chromosome dose using the second method.
In certain embodiments, the system further comprises program instructions for calculating a normalized chromosome value (NCV), wherein the program instructions for the second method comprise program instructions for using the normalized chromosome value, and wherein the program instructions for the NCV relate the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation for the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In various embodiments, the program instructions for the first method comprise program instructions for calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
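As an illustration of the NCV definition above, the following sketch computes the value from a test-sample dose and a list of doses from qualified samples. The function name is hypothetical, and using the sample standard deviation as the "estimated" standard deviation is an assumption.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Relate a test-sample chromosome dose to the qualified-sample doses.

    test_dose:       dose R_iA for the chromosome of interest in the test sample
    qualified_doses: doses for the same chromosome in the qualified samples
    """
    mu = mean(qualified_doses)      # estimated mean of the qualified doses
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    return (test_dose - mu) / sigma
```

With hypothetical qualified doses of 0.98, 1.00 and 1.02, a test dose of 1.05 lies 2.5 standard deviations above the qualified mean.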
According to various embodiments, the program instructions for the second method of calculating the fetal fraction value comprise program instructions for evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation for the dose of the chromosome of interest determined in the qualified samples.
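The expression above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the coefficient of variation is assumed to be the sample standard deviation of the qualified doses divided by their mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(qualified_doses):
    """CV of the chromosome dose across qualified samples (std / mean)."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_ncv(ncv, cv):
    """Second fetal fraction value: ff = 2 * |NCV * CV|."""
    return 2.0 * abs(ncv * cv)
```

The absolute value means that a gain (positive NCV) and a loss (negative NCV) of the same magnitude yield the same fetal fraction estimate.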
According to various embodiments, the system further comprises: (i) program instructions for determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) program instructions for determining, if the copy number variation results from a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest, wherein the program instructions in (i) and (ii) are configured to execute when the program instructions for comparing the first fetal fraction value with the second fetal fraction value determine that the first fetal fraction value is not approximately equal to the second fetal fraction value.
In certain embodiments, the program instructions for determining the locus of the partial aneuploidy on the first chromosome of interest comprise program instructions for categorizing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and program instructions for counting the mapped tags in each bin.
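Binning and counting mapped tags can be sketched as below. The fixed bin width, the 0-based coordinates and the function name are assumptions made for illustration; the text does not prescribe a bin size.

```python
def count_tags_per_bin(tag_positions, chromosome_length, bin_size):
    """Count mapped tags in fixed-width bins along one chromosome.

    tag_positions:     0-based start coordinates of mapped tags (hypothetical input)
    chromosome_length: length of the chromosome reference sequence
    bin_size:          width of each bin, in bases
    """
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:
        if 0 <= pos < chromosome_length:  # ignore tags outside the reference
            counts[pos // bin_size] += 1
    return counts
```

The resulting list of per-bin counts is the kind of input on which a test for bins with significantly more or fewer tags could then operate.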
In certain embodiments, methods are provided for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal (e.g., a human), where the methods comprise: (a) providing sequence reads of nucleic acids in a test sample from said mammal, wherein said test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutive (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) aligning, using a computing device, the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein said chromosomes or chromosome segments are selected from chromosomes 1 to 22, X, and Y and segments thereof, and computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 2,000, or at least about 5,000, or at least about 10,000; (d) computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome dose or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) comparing, using said computing device, each single chromosome dose or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of aneuploidies in said sample, wherein the presence of said aneuploidies and/or an increased number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of a cancer and/or an increased risk of a cancer.
In certain embodiments, the increased risk is as compared to the same subject at a different time (e.g., earlier), as compared to a reference population (e.g., optionally adjusted for sex, and/or ethnicity, and/or age, etc.), as compared to a similar subject without a certain risk factor, and the like. In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise whole chromosomes whose amplification and/or deletion is known to be associated with a cancer (e.g., as described herein). In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise chromosome segments whose amplification or deletion is known to be associated with one or more cancers. In certain embodiments, the chromosome segments comprise substantially whole chromosome arms (e.g., as described herein). In certain embodiments, the chromosome segments comprise whole chromosome aneuploidies. In certain embodiments, the whole chromosome aneuploidies comprise a loss, while in certain other embodiments the whole chromosome aneuploidies comprise a gain (e.g., a gain or a loss as shown in Table 1).
In certain embodiments, the chromosome segments of interest are substantially arm-level segments comprising the short arm or the long arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the aneuploidies comprise an amplification of a substantial arm-level segment of a chromosome or a deletion of a substantial arm-level segment of a chromosome. In certain embodiments, the chromosome segments of interest substantially comprise one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q. In certain embodiments, the aneuploidies comprise an amplification of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and 22q. In certain embodiments, the aneuploidies comprise a deletion of one or more arms selected from the group consisting of 1p, 3p, 4p, 4q, 5q, 6q, 8p, 8q, 9p, 9q, 10p, 10q, 11p, 11q, 13q, 14q, 15q, 16q, 17p, 17q, 18p, 18q, 19p, 19q, and 22q.
In certain embodiments, the chromosome segments of interest are segments that comprise a region and/or a gene shown in Table 3 and/or Table 5 and/or Table 4 and/or Table 6. In certain embodiments, the aneuploidies comprise an amplification of a region and/or a gene shown in Table 3 and/or Table 5. In certain embodiments, the aneuploidies comprise a deletion of a region and/or a gene shown in Table 4 and/or Table 6. In certain embodiments, the chromosome segments of interest are segments known to contain one or more oncogenes and/or one or more tumor suppressor genes. In certain embodiments, the aneuploidies comprise an amplification of one or more regions selected from the group consisting of 20q13, 19q12, 1q21-1q23, 8p11-p12, and the ErbB2. In certain embodiments, the aneuploidies comprise an amplification of one or more regions comprising a gene selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2, CDK4, and the like.
In certain embodiments, the cancer is a cancer selected from the group consisting of leukemia, ALL, brain cancer, breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, GIST, glioma, HCC, hepatocellular carcinoma, lung cancer, lung NSC, lung SC, medulloblastoma, melanoma, MPD, myeloproliferative disorder, cervical cancer, ovarian cancer, prostate cancer, and renal cancer. In certain embodiments, the biological sample comprises a sample selected from the group consisting of whole blood, a blood clot, saliva, urine, a tissue biopsy, pleural fluid, pericardial fluid, cerebrospinal fluid, and peritoneal fluid.
In certain embodiments, the chromosome reference sequences have excluded regions that are present naturally in chromosomes but that do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, the method further comprises determining whether a read under consideration aligns to a site on a chromosome reference sequence where another read has previously aligned, and determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest, wherein both determining operations are performed with the computing device. In various embodiments, the method further comprises storing, at least temporarily, sequence information for said nucleic acids in said sample in a computer-readable medium (e.g., a non-transitory medium).
In certain embodiments, step (d) comprises computationally calculating a segment dose for a selected one of the segments of interest as the ratio of the number of sequence tags identified for the selected segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected segment of interest. In certain embodiments, said one or more chromosome segments of interest comprise at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different segments of interest. In certain embodiments, at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different aneuploidies are detected. In certain embodiments, the at least one normalizing chromosome sequence comprises one or more chromosomes selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, for each segment, said at least one normalizing chromosome sequence comprises a chromosome corresponding to the chromosome in which said segment is located. In certain embodiments, for each segment, said at least one normalizing chromosome sequence comprises a chromosome segment corresponding to the chromosome segment being normalized. In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by (i) identifying a plurality of qualified samples for the segment of interest; (ii) repeatedly calculating chromosome doses for the selected chromosome using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses. In certain embodiments, the method further comprises calculating a normalized segment value (NSV), wherein, as described herein, said NSV relates said segment dose to the mean of the corresponding segment dose in a set of qualified samples. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence substantially comprises one or more arms of any one or more of chromosomes 1 to 22, X, and Y.
In certain embodiments, the method further comprises sequencing at least a portion of said nucleic acid molecules of said test sample to obtain said sequence information. In certain embodiments, the sequencing comprises sequencing cell-free DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises sequencing cellular DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises massively parallel sequencing.
In certain embodiments, the method(s) further comprise automatically recording the presence or absence of an aneuploidy as determined in (d) in a patient medical record for the human subject providing the test sample, wherein the recording is performed using the processor. In certain embodiments, the recording comprises recording the chromosome doses and/or a diagnosis based on said chromosome doses in a computer-readable medium. In various embodiments, the patient medical record is maintained by a laboratory, a physician's office, a hospital, a health maintenance organization, an insurance company, or a personal medical record website. In certain embodiments, determining the presence or absence and/or number of said aneuploidies comprises a component in a differential diagnosis for a cancer. In certain embodiments, the detection of aneuploidies indicates a positive result, and said method further comprises prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken. In certain embodiments, prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken comprises prescribing and/or performing further diagnostics to determine the presence and/or severity of a cancer. In certain embodiments, the further diagnostics comprise screening a sample from said subject for a cancer biomarker and/or imaging said subject for a cancer.
In certain embodiments, when said method indicates the presence of neoplastic cells in said mammal, the method further comprises treating said mammal, or causing said mammal to be treated, to remove said neoplastic cells and/or to inhibit the growth or proliferation of said neoplastic cells. In certain embodiments, treating the mammal comprises surgically removing the neoplastic (e.g., tumor) cells. In certain embodiments, treating the mammal comprises performing radiotherapy, or causing radiotherapy to be performed, on said mammal to kill the neoplastic cells. In certain embodiments, treating the mammal comprises administering, or causing to be administered, to said mammal an anti-cancer drug (e.g., matuzumab, Erbitux, Vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytosine arabinoside, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine (FUDR), cyclophosphamide, Neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, chlormethine, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uramustine, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, Camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, Abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate (Tasigna), semaxanib, vandetanib, vatalanib, retinoic acid, a retinoic acid derivative, and the like).
In another embodiment, a computer program product is provided for determining the presence of a cancer and/or an increased risk of a cancer in a mammal. The computer program product typically comprises: (a) code for providing sequence reads of nucleic acids in a test sample from said mammal, wherein said test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutive (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) code for aligning, using a computing device, the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein said chromosomes or chromosome segments are selected from chromosomes 1 to 22, X, and Y and segments thereof, and for computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000; (d) code for computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome dose or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) code for comparing, using said computing device, each single chromosome dose or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of aneuploidies in the sample, wherein the presence of said aneuploidies and/or an increased number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of a cancer and/or an increased risk of a cancer. In various embodiments, the code provides instructions for performing the diagnostic methods described above (and below).
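The threshold comparison of step (e) can be illustrated with a short sketch. The function name, the data layout and the use of a lower and an upper threshold per chromosome or segment are assumptions introduced here; the text states only that each dose is compared to a corresponding threshold value.

```python
def call_aneuploidies(doses, thresholds):
    """Compare each chromosome or segment dose to its own thresholds.

    doses:      dict mapping a chromosome/segment name to its calculated dose
    thresholds: dict mapping the same names to (lower, upper) bounds; the
                two-sided form is an illustrative assumption
    """
    calls = {}
    for name, dose in doses.items():
        lower, upper = thresholds[name]
        if dose > upper:
            calls[name] = "gain"   # dose above the upper threshold
        elif dose < lower:
            calls[name] = "loss"   # dose below the lower threshold
        else:
            calls[name] = "none"   # dose within the unaffected range
    return calls
```

For example, with hypothetical doses `{"8q": 0.062, "17p": 0.031}` and thresholds `{"8q": (0.050, 0.058), "17p": (0.035, 0.041)}`, the sketch reports a gain on 8q and a loss on 17p.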
Methods of treating a subject for a cancer are also provided. In certain embodiments, the methods comprise performing a method for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal as described herein, using a sample from the subject, or receiving the results of such a method performed on the sample; and, when the method, alone or in combination with one or more other indicators from a differential diagnosis for a cancer, indicates the presence of neoplastic cells in said subject, treating the subject, or causing the subject to be treated, to remove the neoplastic cells and/or to inhibit the growth or proliferation of the neoplastic cells. In certain embodiments, treating said subject comprises surgically removing the cells. In certain embodiments, treating the subject comprises performing radiotherapy, or causing radiotherapy to be performed, on the subject to kill said neoplastic cells. In certain embodiments, treating the subject comprises administering, or causing to be administered, to the subject an anti-cancer drug (e.g., matuzumab, Erbitux, Vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytosine arabinoside, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine (FUDR), cyclophosphamide, Neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, chlormethine, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uramustine, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, Camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, Abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate (Tasigna), semaxanib, vandetanib, vatalanib, retinoic acid, a retinoic acid derivative, and the like).
Methods of monitoring the treatment of a subject for a cancer are also provided. In various embodiments, the methods comprise performing a method for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal as described herein on a sample from the subject, or receiving the results of such a method performed on the sample, before or during the treatment; and performing the method again on a second sample from the subject, or receiving the results of such a method performed on the second sample, at a later time during or after the treatment; wherein a reduced number or severity of aneuploidies in the second measurement (e.g., as compared to the first measurement), for example a reduction in aneuploidy frequency and/or a reduction or absence of certain aneuploidies, indicates a positive course of treatment, and the same or an increased number or severity of aneuploidies in the second measurement (e.g., as compared to the first measurement) indicates a negative course of treatment, and, when said indication is negative, the treatment regimen is adjusted to a more aggressive treatment regimen and/or to a palliative treatment regimen.
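The comparison between the two measurements can be reduced to a small rule, sketched below. The function name and the labels are hypothetical, and the sketch models only the number of distinct aneuploidies; severity, which the text also mentions, is not represented.

```python
def assess_treatment_course(first_aneuploidies, second_aneuploidies):
    """Compare aneuploidy calls from two samples taken at different times.

    Each argument is a collection of aneuploidy labels (hypothetical format,
    e.g. "+8q" or "-17p"). Fewer distinct aneuploidies in the second sample
    is read as a positive course; the same or more as a negative course.
    """
    n_first = len(set(first_aneuploidies))
    n_second = len(set(second_aneuploidies))
    return "positive" if n_second < n_first else "negative"
```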
Methods are also provided for determining the fraction of fetal nucleic acids in a maternal sample comprising a mixture of fetal and maternal nucleic acids. In one embodiment, the method for determining the fetal fraction in a maternal sample comprises: (a) receiving sequence reads from the fetal and maternal nucleic acids in the maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (c) identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1 to 22, X, and Y and segments thereof, and identifying a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, to determine a chromosome dose or chromosome segment dose, wherein said one or more chromosomes of interest or chromosome segments of interest harbor a copy number variation; and (d) determining said fetal fraction using the chromosome dose or chromosome segment dose corresponding to the copy number variation identified in step (c). In some embodiments, the copy number variation is determined by comparing the dose for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold for each of the one or more chromosomes of interest or chromosome segments of interest. The copy number variation can be selected from the group consisting of complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In some embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
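The two ways of calculating a dose can be sketched as follows. The function names are hypothetical, and treating the sequence tag density ratio as tags per base of sequence length is an assumption made for this illustration.

```python
def dose_from_counts(tags_of_interest, tags_normalizing):
    """Dose as a ratio of tag counts: sequence of interest / normalizing sequence."""
    return tags_of_interest / tags_normalizing


def dose_from_densities(tags_of_interest, length_of_interest,
                        tags_normalizing, length_normalizing):
    """Dose as a ratio of length-normalized tag densities (assumed definition)."""
    density_interest = tags_of_interest / length_of_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_interest / density_normalizing
```

With hypothetical counts of 1,500 tags on a 48 Mb sequence of interest and 30,000 tags on a 240 Mb normalizing sequence, the count-based dose is 0.05 and the density-based dose is 0.25.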
In certain embodiments, the method further comprises calculating a normalized chromosome value (NCV), wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, for the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome dose calculated for the test sample, wherein said i-th chromosome is the chromosome of interest. The fetal fraction is then determined according to the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation for the dose of the i-th chromosome determined in said qualified samples, wherein said i-th chromosome is the chromosome of interest.
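The factor of 2 in the expression above can be motivated by a short derivation. It rests on two assumptions that are not stated in this passage: that the coefficient of variation is defined in the usual way as the standard deviation divided by the mean, and that a complete trisomy (or monosomy) in the fetus raises (or lowers) the expected dose of the affected chromosome by a fraction ff/2 of the qualified-sample mean.

```latex
\begin{align*}
CV_{iU} &= \frac{\sigma_{iU}}{\bar{R}_{iU}}, \qquad
R_{iA} = \bar{R}_{iU}\left(1 \pm \frac{ff}{2}\right) \\
NCV_{iA}\, CV_{iU}
  &= \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} \cdot \frac{\sigma_{iU}}{\bar{R}_{iU}}
   = \frac{R_{iA} - \bar{R}_{iU}}{\bar{R}_{iU}}
   = \pm \frac{ff}{2} \\
ff &= 2 \times \left| NCV_{iA}\, CV_{iU} \right|
\end{align*}
```

As a hypothetical numerical illustration, an NCV of 12 with a CV of 0.004 gives ff = 2 × 0.048 = 0.096 under these assumptions.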
In certain embodiments, the fetal fraction is determined using a normalized segment value (NSV), wherein the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, for the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome segment dose calculated for the test sample, wherein said i-th chromosome segment is the chromosome segment of interest. The fetal fraction is then determined according to the following expression:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NSV_{iA}$ is the normalized segment value for the i-th chromosome segment in an affected sample, and $CV_{iU}$ is the coefficient of variation for the dose of the i-th chromosome segment determined in said qualified samples, wherein said i-th chromosome segment is the chromosome segment of interest.
In certain embodiments, the chromosome of interest is any one of chromosomes 1-22 or the X chromosome of a male fetus, and the chromosome segment of interest is selected from chromosomes 1-22 or the X chromosome of a male fetus.
In certain embodiments of the method for determining the fetal fraction, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. The normalizing chromosome sequence can be a single chromosome of any one or more of chromosomes 1 to 22, X, and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes of any of chromosomes 1 to 22, X, and Y. Likewise, the normalizing segment sequence can be a single segment of any one or more of chromosomes 1 to 22, X, and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1 to 22, X, and Y.
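Steps (i) to (iii) above can be sketched as a search over candidate normalizing sequences. This is an illustration only: the function name and data layout are hypothetical, variability is measured by the coefficient of variation of the dose across qualified samples, groups are limited to a small size, and the "greatest differentiability" criterion is not modeled.

```python
from itertools import combinations
from statistics import mean, stdev


def select_normalizing_sequence(qualified_samples, target, candidates,
                                max_group_size=2):
    """Pick the candidate (or group of candidates) giving the lowest dose CV.

    qualified_samples: list of dicts mapping sequence name -> tag count,
                       one dict per qualified sample (hypothetical layout)
    target:            name of the chromosome or segment of interest
    candidates:        names of potential normalizing sequences
    """
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            # Dose of the target in each qualified sample for this normalizer
            doses = [s[target] / sum(s[name] for name in group)
                     for s in qualified_samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:  # keep the least variable normalizer so far
                best_group, best_cv = group, cv
    return best_group, best_cv
```

The function returns the selected group as a tuple of names together with the coefficient of variation it achieves, so a single normalizing chromosome appears as a one-element tuple.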
In certain embodiments, the method for determining fetal fraction can further comprise comparing the fetal fraction obtained as described with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. Methods for determining allelic imbalance are described elsewhere in this application, and include using polymorphic differences between the fetal and maternal genomes (including, but not limited to, differences detected in SNP or STR sequences) to determine the fetal fraction.
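As a rough illustration of a polymorphism-based estimate, the sketch below uses a commonly cited relation for SNPs at which the mother is homozygous and the fetus is heterozygous: the fetus-specific (minor) allele then accounts for about half of the fetal fraction. This relation is an assumption made for illustration and is not necessarily the calculation described elsewhere in this application; the function name and the counts are hypothetical.

```python
def fetal_fraction_from_snps(allele_counts):
    """Estimate fetal fraction from informative SNPs.

    allele_counts: list of (major_allele_tags, minor_allele_tags) pairs, one
    per SNP where the mother is assumed homozygous and the fetus heterozygous.
    """
    total_major = sum(major for major, _ in allele_counts)
    total_minor = sum(minor for _, minor in allele_counts)
    total = total_major + total_minor
    if total == 0:
        raise ValueError("no allele counts supplied")
    # The minor allele is carried on one of the two fetal copies only,
    # so its share of all tags is about half the fetal fraction.
    return 2 * total_minor / total


# Illustrative counts only.
ff_snp = fetal_fraction_from_snps([(950, 50), (1900, 100)])
```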
In certain embodiments, the method further comprises at least temporarily storing the sequence reads.
An additional method for classifying a copy number variation in a fetal genome is also provided. This additional method comprises: (a) obtaining sequence reads from fetal and maternal nucleic acids in a maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) identifying the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (f) comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome.
In certain embodiments, the first method of calculating the fetal fraction value in step (d) of this additional method comprises calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. The second method of calculating the fetal fraction value in step (e) of this additional method comprises: (a) counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method.
In certain embodiments, the information used by the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. In certain embodiments, the information used by the first method is obtained by a method other than sequencing, such as qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the first method comprises calculating the first fetal fraction value using tags from a chromosome or chromosome segment that has a copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment whose aneuploid status is known or has been determined (for example, by calculating its NCV or NSV) can be used to determine the first fetal fraction by any of the methods described here.
In certain embodiments, the chromosome or segment dose determined by the second method in step (e) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the at least one normalizing chromosome sequence or normalizing chromosome segment sequence corresponding to the selected chromosome or segment of interest. In certain embodiments, the chromosome or segment dose determined in step (e) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one normalizing chromosome sequence or normalizing chromosome segment sequence corresponding to each selected chromosome or segment of interest.
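The two dose definitions above can be written out directly. In this sketch the sequence tag density ratio is assumed to mean the tag count divided by the sequence length; the function names, tag counts, and lengths are illustrative and do not come from the application.

```python
def chromosome_dose(tags_of_interest, tags_normalizing):
    """Dose as a ratio of tag counts: interest / normalizing."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing tag count must be positive")
    return tags_of_interest / tags_normalizing


def tag_density_ratio(tag_count, sequence_length_bp):
    """Tags per base pair (assumed meaning of 'sequence tag density ratio')."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / sequence_length_bp


def chromosome_dose_from_densities(tags_of_interest, length_of_interest,
                                   tags_normalizing, length_normalizing):
    """Dose as a ratio of tag densities rather than raw tag counts."""
    return chromosome_dose(
        tag_density_ratio(tags_of_interest, length_of_interest),
        tag_density_ratio(tags_normalizing, length_normalizing),
    )


# Illustrative numbers only; the lengths are not real chromosome lengths.
dose = chromosome_dose(1200, 6000)
density_dose = chromosome_dose_from_densities(1200, 48_000_000,
                                              6000, 120_000_000)
```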
Some embodiments of this additional method further comprise calculating a normalized chromosome value (NCV), wherein the second method uses the NCV, and wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome doses in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome dose calculated for the test sample, wherein the i-th chromosome is the chromosome of interest.
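The NCV is a z-score of the test dose against the qualified samples. A minimal sketch follows, assuming the sample standard deviation is an acceptable estimator (the text only says "estimated"); the function name and the dose values are illustrative.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (test dose - mean of qualified doses) / their standard deviation."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("qualified doses have zero spread")
    return (test_dose - mu) / sigma


# Illustrative doses only.
ncv = normalized_chromosome_value(0.23, [0.19, 0.20, 0.21])
```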
In certain embodiments, the second method of calculating the fetal fraction value comprises evaluating the following formula:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample or test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest.
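Evaluating this formula is a one-line computation. The sketch below uses a hypothetical function name and made-up input values; the absolute value gives the same estimate for a gain (positive NCV) and a loss (negative NCV).

```python
def fetal_fraction_from_ncv(ncv, cv):
    """ff = 2 * |NCV * CV|, with CV the coefficient of variation of the dose
    of the same chromosome in the qualified samples."""
    if cv < 0:
        raise ValueError("coefficient of variation cannot be negative")
    return 2 * abs(ncv * cv)


# Illustrative values only.
ff_gain = fetal_fraction_from_ncv(5.0, 0.01)
ff_loss = fetal_fraction_from_ncv(-5.0, 0.01)
```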
In certain embodiments, the first method of calculating the fetal fraction comprises: (a) counting the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for that other chromosome; and (b) calculating the first fetal fraction value from that chromosome dose by the first method. The second method comprises: (a) counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, the chromosome or segment dose is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the at least one normalizing chromosome sequence or normalizing chromosome segment sequence corresponding to the selected chromosome or segment of interest. Alternatively, the chromosome or segment dose is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one normalizing chromosome sequence or normalizing chromosome segment sequence corresponding to each selected chromosome or segment of interest.
Preferably, the additional method for classifying a copy number variation further comprises calculating the corresponding normalized chromosome values (NCVs), and the first method and the second method use the corresponding NCVs. Calculating an NCV relates the determined chromosome dose to the mean of the corresponding chromosome doses in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose of the i-th chromosome calculated for the test sample. The first method and the second method can use the NCV to calculate the fetal fraction by evaluating the following formula:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples. In the above formula, for the first method, the i-th chromosome is not the first chromosome of interest; for the second method, the i-th chromosome is the first chromosome of interest.
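One way to see where the factor of 2 comes from is the short derivation below. It is a sketch under an assumption that the text does not spell out at this point: the i-th chromosome carries a complete trisomy (or monosomy) in the fetus only, so that a fraction ff of the nucleic acids contributes three copies (or one copy) instead of two.

$$R_{iA} = (1 - ff)\,\bar{R}_{iU} + ff \cdot \tfrac{3}{2}\,\bar{R}_{iU} = \bar{R}_{iU}\left(1 + \tfrac{ff}{2}\right)$$

$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} = \frac{ff}{2} \cdot \frac{\bar{R}_{iU}}{\sigma_{iU}} = \frac{ff}{2\,CV_{iU}}, \qquad CV_{iU} = \frac{\sigma_{iU}}{\bar{R}_{iU}}$$

$$ff = 2 \times NCV_{iA} \times CV_{iU}$$

For a monosomy the dose becomes $\bar{R}_{iU}\left(1 - \tfrac{ff}{2}\right)$ and the NCV is negative, which is why the formula takes the absolute value.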
The first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. The chromosome other than the first chromosome of interest can be any one of chromosomes 1 to 22, or the X chromosome when the fetus is male.
In certain embodiments, step (f) comprises determining whether the two fetal fraction values are approximately equal. In certain embodiments, step (f) further comprises determining that a ploidy assumption implicit in the second method is true when the two fetal fraction values are approximately equal. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy. For example, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
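The comparison in step (f) can be sketched as a tolerance check. The text does not say how close "approximately equal" must be, so the 20% relative tolerance below is purely an illustrative assumption, as are the function name and the values.

```python
import math


def ploidy_assumption_holds(ff_first, ff_second, rel_tol=0.2):
    """True when the two fetal fraction estimates agree within rel_tol.

    rel_tol is a hypothetical threshold; the source does not specify one.
    """
    return math.isclose(ff_first, ff_second, rel_tol=rel_tol)


# Illustrative values only.
agree = ploidy_assumption_holds(0.10, 0.11)     # consistent with a complete aneuploidy
disagree = ploidy_assumption_holds(0.10, 0.04)  # points to step (g)
```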
In certain embodiments, the additional method for classifying a copy number variation further comprises a step (g): when the two fetal fraction values are not approximately equal, analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
In certain embodiments, the first method comprises calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and the second method comprises calculating the second fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on the first chromosome of interest. The comparison in step (f) can comprise: determining that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; determining that the first chromosome of interest is triploid when the ratio is approximately 1.5; and determining that the first chromosome of interest is monoploid when the ratio is approximately 0.5. The additional method for classifying a copy number variation can further comprise, when the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5, a step (g) of analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
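The ratio test in step (f) can be sketched as a lookup against the three expected ratios. The tolerance of 0.15 around each expected value is an assumption chosen so that the three windows do not overlap; the text only says "approximately". The function name and return strings are illustrative.

```python
def classify_by_fetal_fraction_ratio(ff_first, ff_second, tol=0.15):
    """Classify the first chromosome of interest from ff_second / ff_first.

    tol is a hypothetical half-width for 'approximately'.
    """
    if ff_first <= 0:
        raise ValueError("first fetal fraction must be positive")
    ratio = ff_second / ff_first
    expected = ((1.0, "diploid"), (1.5, "triploid"), (0.5, "monoploid"))
    for value, label in expected:
        if abs(ratio - value) <= tol:
            return label
    # No expected ratio matched: proceed to step (g).
    return "analyze tags: partial aneuploidy or mosaic"


# Illustrative values only.
r_diploid = classify_by_fetal_fraction_ratio(0.10, 0.10)
r_triploid = classify_by_fetal_fraction_ratio(0.10, 0.15)
r_monoploid = classify_by_fetal_fraction_ratio(0.10, 0.05)
r_other = classify_by_fetal_fraction_ratio(0.10, 0.125)
```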
In certain embodiments, the information used by the polymorphism-based first method and second method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. Alternatively, the information used by the polymorphism-based first method and second method is obtained by a method other than sequencing, such as qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, step (g) of analyzing the tag information of the first chromosome of interest comprises: (a) binning the sequence of the first chromosome of interest into a plurality of portions; (b) determining whether any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and (c) if any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy, or, if none of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the fetus is a mosaic. This additional method can therefore further comprise determining that a portion of the first chromosome of interest containing significantly more or significantly fewer nucleic acids than one or more other portions harbors the partial aneuploidy.
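Steps (b) and (c) can be sketched as an outlier test over per-bin tag counts. The text does not define "significantly", so this sketch swaps in a robust z-score (deviation from the median, scaled by the median absolute deviation) with a threshold of 3.5; both the statistic and the threshold are assumptions, and the function names and counts are illustrative.

```python
from statistics import median


def outlier_bins(bin_counts, threshold=3.5):
    """Indices of bins whose tag count deviates strongly from the median."""
    med = median(bin_counts)
    mad = median([abs(c - med) for c in bin_counts])
    if mad == 0:
        # No spread at all: any bin that differs from the median stands out.
        return [i for i, c in enumerate(bin_counts) if c != med]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, c in enumerate(bin_counts)
            if abs(c - med) / scale > threshold]


def classify_discordant_cnv(bin_counts, threshold=3.5):
    """Step (c): partial aneuploidy if some bin stands out, otherwise mosaic."""
    flagged = outlier_bins(bin_counts, threshold)
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []


# Illustrative per-bin tag counts only.
partial = classify_discordant_cnv([100, 102, 98, 101, 150, 99, 100, 103])
mosaic = classify_discordant_cnv([100, 101, 99, 100])
```

In practice the bin counts would first be corrected for effects such as GC content and mappability, which this sketch ignores.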
Step (f) of the method for classifying a copy number variation comprises classifying the copy number variation into a category selected from the group consisting of: complete chromosomal duplication or multiplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaicism.
In embodiments in which step (f) of comparing the first fetal fraction value with the second fetal fraction value determines that the two values are not approximately equal, the method further comprises:
(i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and
(ii) when the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest.
In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises dividing the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and counting the mapped tags in each bin.
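Binning and counting can be sketched as follows, assuming fixed-width bins and 0-based tag start positions; the bin size, positions, and function names are illustrative and not taken from the application.

```python
def count_tags_per_bin(tag_positions, chrom_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins along a chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"tag position {pos} is outside the chromosome")
        counts[pos // bin_size] += 1
    return counts


def bin_interval(bin_index, chrom_length, bin_size):
    """Half-open [start, end) coordinates covered by a bin."""
    start = bin_index * bin_size
    return start, min(start + bin_size, chrom_length)


# Toy example: a 100 bp 'chromosome' split into 10 bp bins.
positions = [5, 15, 25, 26, 27, 95]
counts = count_tags_per_bin(positions, chrom_length=100, bin_size=10)
densest = max(range(len(counts)), key=counts.__getitem__)
locus = bin_interval(densest, chrom_length=100, bin_size=10)
```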
In certain embodiments, the aligning in step (b) comprises aligning at least about 1 million reads.
Any of the methods described here can further comprise sequencing the fetal and maternal nucleic acids (for example, cell-free DNA) in the maternal test sample to obtain the sequence reads. Sequencing the maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads can comprise massively parallel sequencing. In certain embodiments, the massively parallel sequencing is sequencing-by-synthesis. Sequencing-by-synthesis can be performed using reversible dye terminators. In other embodiments, the massively parallel sequencing is sequencing-by-ligation. In still other embodiments, the massively parallel sequencing is single-molecule sequencing.
Maternal samples that can be used to determine fetal fraction according to the methods described here include blood, plasma, serum, or urine samples. In certain embodiments, the maternal sample is a plasma sample. In other embodiments, the maternal sample is a whole blood sample.
Various apparatus are also provided, including apparatus for medical analysis of a sample (for example, a maternal sample). These apparatus carry out steps of the methods described above, for example, for determining a copy number variation alone, for determining fetal fraction, or for classifying a copy number variation.
Kits are also provided. The kits comprise reagents that can be used, alone or in combination, in the methods for determining a copy number variation and for determining the contribution of one of two genomes to a mixture of nucleic acids derived from the two genomes (for example, the fetal fraction in a maternal sample). The kits can be used in combination with the apparatus described here.
Although the examples here concern humans and the wording is directed primarily at human concerns, the concepts described here are also applicable to genomes from any plant or animal.
Brief Description of the Drawings
Figure 1 is a flowchart of a method 100 for determining the presence or absence of a copy number variation in a test sample comprising a mixture of nucleic acids.
Figure 2 depicts the workflows for preparing a sequencing library according to the full-length Illumina protocol, the abbreviated protocol (ABB), the two-step method, and the one-step method, as described here. "P" denotes a purification step, and "X" indicates that a purification step and/or DNA repair is not included.
Figure 3 depicts the workflow of an embodiment of a method for preparing a sequencing library on a solid surface.
Figure 4 shows a flowchart of an embodiment 400 of a method for verifying the integrity of a sample subjected to a multistep singleplex sequencing bioassay.
Figure 5 shows a flowchart of an embodiment 500 of a method for verifying the integrity of a plurality of samples subjected to a multistep multiplex sequencing bioassay.
Figure 6 is a flowchart of a method 600 for simultaneously determining the presence or absence of aneuploidy and the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids.
Figure 7 is a flowchart of a method 700 for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids, using massively parallel sequencing or size separation of polymorphic nucleic acid sequences.
Figure 8 is a flowchart of a method 800 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal plasma test sample enriched for polymorphic nucleic acids.
Figure 9 is a flowchart of a method 900 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal purified cfDNA test sample enriched for polymorphic nucleic acids.
Figure 10 is a flowchart of a method 1000 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction from a sequencing library that is constructed from fetal and maternal nucleic acids derived from a maternal test sample and is enriched for polymorphic nucleic acids.
Figure 11 is a flowchart outlining alternative embodiments of the method shown in Figure 7 for determining fetal fraction by massively parallel sequencing.
Figure 12 is a bar chart showing the identification of fetal and maternal polymorphic sequences (SNPs) used to determine the fetal fraction in a test sample. The total number of sequence reads (Y axis) mapped to the SNP sequences identified by rs number (X axis) is shown, together with the relative level of fetal nucleic acids (*).
Figure 13 is a block diagram depicting the classification of fetal and maternal zygosity states at a given genomic position.
Figure 14 shows a comparison of the fetal fraction estimated using a mixture model with the known fetal fraction.
Figure 15 illustrates the error estimates by sequenced base position over 30 lanes of Illumina GA2 data aligned to the human genome HG18 using Eland with default parameters.
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias by a point.
Figure 17 shows that, for simulated data, using the machine error rate as a known parameter to enhance the case 1 and case 2 error models greatly reduces the upward bias, to less than a point for fetal fractions below 0.2.
Figure 18 is a flowchart depicting a method of classifying a CNV by comparing fetal fraction values calculated by two different techniques.
Figure 19 is a block diagram of a dispersed system for processing a test sample and ultimately making a diagnosis.
Figure 20 schematically illustrates how the different operations in processing test samples can be grouped for handling by different elements of a system.
Figures 21A and 21B show electropherograms of cfDNA sequencing libraries prepared according to the abbreviated protocol described in Example 2a (Figure 21A) and the protocol described in Example 2b (Figure 21B).
Figures 22A to 22C provide graphs showing the mean (n=16) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 22A) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇) and according to the repair-free two-step method (INSOL; ), and the percentage of sequence tags as a function of chromosome size (Figure 22B). Figure 22C shows the ratio of the percentage of tags mapped when the library was prepared using the two-step method to that obtained when the library was prepared using the abbreviated (ABB) method, as a function of the GC content of the chromosome.
Figures 23A and 23B show bar charts of the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 23A; %ChrX) and chromosome Y (Figure 23B; %ChrY), obtained by sequencing 10 samples of cfDNA purified from the plasma of 10 pregnant women. Figure 23A shows that more tags mapped to the X chromosome when the repair-free method (two-step) was used than when the abbreviated method (ABB) was used. Figure 23B shows that the percentage of tags mapped to the Y chromosome with the repair-free two-step method did not differ from that obtained with the abbreviated method (ABB).
Figure 24 shows the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to the non-excluded sites for each of 5 samples. cfDNA was prepared from these samples and used to construct sequencing libraries according to the abbreviated protocol (ABB) described in Example 2 (solid bars), the in-solution repair-free protocol (two-step; open bars), and the solid-surface repair-free protocol (one-step; grey bars).
Figures 25A and 25B show graphs of the mean (n=5) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 25A) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇), according to the repair-free two-step method (), and on a solid surface according to the repair-free one-step method (△), and the percentage of sequence tags as a function of chromosome size (Figure 25B). Regression coefficients are indicated for the mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; ◇) and the solid-surface repair-free protocol (two-step; ). Figure 25C shows, as a function of the percent GC content of each chromosome, the ratio of the percentage of mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the repair-free two-step protocol to that obtained from sequencing libraries prepared according to the abbreviated protocol (ABB) (◇), and the ratio of the percentage of mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the repair-free one-step protocol to that obtained from sequencing libraries prepared according to the abbreviated protocol (ABB) (□).
Figures 26A and 26B show a comparison of the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 26A) and chromosome Y (Figure 26B), obtained by sequencing 5 samples of cfDNA purified from the plasma of 5 pregnant women according to the ABB method, the two-step method, and the one-step method. Figure 26A shows that more tags mapped to the X chromosome when the repair-free methods (two-step and one-step) were used than when the abbreviated method (ABB) was used. Figure 26B shows that the percentage of tags mapped to the Y chromosome with the repair-free two-step and one-step methods did not differ from that obtained with the abbreviated method.
Figures 27A and 27B show the correlation between the amount of purified cfDNA used to prepare the sequencing library and the amount of resulting library product, for 61 clinical samples prepared in solution using the ABB method (Figure 27A) and for 35 research samples prepared using the repair-free solid-surface (SS) one-step method (Figure 27B).
Figure 28 shows the correlation between the amount of cfDNA used to make the library and the amount of library product obtained using the two-step (), ABB (◇), and one-step (△) methods.
Figure 29 shows the percentage of indexed sequence reads obtained when indexed libraries were prepared using the one-step (open bars) and two-step (solid bars) methods and sequenced as 6-plex (6 indexed samples per flow cell lane).
Figures 30A and 30B are graphs showing the mean (n=42) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 30A) when indexed sequencing libraries were prepared on a solid surface according to the one-step method and sequenced as 6-plex, and the resulting percentage of sequence tags as a function of chromosome size (Figure 30B).
Figure 31 shows the percentage of sequence tags mapped to the Y chromosome (ChrY) relative to the percentage of tags mapped to the X chromosome (ChrX).
Figures 32A and 32B show the distribution of the chromosome dose for chromosome 21 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or female fetus. Chromosome 21 doses for qualified samples, that is, samples normal for chromosome 21 (O), and for trisomy 21 test samples (△) are shown for chromosomes 1-12 and X (Figure 32A) and for chromosomes 1-22 and X (Figure 32B).
Figures 33A and 33B show the distribution of the chromosome dose for chromosome 18 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or female fetus. Chromosome 18 doses for qualified samples, that is, samples normal for chromosome 18 (O), and for trisomy 18 test samples (△) are shown for chromosomes 1-12 and X (Figure 33A) and for chromosomes 1-22 and X (Figure 33B).
Figures 34A and 34B show the distribution of the chromosome dose for chromosome 13 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or female fetus. Chromosome 13 doses for qualified samples, that is, samples normal for chromosome 13 (O), and for trisomy 13 test samples (△) are shown for chromosomes 1-12 and X (Figure 34A) and for chromosomes 1-22 and X (Figure 34B).
Figures 35A and 35B show the distribution of the chromosome dose for chromosome X determined by sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or female fetus. Chromosome X doses for male (46,XY; (O)), female (46,XX; (△)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 and X (Figure 35A) and for chromosomes 1-22 and X (Figure 35B).
Figures 36A and 36B show the distribution of the chromosome dose for chromosome Y determined by sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or female fetus. Chromosome Y doses for male (46,XY; (△)), female (46,XX; (O)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 (Figure 36A) and for chromosomes 1-22 (Figure 36B).
Figure 37 shows the coefficient of variation (CV) for chromosomes 21 (■), 18 (●), and 13 (▲), determined from the doses shown in Figures 32A and 32B, 33A and 33B, and 34A and 34B, respectively.
Figure 38 shows the coefficient of variation (CV) for chromosomes X (■) and Y (●), determined from the doses shown in Figures 35A and 35B and 36A and 36B, respectively.
Figure 39 shows the cumulative distribution of GC fraction by human chromosome. The vertical axis represents the frequency of chromosomes with a GC content below the value shown on the horizontal axis.
Figure 40 shows the sequence dose (Y axis) for a segment of chromosome 11 (81000082-103000103 bp) determined by sequencing cfDNA extracted from a set of 7 qualified samples (O) and 1 test sample (◆) obtained from pregnant human subjects. A sample from a subject carrying a fetus with a partial aneuploidy of chromosome 11 (◆) was identified.
Figures 41A-41E show the distribution of normalized chromosome doses for chromosome 21 (41A), chromosome 18 (41B), chromosome 13 (41C), chromosome X (41D), and chromosome Y (41E), relative to the standard deviation of the mean (Y axis) for the corresponding chromosome in unaffected samples.
Figure 42 shows the normalized chromosome values for chromosomes 21 (○), 18 (△), and 13 () determined in samples from training set 1 using normalizing chromosomes as described in Example 12.
Figure 43 shows the normalized chromosome values for chromosomes 21 (○), 18 (△), and 13 () determined in samples from test set 1 using normalizing chromosomes as described in Example 12.
Figure 44 shows the normalized chromosome values for chromosomes 21 (○) and 18 (△) determined in samples from test set 1 using the normalization method of Chiu et al., in which the number of sequence tags identified for the chromosome of interest is normalized to the number of sequence tags obtained for the remaining chromosomes in the sample (see Example 13 elsewhere in this application).
Figure 45 shows the normalized chromosome values for chromosomes 21 (○), 18 (△), and 13 () determined in samples from training set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 46 shows the normalized chromosome values for chromosomes X (X axis) and Y (Y axis). The arrows point to the 5 (Figure 46A) and 3 (Figure 46B) X monosomy samples identified in the training set and the test set, respectively, as described in Example 13.
Figure 47 shows the normalized chromosome values for chromosomes 21 (○), 18 (△), and 13 () determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 48 shows the normalized chromosome values for chromosome 9 (○) determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 49 shows the normalization method karyomit(e) (described in example 13) that use is systematically determined, for the normalized karyomit(e) value from the karyomit(e) 1-22 that determines in the sample of test group 1.
Figure 50 shows flow diagrams of the design (A) and the random sampling plan (B) of the study described in Example 16.
Figures 51A-51F show flow diagrams for the analyses of chromosomes 21, 18, and 13 (Figures 51A-51C, respectively) and for the gender analyses for female, male, and monosomy X (Figures 51D-51F, respectively). Ovals contain results obtained from sequencing information from the laboratory, rectangles contain karyotype results, and rectangles with rounded corners show the comparative results used to determine test performance (sensitivity and specificity). The dashed lines in Figures 51A and 51B denote the relationship between the mosaic samples for T21 (n=3) and T18 (n=1) that were censored from the analyses of chromosomes 21 and 18, respectively, but were correctly determined as described in Example 16.
Figure 52 shows normalized chromosome values (NCV) versus karyotype classifications for chromosomes 21 (●), 18 (■), and 13 (▲) for the test samples of the study described in Example 16. Circled samples denote unclassified samples with a trisomy karyotype.
Figure 53 shows normalized chromosome values (NCV) for chromosome X versus karyotype classifications for the gender classification of the test samples of the study described in Example 16. Samples with female karyotypes (○), samples with male karyotypes (●), samples with 45,X (), and samples with other karyotypes, i.e., XXX, XXY, and XYY (■), are shown.
Figure 54 shows a plot of normalized chromosome values for chromosome Y versus normalized chromosome values for chromosome X for the test samples of the clinical study described in Example 16. Euploid male and female samples (○), XXX samples (●), 45,X samples (X), XYY samples (■), and XXY samples (▲) are shown. The dashed lines show the thresholds used for classifying samples as described in Example 16.
Figure 55 schematically illustrates one embodiment of the CNV determination method described herein.
Figure 56 shows, from Example 17, a plot of the percent "ff" determined using doses of chromosome 21 (ff21) as a function of the percent "ff" determined using doses of chromosome X (ffX) in synthetic maternal samples (1) comprising DNA from a child with trisomy 21.
Figure 57 shows, from Example 17, a plot of the percent "ff" determined using doses of chromosome 7 (ff7) as a function of the percent "ff" determined using doses of chromosome X (ffX) in synthetic maternal samples (2) comprising DNA from a euploid mother and her child who carries a partial deletion in chromosome 7.
Figure 58 shows, from Example 17, a plot of the percent "ff" determined using doses of chromosome 15 (ff15) as a function of the percent "ff" determined using doses of chromosome X (ffX) in synthetic maternal samples (3) comprising DNA from a euploid mother and her child who is 25% mosaic for a partial duplication of chromosome 15.
Figure 59 shows, from Example 17, a plot of the percent "ff" determined using doses of chromosome 22 (ff22) and the NCVs derived from them in artificial samples (4) comprising 0% child DNA (i), 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22 (ii), and 10% DNA from the affected twin son known to have a partial chromosomal aneuploidy of chromosome 22 (iii).
Figure 60 shows, from Example 18, a plot of CNffx versus CNff21 determined in samples comprising a fetal T21 trisomy.
Figure 61 shows, from Example 18, a plot of CNffx versus CNff18 determined in samples comprising a fetal T18 trisomy.
Figure 62 shows, from Example 18, a plot of CNffx versus CNff13 determined in samples comprising a fetal T13 trisomy.
Figure 63 shows, from Example 19, a plot of NCV values for chromosomes 1 to 22 and X in the test samples.
Figure 64 shows the fetal fractions obtained in Example 18 for samples with a female fetus affected by T21.
Figure 65 shows one embodiment of a medical analysis apparatus for determining the fetal fraction as a function of a copy number variation present in the fetal genome.
Figure 66 shows one embodiment of a medical analysis apparatus for determining the fetal fraction in order to classify a copy number variation in the fetal genome.
Figure 67 shows a kit comprising an assay control reagent and a reagent for tracking and verifying the integrity of a maternal cfDNA sample subjected to massively parallel sequencing.
Figure 68 shows a kit comprising a blood collection device, DNA extraction reagents, and a control reagent for assaying a maternal DNA sample.
Figures 69 (A, B, C) show plots of NCV for in-process positive controls [ ] and maternal samples [◇] assayed for copy number variations of chromosomes 13, 18, and 21.
Detailed Description
The disclosed embodiments relate to methods, apparatus, and systems for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids known or suspected to differ in the amount of one or more sequences of interest. Sequences of interest include genomic segment sequences ranging from kilobases (kb) to megabases (Mb) to entire chromosomes that are known or suspected to be associated with a genetic or disease condition. Examples of sequences of interest include chromosomes associated with well-known aneuploidies (for example, trisomy 21) and chromosome segments that are amplified in diseases such as cancer, for example partial trisomy 8 in acute myeloid leukemia. CNV that can be determined according to the present method include monosomies and trisomies of any one or more of autosomes 1-22 and sex chromosomes X and Y (for example, 45,X; 47,XXX; 47,XXY; and 47,XYY); other chromosomal polysomies, i.e., tetrasomy and pentasomy (including but not limited to XXXX, XXXXX, XXXXY, and XYYYY); and deletions and/or duplications of any one or more segments of these chromosomes.
The method is a statistical approach that is implemented on one or more processors and that accounts for the accrued variability stemming from process-related variability, interchromosomal (intra-run) variability, and inter-sequencing (inter-run) variability. The methods are applicable to determining the CNV of any fetal aneuploidy, and CNV known or suspected to be associated with a variety of medical conditions.
Unless otherwise indicated, the practice of the present invention involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Such techniques and apparatus are known to those of ordinary skill in the art and are described in numerous texts and reference works (see, e.g., Sambrook et al., "Molecular Cloning: A Laboratory Manual", Third Edition (Cold Spring Harbor), [2001]; and Ausubel et al., "Current Protocols in Molecular Biology" [1987]).
Numeric ranges are inclusive of the numbers defining the range. Every maximum numerical limitation given throughout this specification is intended to include every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
The headings provided herein are not intended to limit the disclosure.
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Various scientific dictionaries that include the terms used herein are well known and available to those skilled in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice or testing of the embodiments disclosed herein, only some preferred methods and materials are described.
The terms defined immediately below are more fully described by reference to the specification as a whole. It is to be understood that this disclosure is not limited to the particular methodology, protocols, and reagents described, as these may vary depending on the context in which they are used by those skilled in the art.
Definitions
As used herein, the singular terms "a", "an", and "the" include the plural reference unless the context clearly indicates otherwise. Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' orientation and amino acid sequences are written left to right in the amino to carboxy orientation, respectively.
The term "assessing", when used herein in the context of analyzing a nucleic acid sample for CNV, refers to characterizing the status of a chromosomal or segment aneuploidy by one of three types of calls: "normal" or "unaffected", "affected", and "no-call". Thresholds for calling normal and affected are typically set. A parameter related to aneuploidy is measured in a sample, and the measured value is compared to the thresholds. For duplication-type aneuploidies, a call of affected is made if a chromosome or segment dose (or other measured value of sequence content) is above a defined threshold set for affected samples. For such aneuploidies, a call of normal is made if the chromosome or segment dose is below a threshold set for normal samples. By contrast, for deletion-type aneuploidies, a call of affected is made if a chromosome or segment dose is below a defined threshold for affected samples, and a call of normal is made if the chromosome or segment dose is above a threshold set for normal samples. For example, in the presence of trisomy, the "normal" call is determined by the value of a parameter, e.g., a test chromosome dose, that is below a user-defined threshold of reliability, and the "affected" call is determined by a parameter, e.g., a test chromosome dose, that is above a user-defined threshold of reliability. A "no-call" result is determined by a parameter, e.g., a test chromosome dose, that lies between the thresholds for making a "normal" or an "affected" call. The term "no-call" is used interchangeably with "unclassified".
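The three-way call described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of the disclosed method: the function name, the enum, and the numerical thresholds in the usage lines are hypothetical and chosen only for the example.

```python
from enum import Enum


class Call(Enum):
    NORMAL = "normal"
    AFFECTED = "affected"
    NO_CALL = "no-call"


def assess(value, normal_threshold, affected_threshold, duplication=True):
    """Return a three-way call for one chromosome or segment.

    duplication=True : affected if value is above affected_threshold,
                       normal if value is below normal_threshold.
    duplication=False: deletion-type aneuploidy, directions reversed.
    Values between (or exactly on) the two thresholds give a no-call.
    """
    if duplication:
        if value > affected_threshold:
            return Call.AFFECTED
        if value < normal_threshold:
            return Call.NORMAL
    else:
        if value < affected_threshold:
            return Call.AFFECTED
        if value > normal_threshold:
            return Call.NORMAL
    return Call.NO_CALL


# Hypothetical thresholds, for illustration only.
print(assess(5.0, normal_threshold=2.5, affected_threshold=4.0))
print(assess(-3.0, normal_threshold=-2.5, affected_threshold=-4.0,
             duplication=False))
```

Keeping two separate thresholds, rather than a single cutoff, is what produces the "no-call" zone: a value that is neither clearly normal nor clearly affected is left unclassified instead of being forced into one class.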
The term "copy number variation" herein refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a qualified sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or a significant portion thereof. A "copy number variant" refers to a nucleic acid sequence in which copy number differences are found by comparing a sequence of interest in a test sample with an expected level of the sequence of interest. For example, the level of the sequence of interest in the test sample is compared to the level present in a qualified sample. Copy number variants/variations include deletions (including microdeletions), insertions (including microinsertions), duplications, multiplications, inversions, translocations, and complex multi-site variants. CNV encompass chromosomal aneuploidies and partial aneuploidies.
The term "aneuploidy" herein refers to an imbalance of genetic material caused by the loss or gain of a whole chromosome or part of a chromosome.
The terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (for example, partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
The term "aneuploid sample" herein refers to a sample indicative of a subject whose chromosomal content is not euploid, i.e., the sample is indicative of a subject with an abnormal copy number of chromosomes or portions of chromosomes.
The term "aneuploid chromosome" herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term "plurality" herein refers to more than one element. For example, the term is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variations (e.g., chromosome doses) between test samples and qualified samples using the methods disclosed herein. In some embodiments, at least about 3 x 10^6 sequence tags, at least about 5 x 10^6 sequence tags, at least about 8 x 10^6 sequence tags, at least about 10 x 10^6 sequence tags, at least about 15 x 10^6 sequence tags, at least about 20 x 10^6 sequence tags, at least about 30 x 10^6 sequence tags, at least about 40 x 10^6 sequence tags, or at least about 50 x 10^6 sequence tags, each comprising a read of between about 20 and 40 bp, are obtained for each test sample.
The terms "polynucleotide", "nucleic acid", and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to the 5' position of the pentose of the next. This includes sequences of any form of nucleic acid, including but not limited to RNA and DNA molecules, for example cfDNA molecules. The term "polynucleotide" includes, without limitation, single-stranded and double-stranded polynucleotides.
The term "portion" is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amounts to less than the sequence information of one human genome.
The term "test sample" herein refers to a sample, typically derived from a biological fluid, cell, tissue, organ, or organism, comprising a nucleic acid or a mixture of nucleic acids that includes at least one nucleic acid sequence to be screened for copy number variation. In certain embodiments, the sample comprises at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Such samples include, but are not limited to, sputum/oral fluid, amniotic fluid, blood, a blood fraction, or fine needle biopsy samples (e.g., surgical biopsy, fine needle biopsy, etc.), urine, peritoneal fluid, pleural fluid, and the like. Although the sample is often taken from a human subject (e.g., a patient), the assays can be used for copy number variations (CNV) in samples from any mammal, including but not limited to dogs, cats, horses, goats, sheep, cattle, pigs, etc. The sample may be used directly as obtained from the biological source, or following a pretreatment to modify the character of the sample. For example, such pretreatment may include preparing plasma from blood, diluting viscous fluids, and so forth. Methods of pretreatment may also involve, but are not limited to, filtration, precipitation, dilution, distillation, mixing, centrifugation, freezing, lyophilization, concentration, amplification, nucleic acid fragmentation, inactivation of interfering components, addition of reagents, lysing, etc. If such methods of pretreatment are applied to the sample, they are typically such that the nucleic acid(s) of interest remain in the test sample, preferably at a concentration proportional to that in an untreated test sample (i.e., a sample that is not subjected to any such pretreatment method). Such "treated" or "processed" samples are still considered to be biological "test" samples with respect to the methods described herein.
The term "qualified sample" herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number, to which the nucleic acids in a test sample are to be compared; it is a sample that is normal, i.e., not aneuploid, for the sequence of interest. In certain embodiments, qualified samples are used for identifying one or more normalizing chromosomes or segments for a chromosome under consideration. For example, qualified samples may be used for identifying a normalizing chromosome for chromosome 21. In that case, the qualified sample is a sample that is not a trisomy 21 sample. Qualified samples may also be used for determining thresholds for calling affected samples.
The term "training set" herein refers to a set of samples that can comprise affected and unaffected samples and that is used to develop a model for analyzing test samples. The unaffected samples in a training set may be used as the qualified samples to identify normalizing sequences, e.g., normalizing chromosomes, and the chromosome doses of the unaffected samples are used to set the thresholds for each of the sequences of interest, e.g., chromosomes. The affected samples in a training set can be used to verify that affected test samples can be easily differentiated from unaffected samples.
The term "qualified nucleic acid" is used interchangeably with "qualified sequence", which is a sequence against which the amount of a test sequence or test nucleic acid is compared. A qualified sequence is one present in a biological sample, preferably at a known representation, i.e., the amount of the qualified sequence is known. Generally, a qualified sequence is the sequence present in a "qualified sample". A "qualified sequence of interest" is a qualified sequence for which the amount is known in a qualified sample, and is a sequence that is associated with a difference in sequence representation in an individual with a medical condition.
The term "sequence of interest" herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest can be a sequence on a chromosome that is misrepresented, i.e., over- or under-represented, in a disease or genetic condition. A sequence of interest may be a portion of a chromosome (i.e., a chromosome segment) or a chromosome. For example, a sequence of interest can be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over- or under-represented in the total population, or in a subpopulation, of cells of a subject. A "qualified sequence of interest" is a sequence of interest in a qualified sample. A "test sequence of interest" is a sequence of interest in a test sample.
The term "normalizing sequence" herein refers to a sequence that is used to normalize the number of sequence tags mapped to a sequence of interest associated with the normalizing sequence. In certain embodiments, the normalizing sequence displays a variability in the number of sequence tags mapped to it among samples and sequencing runs that approximates the variability of the sequence of interest for which it is used as a normalizing parameter, and that can differentiate an affected sample from one or more unaffected samples. In some implementations, the normalizing sequence best or effectively differentiates an affected sample from one or more unaffected samples when compared to other potential normalizing sequences, such as other chromosomes. A "normalizing chromosome" or "normalizing chromosome sequence" is an example of a "normalizing sequence". A "normalizing chromosome sequence" can be composed of a single chromosome or of a group of chromosomes. A "normalizing segment" is another example of a "normalizing sequence". A "normalizing segment sequence" can be composed of a single segment of a chromosome, or it can be composed of two or more segments of the same or of different chromosomes. In certain embodiments, a normalizing sequence is intended to normalize for variability such as process-related variability, interchromosomal (intra-run) variability, and inter-sequencing (inter-run) variability.
The term "differentiability" herein refers to a characteristic of a normalizing chromosome that enables it to distinguish one or more unaffected (i.e., normal) samples from one or more affected (i.e., aneuploid) samples.
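As a hedged illustration of how a normalizing chromosome might be chosen from qualified samples, the sketch below ranks candidate chromosomes by the coefficient of variation of the resulting dose across qualified samples and picks the lowest. Using the coefficient of variation as the selection criterion is an assumption made for this example; the definitions above only require low variability and good differentiability. All names and tag counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def pick_normalizing_chromosome(qualified_counts, interest, candidates):
    """Pick the candidate whose dose varies least across qualified samples.

    qualified_counts: list of dicts (one per qualified sample) mapping
                      chromosome name -> number of mapped sequence tags.
    interest:         name of the chromosome of interest.
    candidates:       names of potential normalizing chromosomes.
    """
    def dose_cv(candidate):
        doses = [s[interest] / s[candidate] for s in qualified_counts]
        return coefficient_of_variation(doses)

    return min(candidates, key=dose_cv)


# Made-up tag counts for three qualified (unaffected) samples.
qualified = [
    {"chr21": 100, "chr9": 400, "chr1": 900},
    {"chr21": 110, "chr9": 440, "chr1": 800},
    {"chr21": 90, "chr9": 360, "chr1": 1000},
]
best = pick_normalizing_chromosome(qualified, "chr21", ["chr1", "chr9"])
print(best)
```

In this toy data the chr21/chr9 ratio is constant across samples while the chr21/chr1 ratio is not, so chr9 is selected. A fuller treatment would also check differentiability against affected samples in the training set.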
The term "sequence dose" herein refers to a parameter that relates the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose is the ratio of the number of sequence tags identified for the sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, sequence dose refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalizing sequence. A "test sequence dose" is a parameter that relates the sequence tag density of a sequence of interest (e.g., chromosome 21) to that of a normalizing sequence (e.g., chromosome 9) determined in a test sample. Similarly, a "qualified sequence dose" is a parameter that relates the sequence tag density of a sequence of interest to that of a normalizing sequence determined in a qualified sample.
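The ratio form of the sequence dose can be written out as a short Python sketch. The function name and the tag counts are hypothetical; summing the tag counts when the normalizing sequence is a group of chromosomes is an assumption made for this illustration.

```python
def sequence_dose(tag_counts, interest, normalizing):
    """Ratio of tags mapped to the sequence of interest to tags mapped
    to the normalizing sequence.

    tag_counts:  dict mapping sequence name -> number of mapped tags.
    interest:    name of the sequence of interest.
    normalizing: iterable of one or more normalizing sequence names;
                 their tag counts are summed (an assumption here).
    """
    denominator = sum(tag_counts[name] for name in normalizing)
    if denominator <= 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[interest] / denominator


# Made-up tag counts for one sample.
counts = {"chr21": 150_000, "chr9": 500_000, "chr11": 250_000}
single_dose = sequence_dose(counts, "chr21", ["chr9"])
group_dose = sequence_dose(counts, "chr21", ["chr9", "chr11"])
print(single_dose, group_dose)
```

The same function serves for a test sequence dose or a qualified sequence dose; only the sample the counts come from differs.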
The term "sequence tag density" herein refers to the number of sequence reads that are mapped to a reference genome sequence; for example, the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term "sequence tag density ratio" herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome (e.g., chromosome 21) to the length of that reference genome chromosome.
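A sequence tag density ratio, and a dose built from two such ratios, might look like the following sketch. The chromosome lengths are rounded placeholders chosen for easy arithmetic, not authoritative reference-genome values, and the function names are hypothetical.

```python
def tag_density_ratio(mapped_tags, chromosome_length_bp):
    """Number of tags mapped to a chromosome divided by its length."""
    if chromosome_length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    return mapped_tags / chromosome_length_bp


def density_based_dose(counts, lengths, interest, normalizing):
    """Dose computed from tag density ratios instead of raw tag counts."""
    return (tag_density_ratio(counts[interest], lengths[interest])
            / tag_density_ratio(counts[normalizing], lengths[normalizing]))


# Placeholder values for illustration only.
counts = {"chr21": 96_000, "chr9": 282_000}
lengths = {"chr21": 48_000_000, "chr9": 141_000_000}
density_21 = tag_density_ratio(counts["chr21"], lengths["chr21"])
dose = density_based_dose(counts, lengths, "chr21", "chr9")
print(density_21, dose)
```

Dividing by length puts chromosomes of different sizes on a common per-base scale, so the resulting dose is near 1 when the two chromosomes are covered equally.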
The term "Next Generation Sequencing (NGS)" herein refers to sequencing methods that allow massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
The term "parameter" herein refers to a numerical value that characterizes a physical property. Frequently, a parameter numerically characterizes a quantitative data set and/or a numerical relationship between quantitative data sets. For example, the ratio (or a function of the ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome to which the tags are mapped is a parameter.
The terms "threshold value" and "qualified threshold value" herein refer to any number that is used as a cutoff to characterize a sample, such as a test sample containing a nucleic acid from an organism suspected of having a medical condition. The threshold may be compared to a parameter value to determine whether a sample giving rise to that parameter value suggests that the organism has the medical condition. In certain embodiments, a qualified threshold value is calculated using a qualifying data set and serves as a limit of diagnosis of a copy number variation, e.g., an aneuploidy, in an organism. If a threshold is exceeded by results obtained from the methods disclosed herein, a subject can be diagnosed with a copy number variation, e.g., trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalized values (e.g., chromosome doses, NCVs, or NSVs) calculated for a training set of samples. Threshold values can be identified using the qualified (i.e., unaffected) samples in a training set that comprises both qualified (i.e., unaffected) samples and affected samples. The samples in the training set known to have chromosomal aneuploidies (i.e., the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold depends on the level of confidence that the user wishes to have in making the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
The term "normalized value" herein refers to a numerical value that relates the number of sequence tags identified for the sequence of interest (e.g., a chromosome or chromosome segment) to the number of sequence tags identified for the normalizing sequence (e.g., a normalizing chromosome or normalizing chromosome segment). For example, a "normalized value" can be a chromosome dose as described elsewhere in this application, or it can be an NCV (normalized chromosome value) as described elsewhere in this application, or it can be an NSV (normalized segment value) as described elsewhere in this application.
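Because this section only names the NCV and defers its definition to other parts of the application, the sketch below assumes the usual z-score form: the test chromosome dose minus the mean dose in qualified samples, divided by the standard deviation of the dose in qualified samples. The function name and the dose values are hypothetical.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Z-score of a test chromosome dose against qualified-sample doses.

    Assumes NCV = (test dose - mean qualified dose) / sd of qualified
    doses; this form is an assumption in the context of this section.
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation
    if sigma == 0:
        raise ValueError("qualified doses show no variability")
    return (test_dose - mu) / sigma


# Made-up chromosome doses from three qualified samples.
qualified_doses = [0.29, 0.30, 0.31]
ncv = normalized_chromosome_value(0.33, qualified_doses)
print(round(ncv, 6))
```

Expressing the dose in standard-deviation units is what allows a single pair of thresholds to be applied across chromosomes whose raw doses differ in scale.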
The term "read" refers to a sequence read from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a DNA sequence of sufficient length (e.g., at least about 30 bp) that can be used to identify a larger sequence or region, e.g., one that can be aligned and specifically assigned to a chromosome, genomic region, or gene.
The term "sequence tag" is used herein interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned (i.e., mapped) to a larger sequence, e.g., a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location in the reference genome. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read, such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation. A tag may be defined to permit a limited amount of mismatch in aligning to a reference genome. Tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
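The notion of a tag as a small data structure, and the exclusion of reads that do not map uniquely, can be illustrated as follows. The record layout, the mismatch limit of 2, and the sample alignments are all hypothetical; real aligner output has a different format.

```python
from collections import Counter, namedtuple

# Hypothetical alignment record: one row per (read, location) hit.
Alignment = namedtuple("Alignment", "read_id chrom pos strand mismatches")


def uniquely_mapped_tags(alignments, max_mismatches=2):
    """Keep hits within the mismatch limit, then keep only reads that
    have exactly one such hit (i.e., reads that map uniquely)."""
    accepted = [a for a in alignments if a.mismatches <= max_mismatches]
    hits_per_read = Counter(a.read_id for a in accepted)
    return [a for a in accepted if hits_per_read[a.read_id] == 1]


def tags_per_chromosome(tags):
    """Count sequence tags per chromosome."""
    return Counter(t.chrom for t in tags)


alignments = [
    Alignment("r1", "chr21", 100, "+", 0),
    Alignment("r2", "chr21", 250, "+", 1),  # r2 also hits chr13
    Alignment("r2", "chr13", 900, "-", 1),
    Alignment("r3", "chr9", 40, "+", 3),    # too many mismatches
    Alignment("r4", "chr9", 75, "-", 2),
]
tags = uniquely_mapped_tags(alignments)
counts = tags_per_chromosome(tags)
print([t.read_id for t in tags], dict(counts))
```

Note one design choice in this sketch: the mismatch filter is applied before the uniqueness check, so a read with one acceptable hit and one poor hit is still treated as uniquely mapped. Other orderings are possible.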
As used herein, the terms "aligned", "alignment", or "aligning" refer to the process of comparing a read or tag to a reference sequence and thereby determining whether the reference sequence contains the read sequence. If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e., whether the read is present or absent in the reference sequence). For example, the alignment of a read to the reference sequence for human chromosome 13 will tell whether the read is present in the reference sequence for chromosome 13. A tool that provides this information may be called a set membership tester. In some cases, an alignment additionally indicates the location in the reference sequence to which the read or tag maps. For example, if the reference sequence is the whole human genome sequence, an alignment may indicate that a read is present on chromosome 13, and may further indicate that the read is on a particular strand and/or site of chromosome 13.
Aligned reads or tags are one or more sequences that are identified as a match, in terms of the order of their nucleic acid molecules, to a known sequence from a reference genome. Alignment can be done manually, although it is typically implemented by a computer algorithm, as it would be impossible to align reads manually in a reasonable time period for implementing the methods disclosed herein. One example of an algorithm for aligning sequences is the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. Alternatively, a Bloom filter or similar set membership tester may be employed to align reads to reference genomes. See US Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (a non-perfect match).
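To make the idea of a set membership tester concrete, the following is a minimal Bloom filter sketch in Python. It is a generic textbook construction, not the method of the referenced application and not ELAND: the class, the hashing scheme (salted SHA-256), the filter size, and the toy reference string are all assumptions for illustration. It handles exact matches of fixed-length reads only and reports membership, not position.

```python
import hashlib


class BloomFilter:
    """Simple Bloom filter: no false negatives, rare false positives."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def index_reference(reference, read_length, bloom):
    """Add every substring of read_length from the reference."""
    for start in range(len(reference) - read_length + 1):
        bloom.add(reference[start:start + read_length])


# Toy reference and read, for illustration only.
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
bloom = BloomFilter(size_bits=4096, num_hashes=3)
index_reference(reference, read_length=8, bloom=bloom)
print("TAGCCGAT" in bloom)  # this read occurs in the reference
```

A read that occurs in the indexed reference is always reported as present. A read that does not occur is usually reported as absent, but a Bloom filter can return false positives, so a positive answer means "probably present". That trade-off buys a compact, fixed-size index.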
As used herein, the term "reference genome" or "reference sequence" refers to any particular known genome sequence, whether partial or complete, of any organism or virus, which may be used to reference identified sequences from a subject. For example, a reference genome used for human subjects, as well as for many other organisms, is found at the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
In various embodiments, the reference sequence is significantly larger than the reads that are aligned to it. For example, it may be at least about 100 times larger, or at least about 1000 times larger, or at least about 10,000 times larger, or at least about 10^5 times larger, or at least about 10^6 times larger, or at least about 10^7 times larger.
In one example, the reference sequence is that of a full-length human genome. Such sequences may be referred to as genomic reference sequences. In another example, the reference sequence is limited to a specific human chromosome, such as chromosome 13. Such sequences may be referred to as chromosome reference sequences. Other examples of reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions (such as strands), etc., of any species.
In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
The term "artificial target sequences genome" herein refers to a grouping of known sequences that encompass alleles of known polymorphic sites. For example, a "SNP reference genome" is an artificial target sequences genome comprising a grouping of sequences that encompass alleles of known SNPs.
The term "clinically-relevant sequence" herein refers to a nucleic acid sequence that is known or is suspected to be associated or implicated with a genetic or disease condition. Determining the presence or absence of a clinically-relevant sequence can be useful in determining a diagnosis or confirming a diagnosis of a medical condition, or in providing a prognosis for the development of a disease.
The term "derived", when used in the context of a nucleic acid or a mixture of nucleic acids, herein refers to the means whereby the nucleic acid(s) are obtained from the source from which they originate. For example, in one embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids, e.g., cfDNA, were naturally released by cells through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids were extracted from two different types of cells from a subject.
The term "patient sample" herein refers to a biological sample obtained from a patient, i.e., a recipient of medical attention, care, or treatment. The patient sample can be any of the samples described herein. In certain embodiments, the patient sample is obtained by non-invasive procedures, e.g., a peripheral blood sample or a stool sample. The methods described herein need not be limited to humans. Thus, various veterinary applications are contemplated, in which case the patient sample may be a sample from a non-human mammal (e.g., a cat, a pig, a horse, a cow, and the like).
The term "mixed sample" herein refers to a sample containing a mixture of nucleic acids derived from different genomes.
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g., a woman.
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
The terms "maternal nucleic acids" and "fetal nucleic acids" herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus being carried by the pregnant female, respectively.
As used herein, the term "corresponding to" sometimes refers to a nucleic acid sequence, e.g., a gene or a chromosome, that is present in the genome of different subjects, and which does not necessarily have the same sequence in all genomes, but serves to provide the identity rather than the genetic information of a sequence of interest, e.g., a gene or chromosome.
As used herein, the term "substantially cell free" encompasses preparations of the desired sample from which the cell components normally associated with it have been removed. For example, a plasma sample is rendered substantially cell free by removing blood cells, e.g., red blood cells, which are normally associated with it. In some embodiments, substantially cell free samples are processed to remove cells that would otherwise contribute to the desired genetic material that is to be tested for a CNV.
As used herein, the term "fetal fraction" refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acids. Fetal fraction is often used to characterize the cfDNA in a mother's blood.
As used herein, the term "chromosome" refers to the heredity-bearing gene carrier of a living cell, which is derived from chromatin and which comprises DNA and protein components (especially histones). The conventional, internationally recognized individual human genome chromosome numbering system is employed herein.
As used herein, the term "polynucleotide length" refers to the absolute number of nucleic acid molecules (nucleotides) in a sequence or in a region of a reference genome. The term "chromosome length" refers to the known length of the chromosome given in base pairs, e.g., as provided in the NCBI36/hg18 assembly of the human chromosomes found on the World Wide Web at genome.ucsc.edu/cgi-bin/hgTracks?hgsid=167155613&chromInfoPage=.
The term "subject" herein refers to a human subject as well as a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, and a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts disclosed herein are applicable to genomes from any plant or animal, and are useful in the fields of veterinary medicine, animal sciences, research laboratories, and the like.
The term "condition" herein refers to "medical condition" as a broad term that includes all diseases and disorders, but can also include [injuries] and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatments.
The term "complete", when used in reference to a chromosomal aneuploidy, herein refers to a gain or loss of an entire chromosome.
The term "partial", when used in reference to a chromosomal aneuploidy, herein refers to a gain or loss of a portion, i.e., a segment, of a chromosome.
The term "mosaic" herein refers to the presence of two populations of cells with different karyotypes in one individual who has developed from a single fertilized egg. Mosaicism may result from a mutation during development that is propagated to only a subset of the adult cells.
The term "non-mosaic" herein refers to an organism, e.g., a human fetus, composed of cells of one karyotype.
The term "using a chromosome", when used in reference to determining a chromosome dose, herein refers to using the sequence information obtained for a chromosome, i.e., the number of sequence tags obtained for a chromosome.
The term "sensitivity" as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
The term "specificity" as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
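The two definitions above translate directly into arithmetic. The following is a minimal sketch; the function names and the counts in the usage lines are hypothetical values chosen only to illustrate the calculation, not results from any study.

```python
def sensitivity(true_positives, false_negatives):
    """True positives divided by (true positives + false negatives)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """True negatives divided by (true negatives + false positives)."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for illustration only.
example_sensitivity = sensitivity(true_positives=48, false_negatives=2)
example_specificity = specificity(true_negatives=990, false_positives=10)
```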
The term "hypodiploid" herein refers to a chromosome number that is one or more lower than the normal haploid number of chromosomes characteristic for the species.
A "polymorphic site" is a locus at which nucleotide sequence divergence occurs. The locus may be as small as one base pair. Illustrative markers have at least two alleles, each occurring at a frequency of greater than 1%, and more typically greater than 10% or 20%, of a selected population. A polymorphic site may be the site of a single nucleotide polymorphism (SNP), a small-scale multi-base deletion or insertion, a multi-nucleotide polymorphism (MNP), or a short tandem repeat (STR). The terms "polymorphic locus" and "polymorphic site" are used interchangeably herein.
A "polymorphic sequence" herein refers to a nucleic acid sequence, e.g., a DNA sequence, that comprises one or more polymorphic sites, e.g., one SNP or a tandem SNP. Polymorphic sequences according to the present technology can be used to specifically differentiate between maternal and non-maternal alleles in a maternal sample comprising a mixture of fetal and maternal nucleic acids.
As used herein, a "single nucleotide polymorphism" (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the population). A SNP usually arises due to the substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or of one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or of a pyrimidine by a purine. SNPs can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele. Single nucleotide polymorphisms (SNPs) are positions at which two alternative bases occur at appreciable frequency (>1%) in the human population, and they are the most common type of human genetic variation.
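The distinction between a transition and a transversion can be made concrete with a short sketch. The function name `classify_substitution` is hypothetical; the purine and pyrimidine sets follow the standard DNA bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(reference_base, alternate_base):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = reference_base.upper(), alternate_base.upper()
    valid_bases = PURINES | PYRIMIDINES
    if ref not in valid_bases or alt not in valid_bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Purine-to-purine or pyrimidine-to-pyrimidine is a transition;
    # any change between the two classes is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` stays within the purines and is a transition, whereas `classify_substitution("A", "C")` crosses classes and is a transversion.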
The term "tandem SNPs" herein refers to two or more SNPs that are present within a polymorphic target nucleic acid sequence.
As used herein, the term "short tandem repeat" or "STR" refers to a class of polymorphisms that occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. The pattern can range in length from 2 to 10 base pairs (bp) (for example, (CATG)n in a genomic region) and is typically in a non-coding intron region. By examining several STR loci and counting how many repeats of a specific STR sequence there are at a given locus, it is possible to create a unique genetic profile of an individual.
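Counting the repeats of a specific STR motif at a locus, as described above, can be sketched as follows. This is an illustrative exact-match counter; the function name `longest_tandem_run` and the toy sequence are hypothetical, and real STR genotyping must also handle sequencing errors and partial repeats.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return the largest number of directly adjacent copies of motif."""
    # The lookahead lets a run start at every position, so runs in any
    # phase are considered, including ones that overlap an earlier match.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    return max(
        (len(match.group(1)) // len(motif) for match in pattern.finditer(sequence)),
        default=0,
    )


# Hypothetical locus containing (CATG)3 followed by an isolated CATG.
toy_locus = "TTCATGCATGCATGAACATGTT"
repeat_count = longest_tandem_run(toy_locus, "CATG")
```

Collecting such counts over several loci yields the per-locus repeat numbers that make up a genetic profile.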
As used herein, the term "miniSTR" refers to a tandem repeat of four or more base pairs that spans less than about 300 base pairs, less than about 250 base pairs, less than about 200 base pairs, less than about 150 base pairs, less than about 100 base pairs, less than about 50 base pairs, or less than about 25 base pairs. "miniSTRs" are STRs that are amplifiable from cfDNA templates.
The terms "polymorphic target nucleic acid", "polymorphic sequence", "polymorphic target nucleic acid sequence", and "polymorphic nucleic acid" are used interchangeably herein to refer to a nucleic acid sequence (e.g., a DNA sequence) that comprises one or more polymorphic sites.
The term "plurality of polymorphic target nucleic acids" herein refers to a number of nucleic acid sequences each comprising at least one polymorphic site (e.g., one SNP), such that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 or more different polymorphic sites are amplified from the polymorphic target nucleic acids to identify and/or quantify fetal alleles present in maternal samples comprising fetal and maternal nucleic acids.
The term "enrich" herein refers to the process of amplifying polymorphic target nucleic acids contained in a portion of a maternal sample and combining the amplified product with the remainder of the maternal sample from which the portion was removed. For example, the remainder of the maternal sample can be the original maternal sample.
The term "original maternal sample" herein refers to a non-enriched biological sample obtained from a pregnant subject, e.g., a woman, who serves as the source from which a portion is removed to amplify polymorphic target nucleic acids. The "original sample" can be any sample obtained from a pregnant subject, and processed fractions thereof, e.g., a purified cfDNA sample extracted from a maternal plasma sample.
As used herein, the term "primer" refers to an isolated oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer will depend on many factors, including temperature, source of primer, use of the method, and the parameters used for primer design.
The phrase "cause to be administered" refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing the medical care of a subject, that control and/or permit the administration of the agent(s)/compound(s) at issue to the subject. Causing to be administered can involve diagnosis and/or determination of an appropriate therapeutic or prophylactic regimen, and/or prescribing particular agent(s)/compound(s) for a subject. Such prescribing can include, for example, drafting a prescription form, annotating a medical record, and the like. Similarly, "cause to be performed", e.g., for a diagnostic procedure, refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing the medical care of a subject, that control and/or permit the performance of one or more diagnostic protocols on the subject.
Introduction
Methods, apparatus, systems, and kits are disclosed herein for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes, and which are known or are suspected to differ in the amount of one or more sequences of interest. Methods, apparatus, systems, and kits for determining the fraction contributed by the two genomes in the mixture of nucleic acids are also provided. Copy number variations determined by the methods and apparatus disclosed herein include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variation of DNA segments ranging from kilobases (kb) to megabases (Mb) in size. In various embodiments, the methods comprise a machine-implemented statistical approach that accounts for accrued variability stemming from process-related, interchromosomal, and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNV that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information that is obtained by sequencing the nucleic acids of a test sample only once.
CNV in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006]; Shaikh et al., Genome Res 19:1682-1690 [2009]). CNVs are known to contribute to genetic disease through different mechanisms, resulting in either an imbalance of gene dosage or gene disruption in most cases. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD (attention deficit hyperactivity disorder), and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNV arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
The methods and apparatus described herein may employ next generation sequencing (NGS) technology, which performs massively parallel sequencing. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in Volkerding et al., Clin Chem 55:641-658 [2009]; Metzker M, Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing), or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of DNA sequences. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are described below.
In some embodiments, the methods and apparatus disclosed herein may employ some or all of the operations in the following sequence: obtain a nucleic acid test sample from a patient (typically by a non-invasive procedure); process the test sample in preparation for sequencing; sequence nucleic acids from the test sample to produce numerous reads (e.g., at least 10,000); align the reads to portions of a reference sequence/genome and determine the amount of DNA (e.g., the number of reads) that maps to defined portions of the reference sequence (e.g., to defined chromosomes or chromosome segments); calculate a dose of one or more of the defined portions by normalizing the amount of DNA mapping to the defined portions with an amount of DNA mapping to one or more normalizing chromosomes or chromosome segments selected for the defined portion; determine whether the dose indicates that the defined portion is "affected" (e.g., aneuploid or mosaic); report the determination and optionally convert it to a diagnosis; and use the diagnosis or determination to develop a plan of treatment, monitoring, or further testing for the patient.
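The counting and dose-calculation operations in the sequence above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each aligned read has already been reduced to the name of the chromosome it maps to, the dose is taken as the tag count for the chromosome of interest divided by the summed tag counts of the normalizing chromosome(s), and the names `count_tags`, `chromosome_dose`, and the toy read list are hypothetical.

```python
from collections import Counter


def count_tags(mapped_chromosomes):
    """Count sequence tags per chromosome from a list of mapping results."""
    return Counter(mapped_chromosomes)


def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Tags on the chromosome of interest divided by tags on the normalizers."""
    denominator = sum(tag_counts[name] for name in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing chromosome(s)")
    return tag_counts[chromosome_of_interest] / denominator


# Hypothetical toy data; a real run yields millions of mapped tags.
toy_mappings = ["chr21"] * 3 + ["chr9"] * 10 + ["chr11"] * 5
toy_counts = count_tags(toy_mappings)
dose_single = chromosome_dose(toy_counts, "chr21", ["chr9"])
dose_group = chromosome_dose(toy_counts, "chr21", ["chr9", "chr11"])
```

The same function covers a single normalizing chromosome and a group of them, which mirrors the text's allowance for one or more normalizing chromosomes or segments.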
Determination of Normalizing Sequences in Qualified Samples: Normalizing Chromosome Sequences and Normalizing Segment Sequences
Normalizing sequences are identified using sequence information from a set of qualified samples obtained from subjects known to comprise cells having a normal copy number for any one sequence of interest, e.g., a chromosome or segment thereof. Determination of normalizing sequences is outlined in steps 110, 120, 130, 140, and 145 of the embodiment of the method depicted in Fig. 1. The sequence information obtained from the qualified samples is used for the statistically meaningful identification of chromosomal aneuploidies in test samples (step 165 of Fig. 1 and the Examples).
Fig. 1 provides a flow diagram 100 of an embodiment for determining a CNV of a sequence of interest, e.g., a chromosome or segment thereof, in a biological sample. In some embodiments, a biological sample is obtained from a subject and comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, e.g., by the fetus and the mother carrying the fetus. Alternatively, the genomes are contributed to the sample by aneuploid cancerous cells and normal euploid cells from the same subject, e.g., a plasma sample from a cancer patient.
Apart from analyzing a patient's test sample, one or more normalizing chromosomes or one or more normalizing chromosome segments are selected for each possible chromosome of interest. The normalizing chromosomes or segments are identified asynchronously from the normal testing of patient samples, which may take place in a clinical setting. In other words, the normalizing chromosomes or segments are identified prior to testing patient samples. The associations between normalizing chromosomes or segments and the chromosomes or segments of interest are stored for use during testing. As explained below, such associations are typically maintained over periods of time that span the testing of many samples. The following discussion concerns embodiments for selecting normalizing chromosomes or chromosome segments for individual chromosomes or segments of interest.
A set of qualified samples is obtained to identify qualified normalizing sequences and to provide variance values for use in the statistically meaningful identification of CNV in test samples. In step 110, a plurality of biological qualified samples are obtained from a plurality of subjects known to comprise cells having a normal copy number for any one sequence of interest. In one embodiment, the qualified samples are obtained from mothers pregnant with a fetus that has been confirmed, using cytogenetic means, to have a normal copy number of chromosomes. The biological qualified samples may be a biological fluid, e.g., plasma, or any suitable sample as described below. In some embodiments, a qualified sample contains a mixture of nucleic acid molecules, e.g., cfDNA molecules. In some embodiments, the qualified sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules. Sequence information for normalizing chromosomes and/or segments thereof is obtained by sequencing at least a portion of the nucleic acids, e.g., fetal and maternal nucleic acids, using any known sequencing method. Preferably, any one of the next generation sequencing (NGS) methods described elsewhere herein is used to sequence the fetal and maternal nucleic acids as single or clonally amplified molecules. In various embodiments, the qualified samples are processed as disclosed below prior to and during sequencing. They may be processed using apparatus, systems, and kits as disclosed herein.
In step 120, at least a portion of each of the qualified nucleic acids contained in the qualified samples is sequenced to generate millions of sequence reads, e.g., 36bp reads, which are aligned to a reference genome, e.g., hg18. In some embodiments, the sequence reads comprise about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp, or about 500bp. It is expected that technological advances will enable single-end reads of greater than 500bp, enabling reads of greater than about 1000bp when paired-end reads are generated. In one embodiment, the mapped sequence reads comprise 36bp. In another embodiment, the mapped sequence reads comprise 25bp. Sequence reads are aligned to a reference genome, and the reads that are uniquely mapped to the reference genome are known as sequence tags. In one embodiment, at least about 3 x 10^6 qualified sequence tags, at least about 5 x 10^6 qualified sequence tags, at least about 8 x 10^6 qualified sequence tags, at least about 10 x 10^6 qualified sequence tags, at least about 15 x 10^6 qualified sequence tags, at least about 20 x 10^6 qualified sequence tags, at least about 30 x 10^6 qualified sequence tags, at least about 40 x 10^6 qualified sequence tags, or at least about 50 x 10^6 qualified sequence tags comprising between 20 and 40bp reads are obtained from reads that map uniquely to a reference genome.
In step 130, all the tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest on the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to a sequence of interest, normalized to the length of the qualified sequence of interest to which they are mapped. Sequence tag densities that are determined as a ratio of the tag density relative to the length of the sequence of interest are herein referred to as tag density ratios. Normalization to the length of the sequence of interest is not required, and may be included as a step to reduce the number of digits in a number to simplify it for human interpretation. As all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density for a sequence of interest (e.g., a clinically-relevant sequence) in the qualified samples is determined, as are the sequence tag densities for additional sequences from which normalizing sequences are subsequently identified.
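The optional length normalization described above amounts to a single division. The sketch below is illustrative only: the function name `tag_density_ratio`, the sequence names, the tag counts, and the lengths are hypothetical round numbers and are not hg18 values.

```python
def tag_density_ratio(tag_count, sequence_length_bp):
    """Number of mapped tags divided by the length of the sequence in bp."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / sequence_length_bp


# Hypothetical tag counts and lengths for two sequences of interest.
toy_tag_counts = {"seqA": 94_000, "seqB": 300_000}
toy_lengths_bp = {"seqA": 47_000_000, "seqB": 150_000_000}
toy_ratios = {
    name: tag_density_ratio(toy_tag_counts[name], toy_lengths_bp[name])
    for name in toy_tag_counts
}
```

In this toy case the two sequences have very different raw counts but the same tags-per-base ratio, which is the kind of comparison the length normalization makes easier to read.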
In some embodiments, the sequence of interest is a chromosome that is associated with a complete chromosomal aneuploidy, e.g., chromosome 21, and the qualified normalizing sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density approximates that of the sequence (i.e., chromosome) of interest, e.g., chromosome 21. The selected normalizing chromosome(s) may be the one chromosome or group of chromosomes that best approximates the variation in sequence tag density of the sequence of interest. Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalizing sequence for each of chromosomes 1-22, X, and Y in the qualified samples. The normalizing chromosome can be an individual chromosome, or it can be a group of chromosomes as described elsewhere herein.
In another embodiment, the sequence of interest is a segment of a chromosome associated with a partial aneuploidy, e.g., a chromosomal deletion or insertion, or an unbalanced chromosomal translocation, and the normalizing sequence is a chromosomal segment (or group of segments) that is not associated with the partial aneuploidy and whose variation in sequence tag density approximates that of the chromosome segment associated with the partial aneuploidy. The selected normalizing chromosome segment(s) may be the one or more segments that best approximate the variation in sequence tag density of the sequence of interest. Any one or more segments of any one or more of chromosomes 1-22, X, and Y can be a sequence of interest.
In other embodiments, the sequence of interest is a segment of a chromosome associated with a partial aneuploidy, and the normalizing sequence is a whole chromosome or a plurality of whole chromosomes. In still other embodiments, the sequence of interest is a whole chromosome associated with an aneuploidy, and the normalizing sequence is a chromosomal segment or a plurality of segments that are not associated with the aneuploidy.
Whether a single sequence or a group of sequences is identified in the qualified samples as the normalizing sequence for any one or more sequences of interest, the qualified normalizing sequence may be chosen to have a variation in sequence tag density that best or effectively approximates that of the sequence of interest as determined in the qualified samples. For example, a qualified normalizing sequence is a sequence that produces the smallest variability across the qualified samples when used to normalize the sequence of interest, i.e., the variability of the normalizing sequence is closest to that of the sequence of interest determined in the qualified samples. Stated another way, the qualified normalizing sequence is the sequence selected to produce the least variation in sequence dose (for the sequence of interest) across the qualified samples. Thus, the process selects a sequence that, when used as a normalizing chromosome, is expected to produce the smallest run-to-run variability in chromosome dose for the sequence of interest.
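The selection criterion above can be sketched as a search over candidate normalizing chromosomes. This sketch assumes that variability is measured as the coefficient of variation (standard deviation divided by mean) of the dose across qualified samples, which is one reasonable reading of "smallest variability" rather than a measure stated in this passage; the function names and the toy tag counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizing_chromosome(qualified_counts, chromosome_of_interest, candidates):
    """Pick the candidate whose dose varies least across qualified samples.

    qualified_counts is a list of dicts, one per qualified sample,
    mapping chromosome name to its sequence tag count.
    """
    def dose_variability(candidate):
        doses = [
            sample[chromosome_of_interest] / sample[candidate]
            for sample in qualified_counts
        ]
        return coefficient_of_variation(doses)

    return min(candidates, key=dose_variability)


# Hypothetical qualified samples: chr21 tracks chr9 exactly, but not chr1.
toy_qualified = [
    {"chr21": 100, "chr9": 400, "chr1": 900},
    {"chr21": 110, "chr9": 440, "chr1": 850},
    {"chr21": 90, "chr9": 360, "chr1": 950},
]
best_normalizer = select_normalizing_chromosome(toy_qualified, "chr21", ["chr1", "chr9"])
```

Extending the candidate list to groups of chromosomes would only require summing the counts of each group before forming the dose.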
The normalizing sequence identified in the qualified samples for any one or more sequences of interest remains the normalizing sequence of choice for determining the presence or absence of aneuploidy in test samples over days, weeks, months, and possibly years, provided that the procedures needed to generate sequencing libraries and to sequence the samples are essentially unaltered over time. As described above, normalizing sequences for determining the presence of aneuploidies are chosen (possibly among other reasons) because the variability in the number of sequence tags mapped to them, among samples (e.g., different samples) and among sequencing runs (e.g., sequencing runs that occur on the same day and/or on different days), best approximates the variability of the sequence of interest for which they are used as a normalizing parameter. Substantial alterations in these procedures will affect the number of tags that are mapped to all sequences, which in turn will determine which sequence or group of sequences has a variability across samples, in the same and/or in different sequencing runs, on the same day or on different days, that most closely approximates that of the sequence(s) of interest; this would require that the set of normalizing sequences be re-determined. Substantial alterations in procedures include changes in the laboratory protocol used for preparing the sequencing library, including changes related to preparing samples for multiplex sequencing instead of singleplex sequencing, and changes in sequencing platforms, including changes in the chemistry used for sequencing.
In some embodiments, the normalizing sequence is the sequence that best distinguishes one or more qualified samples from one or more affected samples, which implies that the normalizing sequence is the sequence with the greatest differentiability, i.e., the differentiability of the normalizing sequence is such that it provides optimal differentiation of the sequence of interest in an affected test sample, so that the affected test sample is easily distinguished from other, unaffected samples. In other embodiments, the normalizing sequence is a sequence that has a combination of the smallest variability and the greatest differentiability.
The level of differentiability can be determined as a statistical difference between the sequence doses (e.g., chromosome doses or segment doses) in a population of qualified samples and the chromosome dose(s) in one or more test samples, as described below and shown in the Examples. For example, differentiability can be represented numerically as a t-test value, which represents the statistical difference between the chromosome doses in a population of qualified samples and the chromosome dose(s) in one or more test samples. Alternatively, differentiability can be represented numerically as a normalized chromosome value (NCV), which is a z-score for chromosome doses as long as the distribution of the NCV is normal. Similarly, differentiability can be represented numerically as a t-test value that represents the statistical difference between the segment doses in a population of qualified samples and the segment dose(s) in one or more test samples. Where chromosome segments are the sequences of interest, differentiability of segment doses can be represented numerically as a normalized segment value (NSV), which is a z-score for chromosome segment doses as long as the distribution of the NSV is normal. In determining the z-score, the mean and standard deviation of the chromosome or segment doses in a set of qualified samples can be used. Alternatively, the mean and standard deviation of the chromosome or segment doses in a training set comprising qualified samples and affected samples can be used. In other embodiments, the normalizing sequence is a sequence that has the smallest variability and the greatest differentiability, or an optimal combination of small variability and large differentiability.
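The z-score form of differentiability described above can be illustrated with a minimal sketch. The function name, the sample values, and the use of the sample (n-1) standard deviation are assumptions made for illustration; the text does not specify which estimator is used.

```python
from statistics import mean, stdev

def normalized_chromosome_value(test_dose, qualified_doses):
    """Hypothetical helper: z-score of a test dose against qualified-sample doses."""
    mu = mean(qualified_doses)       # mean dose in the qualified set
    sigma = stdev(qualified_doses)   # sample standard deviation (an assumption)
    return (test_dose - mu) / sigma

# Illustrative, made-up chromosome doses from five qualified samples.
qualified = [0.98, 1.00, 1.02, 1.00, 1.00]
ncv = normalized_chromosome_value(1.10, qualified)
print(round(ncv, 2))
```

The same calculation applies to segment doses, in which case the result corresponds to an NSV rather than an NCV.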
The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and that are therefore useful for determining sequence doses in test samples.
Determination of sequence doses (i.e., chromosome doses or segment doses) in qualified samples
In step 140, based on the calculated qualified tag densities, a qualified sequence dose (i.e., a chromosome dose or a segment dose) for a sequence of interest is determined as the ratio of the sequence tag density for the sequence of interest to the qualified sequence tag density for additional sequences, from which normalizing sequences are subsequently identified in step 145. The identified normalizing sequences are subsequently used to determine sequence doses in test samples.
In one embodiment, the sequence dose in the qualified samples is a chromosome dose, calculated as the ratio of the number of sequence tags for a chromosome of interest to the number of sequence tags for a normalizing chromosome sequence in a qualified sample. The normalizing chromosome sequence can be a single chromosome, a group of chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. Accordingly, a chromosome dose for a chromosome of interest is determined in a qualified sample as: (i) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing chromosome sequence composed of a single chromosome; (ii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing chromosome sequence composed of two or more chromosomes; (iii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing segment sequence composed of a single segment of a chromosome; (iv) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing segment sequence composed of two or more segments from one chromosome; or (v) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing segment sequence composed of two or more segments of two or more chromosomes. Examples of determining a chromosome dose for a chromosome of interest according to (i)-(v) are as follows: (i) the chromosome dose for the chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of each of the remaining chromosomes (i.e., chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y); (ii) the chromosome dose for the chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of all possible combinations of two or more of the remaining chromosomes; (iii) the chromosome dose for the chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of a segment of another chromosome (e.g., chromosome 9); (iv) the chromosome dose for the chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of one other chromosome (e.g., two segments of chromosome 9); and (v) the chromosome dose for the chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of two different chromosomes (e.g., a segment of chromosome 9 and a segment of chromosome 14).
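As a minimal sketch of the ratio described above, the following computes a chromosome dose from per-chromosome tag counts. The function name, the dictionary layout, and the tag counts are hypothetical, and summing the tags of a multi-chromosome normalizing sequence is one reading of "the number of tags for a normalizing chromosome sequence composed of two or more chromosomes", not a confirmed detail.

```python
def chromosome_dose(tag_counts, chrom_of_interest, normalizing):
    """Ratio of tags on the chromosome of interest to tags on the normalizing sequence.

    tag_counts: dict mapping chromosome name -> number of mapped sequence tags.
    normalizing: list of one or more chromosome names (assumed to be summed).
    """
    numerator = tag_counts[chrom_of_interest]
    denominator = sum(tag_counts[c] for c in normalizing)
    return numerator / denominator

# Made-up tag counts for one qualified sample.
counts = {"chr21": 400, "chr9": 1600, "chr14": 1000}
single = chromosome_dose(counts, "chr21", ["chr9"])          # case (i)
group = chromosome_dose(counts, "chr21", ["chr9", "chr14"])  # case (ii)
print(single, group)
```

Cases (iii)-(v) would follow the same pattern with segment-level tag counts in place of whole-chromosome counts.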
In another embodiment, the sequence dose in the qualified samples is a segment dose, calculated as the ratio of the number of sequence tags for a segment of interest (which is not a whole chromosome) to the number of sequence tags for a normalizing segment sequence in a qualified sample. The normalizing segment sequence can be, for example, a whole chromosome, a group of whole chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. For example, a segment dose for a segment of interest is determined in a qualified sample as: (i) the ratio of the number of tags for the segment of interest to the number of tags for a normalizing segment sequence composed of a single segment of a chromosome; (ii) the ratio of the number of tags for the segment of interest to the number of tags for a normalizing segment sequence composed of two or more segments of one chromosome; or (iii) the ratio of the number of tags for the segment of interest to the number of tags for a normalizing segment sequence composed of two or more segments of two or more chromosomes.
Chromosome doses for one or more chromosomes of interest are determined in all qualified samples, and a normalizing chromosome sequence is identified in step 145. Similarly, segment doses for one or more segments of interest are determined in all qualified samples, and a normalizing segment sequence is identified in step 145.
Identification of normalizing sequences from qualified sequence doses
In step 145, based on the calculated sequence doses, a normalizing sequence is identified for a sequence of interest, e.g., as the sequence that results in the smallest variability in sequence dose for the sequence of interest across all qualified samples. The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and that are therefore useful for determining sequence doses in test samples.
Normalizing sequences for one or more sequences of interest can be identified in a set of qualified samples, and the sequences identified in the qualified samples are subsequently used to calculate sequence doses for one or more sequences of interest in each test sample (step 150), in order to determine the presence or absence of aneuploidy in each test sample. The normalizing sequence identified for a chromosome or segment of interest may differ when different sequencing platforms are used, and/or when differences exist in the purification of the nucleic acid to be sequenced and/or in the preparation of the sequencing library. The use of normalizing sequences according to the methods described herein provides a specific and sensitive measure of variation in the copy number of a chromosome or segment thereof, irrespective of the sample preparation and/or sequencing platform used.
In some embodiments, more than one normalizing sequence is identified, i.e., different normalizing sequences can be determined for one sequence of interest, and multiple sequence doses can be determined for one sequence of interest. For example, the variation (e.g., coefficient of variation, CV) in chromosome dose for chromosome of interest 21 is smallest when the sequence tag density of chromosome 14 is used. However, two, three, four, five, six, seven, eight, or more normalizing sequences can be identified for use in determining a sequence dose for a sequence of interest in a test sample. As an example, a second dose for chromosome 21 in any one test sample can be determined using chromosome 7, chromosome 9, chromosome 11, or chromosome 12 as the normalizing chromosome sequence, because these chromosomes all have a CV close to that of chromosome 14 (see Example 8, Table 10). Preferably, when a single chromosome is chosen as the normalizing chromosome sequence for a chromosome of interest, the normalizing chromosome is the chromosome that results in chromosome doses for the chromosome of interest having the smallest variability across all samples tested (e.g., qualified samples).
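Selecting a single normalizing chromosome by smallest coefficient of variation can be sketched as follows. The function names and tag counts are hypothetical, and the CV is computed here with the sample standard deviation, which is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV as sample standard deviation divided by mean (estimator is an assumption)."""
    return stdev(values) / mean(values)

def rank_normalizers(samples, chrom_of_interest):
    """Rank candidate single chromosomes by the CV of the resulting chromosome dose.

    samples: list of dicts, one per qualified sample, mapping chromosome -> tag count.
    Returns a list of (chromosome, cv) pairs sorted from smallest to largest CV.
    """
    candidates = [c for c in samples[0] if c != chrom_of_interest]
    scored = []
    for c in candidates:
        doses = [s[chrom_of_interest] / s[c] for s in samples]
        scored.append((c, coefficient_of_variation(doses)))
    return sorted(scored, key=lambda pair: pair[1])

# Made-up counts: chr14 tracks chr21 across samples, chr1 does not.
qualified = [
    {"chr21": 100, "chr14": 200, "chr1": 1000},
    {"chr21": 110, "chr14": 220, "chr1": 1000},
    {"chr21": 90, "chr14": 180, "chr1": 1000},
]
ranking = rank_normalizers(qualified, "chr21")
print(ranking[0][0])
```

Keeping the full ranking rather than only the best candidate makes it straightforward to pick a second or third normalizing sequence with a CV close to the minimum.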
Normalizing chromosome sequences as normalizing sequences for chromosomes
In other embodiments, the normalizing chromosome sequence can be a single sequence, or it can be a group of sequences. For example, in some embodiments, the normalizing sequence is a group of sequences (e.g., a group of chromosomes) that is identified as the normalizing sequence for any one or more of chromosomes 1-22, X, and Y. The group of chromosomes that constitutes the normalizing sequence for a chromosome of interest (i.e., the normalizing chromosome sequence) can be a group of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, or twenty-two chromosomes, and can include or exclude one or both of chromosomes X and Y. The group of chromosomes identified as the normalizing chromosome sequence is the group that results in chromosome doses for the chromosome of interest having the smallest variability across all samples tested (e.g., qualified samples). Preferably, individual chromosomes and groups of chromosomes are tested together for their ability to best mimic the behavior of the sequence of interest, for which they are chosen as the normalizing chromosome sequence.
In one embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 21 is a group of chromosomes selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the group of chromosomes is a group selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14.
In some embodiments, the method is further improved by using a normalizing sequence that is determined by systematic calculation of all chromosome doses, using each chromosome individually and in all possible combinations with all remaining chromosomes (see Example 13). For example, a systematically determined normalizing chromosome can be determined for each chromosome of interest by systematically calculating all possible chromosome doses using any one of chromosomes 1-22, X, and Y, and combinations of two or more of chromosomes 1-22, X, and Y, to determine which single chromosome or group of chromosomes is the normalizing chromosome that results in the smallest variability in chromosome dose for the chromosome of interest across a set of qualified samples (see Example 13). Accordingly, in one embodiment, the systematically calculated normalizing sequence for chromosome 21 is a group of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20, and chromosome 22. A single chromosome or a group of chromosomes can be determined for all chromosomes in the genome.
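The systematic search over single chromosomes and groups of chromosomes can be sketched with an exhaustive enumeration. This is a minimal illustration, not the procedure of Example 13: the function name and tag counts are hypothetical, group tag counts are summed (an assumption), and the `max_group_size` parameter is introduced here only to bound the search, since the number of possible groups grows exponentially with the number of candidate chromosomes.

```python
from itertools import combinations
from statistics import mean, stdev

def best_normalizing_group(samples, chrom_of_interest, max_group_size=2):
    """Return the group of chromosomes (as a tuple) giving the smallest dose CV.

    samples: list of dicts, one per qualified sample, mapping chromosome -> tag count.
    max_group_size: hypothetical cap on group size to keep the search tractable.
    """
    candidates = sorted(c for c in samples[0] if c != chrom_of_interest)
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            # Dose for each qualified sample using the summed tags of the group.
            doses = [s[chrom_of_interest] / sum(s[c] for c in group) for s in samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv

# Made-up counts where the pair (chrA, chrB) tracks chr21 better than either alone.
qualified = [
    {"chr21": 100, "chrA": 120, "chrB": 80},
    {"chr21": 110, "chrA": 100, "chrB": 120},
    {"chr21": 90, "chrA": 110, "chrB": 70},
]
group, cv = best_normalizing_group(qualified, "chr21")
print(group, cv)
```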
In one embodiment, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14. In one embodiment, the normalizing sequence for chromosome 18 is a group of chromosomes selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the group of chromosomes is a group selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14.
In another embodiment, the normalizing sequence for chromosome 18 is determined by systematic calculation of all possible chromosome doses, using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 18 is a normalizing chromosome consisting of a group of chromosomes, the group consisting of chromosome 2, chromosome 3, chromosome 5, and chromosome 7.
In one embodiment, the normalizing sequence for chromosome X is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the normalizing sequence for chromosome X is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In one embodiment, the normalizing sequence for chromosome X is a group of chromosomes selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome X is determined by systematic calculation of all possible chromosome doses, using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome X is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 8.
In one embodiment, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In another embodiment, the normalizing sequence for chromosome 13 is a group of chromosomes selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome 13 is determined by systematic calculation of all possible chromosome doses, using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome comprising the group of chromosome 4 and chromosome 5. In another embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 5.
The variation in chromosome dose for chromosome Y is greater than 30, regardless of which normalizing chromosome is used in determining the chromosome Y dose. Therefore, a group of two or more chromosomes selected from chromosomes 1-22 and chromosome X can be used as the normalizing sequence for chromosome Y. In one embodiment, the at least one normalizing chromosome is a group of chromosomes consisting of chromosomes 1-22 and chromosome X. In another embodiment, the group of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5, and chromosome 6.
In another embodiment, the normalizing sequence for chromosome Y is determined by systematic calculation of all possible chromosome doses, using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome comprising the group of chromosomes consisting of chromosome 4 and chromosome 6. In another embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome consisting of a group of chromosomes, the group consisting of chromosome 4 and chromosome 6.
The normalizing sequence used to calculate the dose of different chromosomes of interest, or of different segments of interest, may be the same, or it may be a different normalizing sequence for each of the different chromosomes or segments, respectively. For example, the normalizing sequence (e.g., a normalizing chromosome, one or a group) for chromosome of interest A can be the same as, or different from, the normalizing sequence (e.g., a normalizing chromosome, one or a group) for chromosome of interest B.
The normalizing sequence for a complete chromosome can be a complete chromosome or a group of complete chromosomes, or it can be a segment of a chromosome, or a group of segments of one or more chromosomes.
Normalizing segment sequences as normalizing sequences for chromosomes
In another embodiment, the normalizing sequence for a chromosome can be a normalizing segment sequence. The normalizing segment sequence can be a single segment, or it can be a group of segments of one chromosome, or it can be segments from two or more different chromosomes. A normalizing segment sequence can be determined by systematic calculation of all combinations of segment sequences in the genome. For example, a normalizing segment sequence for chromosome 21 can be a single segment that is larger or smaller than chromosome 21, which is approximately 47 Mbp (million base pairs) in size; for example, the normalizing segment can be a segment of chromosome 9, which is approximately 140 Mbp. Alternatively, a normalizing sequence for chromosome 21 can be, for example, a combination of segment sequences from two different chromosomes (e.g., from chromosome 1 and from chromosome 12).
In one embodiment, the normalizing sequence for chromosome 21 is a normalizing segment sequence of one segment or of a group of two or more segments of chromosomes 1-20, 22, X, and Y. In another embodiment, the normalizing sequence for chromosome 18 is a segment or groups of segments of chromosomes 1-17, 19-22, X, and Y. In another embodiment, the normalizing sequence for chromosome 13 is a segment or groups of segments of chromosomes 1-12, 14-22, X, and Y. In another embodiment, the normalizing sequence for chromosome X is a segment or groups of segments of chromosomes 1-22 and Y. In another embodiment, the normalizing sequence for chromosome Y is a segment or a group of segments of chromosomes 1-22 and X. Normalizing sequences of single segments or groups of segments can be determined for all chromosomes in the genome. The two or more segments of a normalizing segment sequence can be segments from one chromosome, or they can be segments of two or more different chromosomes. As described for normalizing chromosome sequences, a normalizing segment sequence can be the same for two or more different chromosomes.
Normalizing segment sequences as normalizing sequences for chromosome segments
The presence or absence of a CNV of a sequence of interest can be determined when the sequence of interest is a segment of a chromosome. Variation in the copy number of a chromosome segment allows the presence or absence of a partial chromosomal aneuploidy to be determined. Described below are examples of partial chromosomal aneuploidies that are associated with various fetal abnormalities and disease conditions. The segment of the chromosome can be of any length. For example, it can range from a kilobase to hundreds of megabases. The human genome comprises just over 3 billion DNA bases, which can be divided into tens, thousands, hundreds of thousands, and millions of segments of different sizes, whose copy number can be determined according to the present method. The normalizing sequence for a chromosome segment is a normalizing segment sequence, which can be a single segment from any one of chromosomes 1-22, X, and Y, or a group of segments from any one or more of chromosomes 1-22, X, and Y.
The normalizing sequence for a segment of interest is a sequence whose variability across chromosomes and across samples is closest to that of the segment of interest. When the normalizing sequence is a group of segments of any one or more of chromosomes 1-22, X, and Y, the normalizing sequence can be determined as described for determining the normalizing sequence for a chromosome of interest. A normalizing segment sequence of one segment or a group of segments can be identified by calculating segment doses using one segment, and all possible combinations of two or more segments, as the normalizing sequence for the segment of interest in each sample of a set of qualified samples (i.e., samples known to be diploid for the segment of interest). The normalizing sequence is then determined as the one providing a segment dose with the smallest variability for the segment of interest across all qualified samples, as described above for normalizing chromosome sequences.
For example, for a segment of interest that is 1 Mb (megabase), the approximately 3 million remaining segments (minus the 1 Mb segment of interest) of the approximately 3 Gb human genome can be used individually or in combination with each other to calculate segment doses for the segment of interest in a set of qualified samples, in order to determine which segment or group of segments will serve as the normalizing segment sequence for qualified and test samples. Segments of interest can vary from about 1000 bases to tens of megabases. A normalizing segment sequence can be composed of one or more segments of the same size as the sequence of interest. In other embodiments, the normalizing segment sequence can be composed of segments that differ in size from the sequence of interest and/or from each other. For example, a normalizing segment sequence for a sequence that is 100,000 bases long can be 20,000 bases long, and can comprise a combination of sequences of different lengths, e.g., 7,000 + 8,000 + 5,000 bases. As described elsewhere herein for normalizing chromosome sequences, a normalizing segment sequence can be determined by systematic calculation of all possible chromosome and/or segment doses, using each possible normalizing chromosome segment individually and in all possible combinations of normalizing segments. A single segment or a group of segments can be determined for all segments and/or chromosomes in the genome.
The normalizing sequence used to calculate the dose of different chromosome segments of interest may be the same, or it may be a different normalizing sequence for each of the different chromosome segments of interest. For example, the normalizing sequence (e.g., a normalizing segment, one or a group) for chromosome segment of interest A can be the same as, or different from, the normalizing sequence (e.g., a normalizing segment, one or a group) for chromosome segment of interest B.
Normalizing chromosome sequences as normalizing sequences for chromosome segments
In another embodiment, variations in the copy number of a chromosome segment can be determined using a normalizing chromosome, which can be a single chromosome or a group of chromosomes as described above. The normalizing chromosome sequence can be the normalizing chromosome or group of chromosomes identified for the chromosome of interest in a set of qualified samples by systematically determining which chromosome or group of chromosomes provides the smallest variability in chromosome dose in the set of qualified samples. For example, to determine the presence or absence of a partial deletion of chromosome 7, the normalizing chromosome or group of chromosomes used in the analysis of the partial deletion is the chromosome or group of chromosomes first identified in a set of qualified samples as the normalizing sequence that minimizes the variability in chromosome dose for the entire chromosome 7. As described elsewhere herein for normalizing chromosome sequences for chromosomes of interest, a normalizing chromosome sequence for a chromosome segment can be determined by systematic calculation of all possible chromosome doses, using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes. A single chromosome or a group of chromosomes can be determined for all chromosome segments in the genome. Examples demonstrating the use of normalizing chromosomes for determining the presence of a partial chromosomal deletion and of a partial chromosomal duplication are provided as Examples 17 and 18.
In some embodiments, the CNV of a chromosome segment is determined by first subdividing the chromosome of interest into segments, or bins, of variable length. The bin length can be at least about 1 kbp, at least about 10 kbp, at least about 100 kbp, at least about 1 Mbp, at least about 10 Mbp, or at least about 100 Mbp. The smaller the bin length, the greater the resolution obtained for localizing the CNV of the segment within the chromosome of interest.
The presence or absence of a CNV of a segment of a chromosome of interest can be determined by comparing the dose for each bin of the chromosome of interest in a test sample to the mean of the corresponding bin dose determined for each bin of equivalent length in a set of qualified samples. A normalized value for each bin can be calculated, as described above for normalized segment values, as a normalized bin value (NBV), which relates the bin dose in the test sample to the mean of the corresponding bin dose in the set of qualified samples. The NBV is calculated as:
$$\mathrm{NBV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation of the j-th bin dose in a set of qualified samples, and $x_{ij}$ is the observed j-th bin dose for test sample i.
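The NBV calculation can be sketched per bin as follows. The function name and the bin doses are hypothetical, and the sample standard deviation is used as the estimate of the spread, which is an assumption.

```python
from statistics import mean, stdev

def normalized_bin_values(test_bin_doses, qualified_bin_doses):
    """Compute one NBV per bin for a single test sample.

    test_bin_doses: list of bin doses for the test sample, one per bin j.
    qualified_bin_doses: list of per-sample lists with the same bin order.
    """
    nbvs = []
    for j, x_ij in enumerate(test_bin_doses):
        column = [sample[j] for sample in qualified_bin_doses]  # j-th bin, all qualified samples
        nbvs.append((x_ij - mean(column)) / stdev(column))
    return nbvs

# Made-up doses for two bins in three qualified samples and one test sample.
qualified = [[0.9, 2.0], [1.0, 2.2], [1.1, 1.8]]
test = [1.0, 1.4]
print([round(v, 2) for v in normalized_bin_values(test, qualified)])
```

In this made-up data the second bin of the test sample lies about three standard deviations below the qualified mean, the kind of signal that would be examined as a possible loss in that bin.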
Determination of aneuploidies in test samples
Based on the one or more normalizing sequences identified in the qualified samples, a sequence dose is determined for a sequence of interest in a test sample comprising a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.
In step 115, a test sample is obtained from a subject suspected or known to carry a clinically relevant CNV of a sequence of interest. The test sample can be a biological fluid (e.g., plasma) or any suitable sample as described below. As explained, the sample can be obtained using a non-invasive procedure such as a simple blood draw. In some embodiments, the test sample contains a mixture of nucleic acid molecules (e.g., cfDNA molecules). In some embodiments, the test sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules.
In step 125, at least a portion of the test nucleic acids in the test sample is sequenced, as described for the qualified samples, to generate millions of sequence reads (e.g., 36 bp reads). As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped or aligned to a reference genome to produce tags. As described for step 120, at least about 3 × 10^6 qualified sequence tags, at least about 5 × 10^6 qualified sequence tags, at least about 8 × 10^6 qualified sequence tags, at least about 10 × 10^6 qualified sequence tags, at least about 15 × 10^6 qualified sequence tags, at least about 20 × 10^6 qualified sequence tags, at least about 30 × 10^6 qualified sequence tags, at least about 40 × 10^6 qualified sequence tags, or at least about 50 × 10^6 qualified sequence tags, comprising reads of between 20 and 40 bp, are obtained from reads that map uniquely to the reference genome. In some embodiments, the reads produced by the sequencing apparatus are provided in an electronic format. Alignment is accomplished using a computational apparatus as discussed below. Individual reads are compared against the reference genome, which is often vast (millions of base pairs), to identify sites where the reads uniquely correspond to the reference genome. In some embodiments, the alignment procedure permits limited mismatch between reads and the reference genome. In some cases, 1, 2, or 3 base pairs in a read are permitted to mismatch the corresponding base pairs in the reference genome, and a mapping is still made.
In step 135, all or most of the tags obtained from sequencing the nucleic acids in the test sample are counted, using a computational apparatus as described below, to determine a test sequence tag density. In certain embodiments, each read is aligned to a particular region of the reference genome (in most cases a chromosome or segment), and the read is converted to a tag by appending site information to the read. As this process unfolds, the computational apparatus may keep a running count of the number of tags/reads mapping to each region of the reference genome (in most cases a chromosome or segment). The counts are stored for each chromosome or segment of interest and for each corresponding normalizing chromosome or segment.
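A minimal sketch of the running count described above, assuming each mapped read is already available as a (read sequence, chromosome, position) tuple. The tuple layout and function name are illustrative assumptions, not part of the described apparatus.

```python
from collections import Counter


def count_tags(mapped_reads, regions_to_store):
    """Keep a running per-region tag count and return the stored counts.

    mapped_reads: iterable of (read_sequence, chromosome, position);
                  appending the site to the read is what makes it a tag.
    regions_to_store: regions of interest plus their normalizing regions.
    """
    running = Counter()
    for _read_seq, chrom, _pos in mapped_reads:
        running[chrom] += 1  # one more tag mapped to this region
    # Store counts only for the regions needed later for dose calculation.
    return {region: running.get(region, 0) for region in regions_to_store}


reads = [
    ("ACGT", "chr21", 100),
    ("TTGA", "chr14", 250),
    ("GGCA", "chr21", 900),
    ("CCAT", "chr3", 40),
]
counts = count_tags(reads, ["chr21", "chr14"])
```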
In certain embodiments, the reference genome has one or more excluded regions that are part of a true biological genome but are not included in the reference genome. Reads potentially aligning to these excluded regions are not counted. Examples of excluded regions include regions of long repeated sequences, regions of similarity between the X and Y chromosomes, and the like.
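One way to picture the exclusion is as a filter applied before counting. The sketch below assumes excluded regions are given as half-open (start, end) intervals per chromosome; that representation and the names used are hypothetical.

```python
def is_excluded(chrom, pos, excluded_regions):
    """True if pos falls in any half-open [start, end) interval for chrom."""
    return any(start <= pos < end
               for start, end in excluded_regions.get(chrom, ()))


def filter_tags(tags, excluded_regions):
    """Drop tags whose site lies inside an excluded region."""
    return [(chrom, pos) for chrom, pos in tags
            if not is_excluded(chrom, pos, excluded_regions)]


# Illustrative coordinates only.
excluded = {"chrX": [(0, 100)]}
kept = filter_tags([("chrX", 50), ("chrX", 150), ("chr21", 50)], excluded)
```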
In certain embodiments, the method determines whether to count a tag more than once when multiple reads align to the same site on a reference genome or sequence. There may be occasions when two tags have the same sequence and therefore align to an identical site on the reference sequence. The method employed to count tags may, under certain circumstances, exclude from the count identical tags derived from the same sequenced sample. If a disproportionate number of tags are identical in a given sample, this suggests a strong bias or other defect in the procedure. Therefore, in accordance with certain embodiments, the counting method does not count tags from a given sample that are identical to tags from that sample that were previously counted.
Various criteria may be set for choosing when to disregard an identical tag from a single sample. In certain embodiments, a defined percentage of the tags that are counted must be unique. If more tags than this threshold allows are not unique, they are disregarded. For example, if the defined percentage requires that at least 50% are unique, identical tags are not counted until the percentage of unique tags exceeds 50% for the sample. In other embodiments, the threshold percentage of unique tags is at least about 60%. In other embodiments, the threshold percentage of unique tags is at least about 75%, or at least about 90%, or at least about 95%, or at least about 98%, or at least about 99%. For chromosome 21, the threshold may be set at 90%. If 30M tags are aligned to chromosome 21, then at least 27M of them must be unique. If 3M counted tags are not unique and the 30,000,000th tag is not unique, it is not counted.
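The text does not spell out the exact bookkeeping, so the following is one possible reading, offered as a sketch: a tag identical to one already seen is counted only if doing so keeps the unique fraction of counted tags at or above the threshold. In the chromosome 21 example, 90% of 30M tags is 27M, so at most 3M non-unique tags can be counted. The function name and the tag representation are assumptions.

```python
def count_with_unique_threshold(tags, min_unique_fraction=0.9):
    """Count tags, disregarding identical tags once counting another one
    would push the unique fraction below min_unique_fraction.

    Returns (counted, unique).
    """
    seen = set()
    unique = 0
    counted = 0
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique += 1
            counted += 1
        elif unique / (counted + 1) >= min_unique_fraction:
            counted += 1  # duplicate still within the allowed share
        # otherwise the identical tag is disregarded
    return counted, unique


# Three copies of "a": the second is counted (1 of 2 unique = 50%),
# the third is disregarded (1 of 3 unique would be below 50%).
result = count_with_unique_threshold(["a", "a", "a", "b"], 0.5)
```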
The particular threshold or other criterion used to determine when not to count further identical tags can be selected using appropriate statistical analysis. One factor influencing this threshold or other criterion is the amount of sequenced sample relative to the size of the genome to which tags can be aligned. Other factors include the size of the reads and similar considerations.
In one embodiment, the number of test sequence tags mapped to a sequence of interest is normalized to the known length of the sequence of interest to which they are mapped, to provide a test sequence tag density ratio. As described for the qualified samples, normalization to the known length of a sequence of interest is not required; it may be included as a step to reduce the number of digits in a number and so simplify it for human interpretation. As all the mapped test sequence tags in the test sample are counted, the sequence tag density for a sequence of interest (for example a clinically relevant sequence) in the test sample is determined, as are the sequence tag densities for additional sequences that correspond to at least one normalizing sequence identified in the qualified samples.
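As a small worked example of this optional length normalization, the sketch below expresses tag density as tags per megabase. The per-megabase scaling, the function name, and the round numbers are illustrative assumptions rather than values from the source.

```python
def tag_density(tag_count, sequence_length_bp, per_bp=1_000_000):
    """Tags mapped to a sequence, normalized to its known length.

    Scaling to tags per megabase only makes the number easier to read;
    it does not change any ratio computed from densities scaled alike.
    """
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count * per_bp / sequence_length_bp


# Hypothetical: 480,000 tags on a 48 Mb sequence of interest.
density = tag_density(480_000, 48_000_000)
```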
In step 150, based on the identity of at least one normalizing sequence in the qualified samples, a test sequence dose is determined for a sequence of interest in the test sample. In various embodiments, the test sequence dose is determined computationally by manipulating the sequence tag densities of the sequence of interest and of the corresponding normalizing sequence, as described herein. The computational apparatus responsible for this task electronically accesses the association between the sequence of interest and its associated normalizing sequence, which may be stored in a database, table, or graph, or be included as code in program instructions.
As described elsewhere herein, the at least one normalizing sequence can be a single sequence or a group of sequences. The sequence dose for a sequence of interest in a test sample is the ratio of the sequence tag density determined for the sequence of interest in the test sample to the sequence tag density of at least one normalizing sequence determined in the test sample, wherein the normalizing sequence in the test sample corresponds to the normalizing sequence identified in the qualified samples for the particular sequence of interest. For example, if the normalizing sequence identified for chromosome 21 in the qualified samples is determined to be a chromosome (for example chromosome 14), then the test sequence dose for chromosome 21 (the sequence of interest) is determined as the ratio of the sequence tag density for chromosome 21 to the sequence tag density for chromosome 14, each determined in the test sample. Similarly, chromosome doses are determined for chromosomes 13, 18, X, Y, and other chromosomes associated with chromosomal aneuploidies. A normalizing sequence for a chromosome of interest can be one chromosome or a group of chromosomes, or one chromosome segment or a group of chromosome segments. As described above, a sequence of interest can be part of a chromosome, for example a chromosome segment. Accordingly, the dose for a chromosome segment can be determined as the ratio of the sequence tag density determined for the segment in the test sample to the sequence tag density for the normalizing chromosome segment in the test sample, wherein the normalizing segment in the test sample corresponds to the normalizing segment (a single segment or a group of segments) identified in the qualified samples for the particular segment of interest. Chromosome segments can range in size from kilobases (kb) to megabases (Mb) (for example about 1 kb to 10 kb, or about 10 kb to 100 kb, or about 100 kb to 1 Mb).
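The dose calculation can be sketched as a simple ratio. In this sketch the stored association between a sequence of interest and its normalizing sequence(s) is a plain dictionary, and the densities of a group of normalizing sequences are summed; both choices, and all names and numbers, are assumptions made for illustration.

```python
# Hypothetical stored association: sequence of interest -> normalizers.
NORMALIZERS = {"chr21": ["chr14"]}


def sequence_dose(densities, interest, normalizers):
    """Ratio of the tag density of the sequence of interest to the
    (summed) tag density of its normalizing sequence(s)."""
    denominator = sum(densities[name] for name in normalizers)
    if denominator == 0:
        raise ZeroDivisionError("normalizing sequences have no tags")
    return densities[interest] / denominator


# Illustrative tag densities measured in one test sample.
densities = {"chr21": 50.0, "chr14": 100.0, "chr9": 100.0}
dose_21 = sequence_dose(densities, "chr21", NORMALIZERS["chr21"])
dose_21_group = sequence_dose(densities, "chr21", ["chr14", "chr9"])
```

The same function applies unchanged to segment doses when the keys name chromosome segments instead of whole chromosomes.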
In step 155, threshold values are derived from standard deviation values established for the qualified sequence doses determined in a plurality of qualified samples and for the sequence doses determined for samples known to be aneuploid for a sequence of interest. Note that this operation is typically performed asynchronously with the analysis of patient test samples. It may be performed, for example, concurrently with the selection of normalizing sequences from the qualified samples. Accurate classification depends on the differences between the probability distributions for the different classes (that is, the types of aneuploidy). In some examples, thresholds are chosen from the empirical distribution for each type of aneuploidy (for example trisomy 21). Possible threshold values established for classifying trisomy 13, trisomy 18, trisomy 21, and monosomy X aneuploidies are described in the Examples, which illustrate the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold determined to distinguish samples affected by an aneuploidy of one chromosome can be the same as or different from the threshold determined to distinguish samples affected by a different aneuploidy. As shown in the Examples, the threshold for each chromosome of interest is determined from the variability in the dose of the chromosome of interest across samples and sequencing runs. The less variable the chromosome dose for any chromosome of interest, the narrower the spread in the dose for that chromosome across all the unaffected samples, which are used to set the thresholds for determining different aneuploidies.
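The passage above does not give an explicit formula, so the following is only one plausible sketch of a standard-deviation-based threshold: the test dose is expressed as a number of standard deviations from the mean of the qualified doses, and a dose threshold is placed k standard deviations above that mean. The multiplier k and all data values are hypothetical.

```python
import statistics


def normalized_value(test_dose, qualified_doses):
    """Distance of a test dose from the qualified mean, in SD units."""
    mean = statistics.mean(qualified_doses)
    sd = statistics.stdev(qualified_doses)  # sample standard deviation
    return (test_dose - mean) / sd


def dose_threshold(qualified_doses, k):
    """Dose lying k standard deviations above the qualified mean."""
    mean = statistics.mean(qualified_doses)
    sd = statistics.stdev(qualified_doses)
    return mean + k * sd


# Toy qualified doses chosen so the mean is 2.0 and the SD is 1.0.
qualified = [1.0, 2.0, 3.0]
z = normalized_value(4.0, qualified)
threshold = dose_threshold(qualified, k=3)
```

A tighter spread of qualified doses gives a smaller standard deviation, so the same dose shift produces a larger normalized value, which is the sense in which lower variability sharpens classification.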
Returning to the process flow associated with classifying a patient test sample, in step 160 the copy number variation of the sequence of interest is determined in the test sample by comparing the test sequence dose for the sequence of interest to at least one threshold value established from the qualified sequence doses. This operation may be performed by the same computational apparatus employed to measure sequence tag densities and/or calculate segment doses.
In step 165, the dose calculated for a test sequence of interest is compared to the threshold values, which are chosen according to a user-defined "threshold of reliability", to classify the sample as "normal", "affected", or "no call". The "no call" samples are samples for which a definitive diagnosis cannot be made with reliability. Each type of affected sample (for example trisomy 21, partial trisomy 21, monosomy X) has its own thresholds, one for calling normal (unaffected) samples and another for calling affected samples (although in some cases the two thresholds coincide). As described elsewhere herein, under some circumstances a no call can be converted to a call (affected or normal) if the fetal fraction of nucleic acid in the test sample is sufficiently high. The classification of the test sequence may be reported by the computational apparatus employed in other operations of this process flow. In some cases, the classification is reported in an electronic format and may be displayed, e-mailed, or texted to the relevant persons.
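The three-way call can be sketched as a comparison against two thresholds. The sketch assumes a copy number gain (such as a trisomy), where higher values indicate an affected sample; for a loss the comparisons would be reversed. The threshold values 2.5 and 4.0 are illustrative placeholders, and the function name is hypothetical.

```python
def classify(value, normal_below, affected_above):
    """Classify a normalized dose value as 'normal', 'affected' or 'no call'.

    normal_below:   at or below this value the sample is called normal.
    affected_above: at or above this value the sample is called affected.
    Values strictly between the two thresholds yield 'no call'. When the
    thresholds coincide there is no 'no call' zone.
    """
    if normal_below > affected_above:
        raise ValueError("normal threshold must not exceed affected threshold")
    if value >= affected_above:
        return "affected"
    if value <= normal_below:
        return "normal"
    return "no call"


calls = [classify(v, normal_below=2.5, affected_above=4.0)
         for v in (1.0, 3.0, 5.0)]
```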
Certain embodiments provide a method for providing a prenatal diagnosis of a fetal aneuploidy in a biological sample comprising fetal and maternal nucleic acid molecules. The diagnosis is made based on the following steps: obtaining sequence information from sequencing at least a portion of the mixture of fetal and maternal nucleic acid molecules derived from a biological test sample (for example a maternal plasma sample); computing from the sequencing data a normalizing chromosome dose for one or more chromosomes of interest and/or a normalizing segment dose for one or more segments of interest; determining a statistically significant difference between the chromosome dose for the chromosome of interest and/or the segment dose for the segment of interest, respectively, in the test sample and a threshold value established in a plurality of qualified (normal) samples; and providing the prenatal diagnosis based on the statistical difference. As described in step 165 of the method, a diagnosis of normal or affected is made. A "no call" is provided in the event that a diagnosis of normal or affected cannot be made with confidence.
Samples and sample processing
Samples
Samples that are used for determining a CNV, such as chromosomal aneuploidies and partial aneuploidies, can include samples taken from any cell, tissue, or organ in which copy number variations for one or more sequences of interest are to be determined. Desirably, the samples contain nucleic acids that are present in cells and/or nucleic acids that are "cell-free" (for example cfDNA).
In certain embodiments, it is advantageous to obtain cell-free nucleic acids, for example cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including, but not limited to, plasma, serum, and urine (see, for example, Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; Koide et al., Prenatal Diagnosis 25:604-607 [2005]; Chen et al., Nature Med. 2:1033-1035 [1996]; Lo et al., Lancet 350:485-487 [1997]; Botezatu et al., Clin Chem. 46:1078-1084, 2000; and Su et al., J Mol. Diagn. 6:101-107 [2004]). To separate cell-free DNA from cells in a sample, various methods can be used, including, but not limited to, fractionation, centrifugation (for example density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or other separation methods. Commercially available kits for manual and automated separation of cfDNA are available (Roche Diagnostics, Indianapolis, IN; Qiagen, Valencia, CA; Macherey-Nagel, Duren, DE). Biological samples comprising cfDNA have been used in assays to determine the presence or absence of chromosomal abnormalities, such as trisomy 21, by sequencing assays that can detect chromosomal aneuploidies and/or various polymorphisms.
In various embodiments, the cfDNA present in the sample can be enriched specifically or non-specifically prior to use (for example prior to preparing a sequencing library). Non-specific enrichment of sample DNA refers to whole genome amplification of the genomic DNA fragments of the sample, which can be used to increase the level of the sample DNA prior to preparing a cfDNA sequencing library. Non-specific enrichment can be the selective enrichment of one of the two genomes present in a sample that comprises more than one genome. For example, non-specific enrichment can be selective for the fetal genome in a maternal sample, which can be achieved by known methods to increase the relative proportion of fetal to maternal DNA in the sample. Alternatively, non-specific enrichment can be the non-selective amplification of both genomes present in the sample. For example, non-specific amplification can be of fetal and maternal DNA in a sample comprising a mixture of DNA from the fetal and maternal genomes. Methods for whole genome amplification are known in the art. Degenerate oligonucleotide-primed PCR (DOP), primer extension PCR (PEP), and multiple displacement amplification (MDA) are examples of whole genome amplification methods. In certain embodiments, the sample comprising the mixture of cfDNA from different genomes is not enriched for the cfDNA of the genomes present in the mixture. In other embodiments, the sample comprising the mixture of cfDNA from different genomes is non-specifically enriched for any one of the genomes present in the sample.
The sample comprising the nucleic acid(s) to which the methods described herein are applied typically comprises a biological sample ("test sample"), for example as described above. In certain embodiments, the nucleic acid(s) to be screened for one or more CNVs is purified or isolated by any of a number of well-known methods.
Accordingly, in certain embodiments, the sample comprises or consists of a purified or isolated polynucleotide, or it can comprise a sample such as a tissue sample, a biological fluid sample, or a cell sample. Suitable biological fluid samples include, but are not limited to, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavage fluid, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, amniotic fluid, and leukapheresis samples. In certain embodiments, the sample is one that is easily obtainable by non-invasive procedures, for example blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva, or feces. In certain embodiments, the sample is a peripheral blood sample, or the plasma and/or serum fractions of a peripheral blood sample. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples; for example, a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, and so on, the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, and so on.
In certain embodiments, samples can be obtained from a number of sources, including, but not limited to: samples from different individuals; samples from different developmental stages of the same or different individuals; samples from different diseased individuals (for example individuals with cancer or suspected of having a genetic disorder); samples from normal individuals; samples obtained at different stages of a disease in an individual; samples obtained from an individual subjected to different treatments for a disease; samples from individuals subjected to different environmental factors; samples from individuals with a predisposition to a pathology; and samples from individuals exposed to an infectious disease agent (for example HIV).
In one illustrative but non-limiting embodiment, the sample is a maternal sample obtained from a pregnant female (for example a pregnant woman). In this instance, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample, or a cell sample. Biological fluids include, as non-limiting examples: blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavage fluid, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples.
In another illustrative but non-limiting embodiment, the maternal sample is a mixture of two or more biological samples; for example, the biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. In some embodiments, the sample is one that is easily obtainable by non-invasive procedures, for example blood, plasma, serum, sweat, tears, sputum, urine, milk, ear flow, saliva, and feces. In some embodiments, the biological sample is a peripheral blood sample and/or the plasma or serum fractions thereof. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a sample of a cell culture. As disclosed above, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, and so on, the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, and so on.
In certain embodiments, samples can also be obtained from in vitro cultured tissues, cells, or other polynucleotide-containing sources. The cultured samples can be taken from a number of sources, including, but not limited to: cultures (for example tissue or cells) maintained in different media and under different conditions (for example pH, pressure, or temperature); cultures (for example tissue or cells) maintained for different periods of time; cultures (for example tissue or cells) treated with different factors or reagents (for example a drug candidate or a modulator); or cultures of different types of tissue and/or cells.
Methods of isolating nucleic acids from biological sources are well known and will differ depending on the nature of the source. One of ordinary skill in the art can readily isolate from a source the nucleic acid(s) needed for the methods described herein. In some instances, it can be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation can be random, or it can be specific, as achieved, for example, using restriction endonuclease digestion. Methods for random fragmentation are well known in the art and include, for example, limited DNase digestion, alkali treatment, and physical shearing. In one embodiment, the sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation.
In other illustrative embodiments, the sample nucleic acids are obtained as genomic DNA, which is fragmented into fragments of about 300 or more, about 400 or more, or about 500 or more base pairs, and to which NGS methods can be readily applied.
Sequencing library preparation
In one embodiment, the methods described herein can utilize next generation sequencing (NGS) technologies, which allow multiple samples to be sequenced individually as genomic molecules (that is, singleplex sequencing) or as pooled samples comprising indexed genomic molecules (for example multiplex sequencing) in a single sequencing run. These methods can generate up to several hundred million reads of DNA sequence. In various embodiments, the sequences of genomic nucleic acids and/or of indexed genomic nucleic acids can be determined using, for example, the next generation sequencing (NGS) technologies described herein. In various embodiments, analysis of the large amount of sequence data obtained using NGS can be performed using one or more processors as described herein.
In various embodiments, the use of such sequencing technologies does not involve the preparation of sequencing libraries.
However, in certain embodiments, the sequencing methods contemplated herein involve the preparation of sequencing libraries. In one illustrative approach, sequencing library preparation involves producing a random collection of adaptor-modified DNA fragments (for example polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents and analogs of either DNA or cDNA (for example DNA or cDNA that is complementary or copy DNA produced from an RNA template by the action of reverse transcriptase). The polynucleotides may originate in double-stranded form (for example dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may originate in single-stranded form (for example ssDNA, RNA, and the like) and be converted to dsDNA form. For instance, in certain embodiments, single-stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism, or substantially the entire genetic complement of an organism, and are genomic DNA molecules (for example cellular DNA, cell-free DNA (cfDNA), and the like) that typically include both intron sequence and exon sequence (coding sequence), as well as non-coding regulatory sequences (for example promoter and enhancer sequences). In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, for example cfDNA molecules present in the peripheral blood of a pregnant subject.
Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves fragmenting large polynucleotides (for example cellular genomic DNA) to obtain polynucleotides in the desired size range.
Fragmentation can be achieved by any of a number of methods known to those of ordinary skill in the art. For example, fragmentation can be achieved by mechanical means including, but not limited to, nebulization, sonication, and hydroshear. However, mechanical fragmentation typically cleaves the DNA backbone at C-O, P-O, and C-C bonds, resulting in a heterogeneous mix of blunt ends and 3'- and 5'-overhanging ends with broken C-O, P-O, and C-C bonds (see, for example, Alnemri and Liwack, J Biol. Chem 265:17323-17333 [1990]; Richards and Boyer, J Mol Biol 11:327-240 [1965]). These ends may need to be repaired, as they may lack the 5'-phosphate required for the subsequent enzymatic reactions (for example ligation of sequencing adaptors) that are needed to prepare DNA for sequencing.
In contrast, cfDNA typically exists as fragments of less than about 300 base pairs; consequently, fragmentation is not typically necessary for generating a sequencing library from cfDNA samples.
Typically, whether polynucleotides are forcibly fragmented (for example fragmented in vitro) or naturally exist as fragments, they are converted to blunt-ended DNA having 5'-phosphates and 3'-hydroxyls. Standard protocols, for example protocols for sequencing on the Illumina platform as described elsewhere herein, instruct users to end-repair the sample DNA, to purify the end-repaired products prior to dA-tailing, and to purify the dA-tailed products prior to the adaptor-ligation steps of the library preparation.
Various embodiments of the methods of sequencing library preparation described herein obviate the need to perform one or more of the steps typically mandated by standard protocols to obtain a modified DNA product that can be sequenced by NGS. An abbreviated method (ABB method), a 1-step method, and a 2-step method are described below. Consecutive dA-tailing and adaptor ligation is referred to herein as the 2-step method. Consecutive dA-tailing, adaptor ligation, and amplification is referred to herein as the 1-step method. In various embodiments, the ABB and 2-step methods can be performed in solution or on a solid surface. In certain embodiments, the 1-step method is performed on a solid surface.
FIG. 2 diagrams a comparison of a standard method (for example Illumina) with the abbreviated method (ABB; Example 2), the 2-step method, and the 1-step method (Examples 3-6) for preparing DNA molecules for sequencing by NGS according to embodiments of the present invention.
Abbreviated preparation (ABB)
In one embodiment, an abbreviated method (ABB method) for preparing a sequencing library is provided, which comprises the consecutive steps of end-repairing, dA-tailing, and adaptor-ligating (ABB). In embodiments for preparing sequencing libraries that do not require the dA-tailing step (see, for example, protocols for sequencing using the Roche 454 and SOLID™3 platforms), the steps of end-repairing and adaptor-ligating can exclude the step of purifying the end-repaired products prior to adaptor ligation.
The method of preparing sequencing libraries comprising the consecutive steps of end-repairing, dA-tailing, and adaptor-ligating is referred to herein as the abbreviated method (ABB), and was shown to generate sequencing libraries of unexpectedly improved quality while expediting the analysis of samples (see, for example, Example 2). According to some embodiments, the ABB method can be performed in solution, as exemplified herein. The ABB method can also be performed on a solid surface, by first end-repairing and dA-tailing the DNA in solution and subsequently binding it to a solid surface, as described elsewhere herein for the 1-step or 2-step preparation on a solid surface. The three enzymatic steps, including the step of ligating the adaptors to the dA-tailed DNA, are all performed in the absence of polyethylene glycol. Published protocols for performing ligation reactions, including ligating adaptors to DNA, instruct users to perform ligations in the presence of polyethylene glycol. Applicants determined that ligation of the adaptors to the dA-tailed DNA can be performed in the absence of polyethylene glycol.
In another embodiment, preparation of the sequencing library eliminates the need to end-repair the cfDNA prior to the dA-tailing step. Applicants have determined that cfDNA, which does not need to be fragmented, need not be end-repaired, and the preparation of the cfDNA sequencing library according to embodiments of the present invention excludes the end-repair step and the purification steps, so as to combine the enzymatic reactions and further streamline the preparation of the DNA to be sequenced. cfDNA exists as a mixture of blunt ends and 3'- and 5'-overhanging ends that are generated in vivo by the action of nucleases, which cleave cellular genomic DNA into cfDNA fragments having termini with a 5'-phosphate and a 3'-hydroxyl group. Eliminating the end-repair step selects for cfDNA molecules that naturally occur as blunt-ended molecules and for cfDNA molecules that naturally have 5'-overhanging ends, which are filled in by the polymerase activity of the enzyme (for example Klenow Exo-) used to attach one or more deoxynucleotides to the 3'-OH as described below (dA-tailing). Eliminating the end-repair step selects against cfDNA molecules that have a 3'-overhanging end (3'-OH). Unexpectedly, exclusion of these 3'-OH cfDNA molecules from the sequencing library does not affect the representation of genomic sequences in the library, demonstrating that the end-repair step for cfDNA molecules can be excluded from the preparation of the sequencing library (see the Examples). In addition to cfDNA, other types of unrepaired polynucleotides that can be used for preparing sequencing libraries include DNA molecules resulting from reverse transcription of RNA molecules (for example mRNA, siRNA, sRNA) and unrepaired DNA molecules that are amplicons of DNA synthesized from phosphorylated primers. When unphosphorylated primers are used, DNA reverse-transcribed from RNA and/or DNA amplified from DNA templates (that is, DNA amplicons) can also be phosphorylated after synthesis by a polynucleotide kinase.
In another embodiment, unrepaired DNA is used for preparing a sequencing library according to the two-step method, wherein end repair of the DNA is excluded and the unrepaired DNA is subjected to the two consecutive steps of dA-tailing and adaptor ligation (see Fig. 2). The two-step method can be performed in solution or on a solid surface. When performed in solution, the two-step method comprises using DNA obtained from a biological sample, excluding the step of end-repairing the DNA, and adding a single deoxynucleotide, e.g., deoxyadenosine (A), to the 3' ends of the polynucleotides in the sample of unrepaired DNA, for example by the activity of certain types of DNA polymerase such as Taq polymerase or Klenow Exo- polymerase. In a subsequent consecutive step, the dA-tailed products, which are compatible with the 'T' overhang present on the 3' terminus of each duplex region of commercially available adaptors, are ligated to the adaptors. dA-tailing prevents self-ligation of blunt-ended polynucleotides and so favors formation of the adaptor-ligated sequences. Thus, in some embodiments, unrepaired cfDNA is subjected to the consecutive steps of dA-tailing and adaptor ligation, wherein the dA-tailed DNA is prepared from unrepaired DNA and is not subjected to a purification step following the dA-tailing reaction. Double-stranded adaptors can be ligated to both ends of the dA-tailed DNA. A set of adaptors having the same sequence, or a set of two different adaptors, can be used. In various embodiments, one or more different sets of identical or different adaptors can also be used. Adaptors can comprise index sequences to enable multiplexed sequencing of the library DNA. Ligation of adaptors to the dA-tailed DNA can, optionally, be performed in the absence of polyethylene glycol.
Two-Step Preparation in Solution
In various embodiments, when the two-step method is performed in solution, the products of the adaptor ligation reaction can be purified to remove unligated adaptors and adaptors that may have ligated to one another. The purification can also select a size range of templates for cluster generation, which can optionally be preceded by an amplification, e.g., a PCR amplification. The ligation products can be purified by any of a number of methods including, but not limited to, gel electrophoresis and solid-phase reversible immobilization (SPRI). In some embodiments, the purified adaptor-ligated DNA is amplified, e.g., by PCR, prior to sequencing. Some sequencing platforms require that the library DNA undergo a further amplification. For example, the Illumina platform requires that a cluster amplification of the library DNA be performed as an integral part of sequencing according to the Illumina technology. In other embodiments, the purified adaptor-ligated DNA is denatured and the single-stranded DNA molecules are attached to the flow cell of the sequencer. Thus, in certain embodiments, the method for preparing a sequencing library in solution from unrepaired DNA for NGS sequencing comprises obtaining DNA molecules from a sample, and performing the consecutive steps of dA-tailing and adaptor ligation on the unrepaired DNA molecules obtained from the sample.
As indicated above, in various embodiments these library preparation methods are incorporated into methods for determining copy number variations (CNV) such as aneuploidies. Thus, in one illustrative embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from said sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not comprise end-repairing the cfDNA, and wherein the preparation is performed in solution; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of the fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. This method is exemplified in Examples 3 and 4.
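Steps (f) through (h) can be sketched in code as follows. This is a minimal illustration only: it assumes that the chromosome dose is the ratio of the number of sequence tags for a chromosome of interest to the number of sequence tags for its normalizing sequence, and that a dose above the threshold is the condition being tested. The function names, the tag counts, the choice of chromosome 9 as the normalizing sequence, and the threshold values are all hypothetical and are not taken from this disclosure.

```python
from typing import Dict, Iterable


def chromosome_dose(tag_counts: Dict[str, int],
                    chromosome: str,
                    normalizing: Iterable[str]) -> float:
    """Return tags for `chromosome` divided by tags for its normalizing sequence.

    `normalizing` may name one chromosome or several; their tag counts are summed.
    """
    denominator = sum(tag_counts[name] for name in normalizing)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped sequence tags")
    return tag_counts[chromosome] / denominator


def exceeds_threshold(dose: float, threshold: float) -> bool:
    """Compare a chromosome dose with its chromosome-specific threshold."""
    return dose > threshold


# Hypothetical tag counts, for illustration only.
tag_counts = {"chr21": 1500, "chr9": 50000, "chr1": 100000}
dose_21 = chromosome_dose(tag_counts, "chr21", ["chr9"])
call_21 = exceeds_threshold(dose_21, threshold=0.025)  # hypothetical threshold
```

In practice each chromosome of interest would have its own normalizing sequence and its own threshold, so the two functions would be applied once per chromosome.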
Two-Step and One-Step Solid-Phase Preparation
In some embodiments, sequencing libraries are prepared on a solid surface according to the two-step method described above for preparing the library in solution. Preparing the sequencing library on a solid surface according to the two-step method comprises obtaining DNA molecules, e.g., cfDNA, from a sample, and performing the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed on a solid surface. Repaired or unrepaired DNA can be used. In some embodiments, the adaptor-ligated product is detached from the solid surface, purified, and amplified prior to sequencing. In other embodiments, the adaptor-ligated product is detached from the solid surface, purified, and not amplified prior to sequencing. In yet other embodiments, the adaptor-ligated product is amplified, detached from the solid surface, and purified. In some embodiments, the purified product is amplified. In other embodiments, the purified product is not amplified. The sequencing protocol can include an amplification, e.g., cluster amplification. In various embodiments, the detached adaptor-ligated product is purified prior to amplification and/or sequencing.
In some embodiments, sequencing libraries are prepared on a solid surface according to the one-step method. In various embodiments, preparing the sequencing library on a solid surface according to the one-step method comprises obtaining DNA molecules, e.g., cfDNA, from a sample, and performing the consecutive steps of dA-tailing, adaptor ligation, and amplification, wherein the adaptor ligation is performed on a solid surface. The adaptor-ligated product need not be detached prior to purification.
Fig. 3 depicts the two-step and one-step methods for preparing a sequencing library on a solid surface. Repaired or unrepaired DNA can be used to prepare the sequencing library on a solid surface. In some embodiments, unrepaired DNA is used. Examples of unrepaired DNA that can be used for preparing a sequencing library on a solid surface include, without limitation, cfDNA, DNA that has been reverse transcribed from RNA using phosphorylated primers, and DNA that has been amplified from a DNA template using phosphorylated primers (i.e., phosphorylated DNA amplicons). Examples of repaired DNA that can be used for preparing a sequencing library on a solid surface include, without limitation, cfDNA and fragmented genomic DNA that has been blunt-ended and phosphorylated (i.e., repaired), and phosphorylated cDNA generated by reverse transcription of RNA such as mRNA, sRNA, and siRNA. In some illustrative embodiments, unrepaired cfDNA obtained from a maternal sample is used for preparing the sequencing library.
Preparing the sequencing library on a solid surface comprises coating the solid surface with a first partner of a two-part conjugate, modifying a first adaptor by attaching the second partner of the two-part conjugate to the adaptor, and immobilizing the adaptor on the solid surface through the binding interaction between the first and second partners of the two-part conjugate. For example, preparing the sequencing library on a solid surface can comprise attaching a polypeptide, polynucleotide, or small molecule to one end of a library adaptor, wherein the polypeptide, polynucleotide, or small molecule is capable of forming a conjugate complex with a polypeptide, polynucleotide, or small molecule that is immobilized on the solid surface. Solid surfaces that can be used for immobilizing polypeptides, polynucleotides, or small molecules include, without limitation, plastic, paper, membranes, filters, chips, pins or glass slides, silica or polymer beads (e.g., polypropylene, polystyrene, polycarbonate), 2D or 3D molecular scaffolds, or any support for solid-phase synthesis of polypeptides or polynucleotides.
The bond between the partners of polypeptide-polypeptide, polypeptide-polynucleotide, polypeptide-small molecule, and polynucleotide-polynucleotide conjugates can be covalent or noncovalent. Preferably, conjugate complexes are held together by noncovalent bonds. For example, conjugates that can be used in preparing sequencing libraries on a solid surface include, without limitation, streptavidin-biotin conjugates, antibody-antigen conjugates, and ligand-receptor conjugates. Examples of polypeptide-polynucleotide conjugates that can be used in preparing sequencing libraries on a solid surface include, without limitation, DNA-binding protein-DNA conjugates. Examples of polynucleotide-polynucleotide conjugates that can be used in preparing sequencing libraries on a solid surface include, without limitation, oligodT-oligoA and oligodT-oligodA. Examples of polypeptide-small molecule and polynucleotide-small molecule conjugates include streptavidin-biotin.
According to embodiments of the solid surface method (one-step and two-step) as shown in Fig. 3, the solid surface of the vessel used for preparing the sequencing library (e.g., a polypropylene PCR tube or a 96-well plate) is coated with a polypeptide such as streptavidin. The end of a first set of adaptors is modified by attaching a small molecule, e.g., a biotin molecule, and the biotinylated adaptors are bound to the streptavidin on the solid surface (1). Subsequently, the unrepaired or repaired DNA is ligated to the streptavidin-bound biotinylated adaptors, thereby immobilizing the DNA on the solid surface (2). A second set of adaptors is ligated to the immobilized DNA (3).
Two-Step Preparation on Solid Phase
In one embodiment, the two-step method is performed using unrepaired DNA, e.g., cfDNA, to prepare the sequencing library on a solid surface. The unrepaired DNA is dA-tailed by attaching a single nucleotide base, e.g., dA, to the 3' ends of the strands of the unrepaired DNA. Optionally, multiple nucleotide bases can be attached to the unrepaired DNA. The mixture comprising the dA-tailed DNA is added to the adaptors immobilized on the solid surface, and the DNA is ligated to those adaptors. The steps of dA-tailing and adaptor-ligating the DNA are consecutive, i.e., the dA-tailed product is not purified (as shown in Fig. 2 for the two-step method). As described above, the adaptors can have overhangs that are complementary to the overhangs on the unrepaired DNA molecules. Subsequently, a second set of adaptors is added to the DNA-biotinylated adaptor complex to provide an adaptor-ligated DNA library. Optionally, repaired DNA is used for preparing the library. Repaired DNA can be genomic DNA that has been fragmented and subjected to enzymatic repair of its 3' and 5' ends. In one embodiment, DNA, e.g., maternal cfDNA, is end-repaired, dA-tailed, and ligated to adaptors immobilized on a solid surface in the consecutive steps of end repair, dA-tailing, and adaptor ligation described for the abbreviated method performed in solution.
In some embodiments using the two-step method, the adaptor-ligated DNA is detached from the solid surface by chemical or physical means, e.g., heat or UV light (4a in Fig. 2), purified (5 in Fig. 2), and, optionally, amplified in solution before the sequencing process begins. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptors ligated to the DNA can be constructed to comprise sequences that hybridize to oligonucleotides present on the flow cell of a sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]), which avoids an amplification that would otherwise be needed to introduce sequences for hybridizing the library DNA to the flow cell. The library of adaptor-ligated DNA is subjected to massively parallel sequencing (6 in Fig. 2) as described for adaptor-ligated DNA produced in solution. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation. The sequencing process can include a solid-phase amplification, e.g., cluster amplification, as described elsewhere herein.
Thus, in various embodiments, the method for preparing a sequencing library on a solid surface from unrepaired DNA for NGS sequencing can comprise obtaining DNA molecules from a sample, and performing the consecutive steps of dA-tailing and adaptor-ligating the unrepaired DNA molecules, wherein the adaptor ligation is performed on a solid phase. In some embodiments, the adaptors can comprise index sequences to allow multiplexed sequencing of multiple samples within a single reaction vessel, e.g., a lane of a flow cell. As described above, the DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, amplicons of DNA molecules, and the like.
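The role of index sequences in multiplexed sequencing can be illustrated with a short sketch of how reads from one flow cell lane might be assigned back to their samples. The sketch assumes that each read arrives as a pair consisting of an index read and an insert read, and that index sequences are matched exactly; the index sequences, sample names, and the function name are hypothetical and do not describe any particular sequencing platform's software.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(reads: Iterable[Tuple[str, str]],
                index_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Group insert reads by the sample whose adaptor index they carry.

    Reads whose index matches no known sample are collected under
    the key "undetermined".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)


# Hypothetical indexes and reads, for illustration only.
index_to_sample = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
lane_reads = [
    ("ATCACG", "AAAA"),
    ("CGATGT", "CCCC"),
    ("ATCACG", "GGGG"),
    ("NNNNNN", "TTTT"),
]
by_sample = demultiplex(lane_reads, index_to_sample)
```

Exact matching is the simplest possible rule; a production pipeline would typically tolerate a limited number of index mismatches.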
As indicated above, in various embodiments these library preparation methods are incorporated into methods for determining copy number variations (CNV) such as aneuploidies. Thus, in some embodiments, the method for preparing a sequencing library on a solid surface from unrepaired cfDNA is incorporated into a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy. Accordingly, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from said sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not comprise end-repairing the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of the fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. The sample can be a biological fluid sample, e.g., plasma, serum, urine, or saliva. In some embodiments, the sample is a maternal blood sample or the plasma or serum fraction thereof. This method is exemplified in Example 4.
One-Step Preparation on Solid Phase
In another embodiment, unrepaired DNA is dA-tailed, but the dA-tailed product is not purified prior to amplification, so that the steps of dA-tailing, adaptor ligation, and amplification are performed consecutively or continuously. Consecutive dA-tailing, adaptor ligation, and amplification, followed by purification prior to sequencing, is referred to herein as the one-step process. The one-step method can be performed on a solid surface (see, e.g., Fig. 3). The steps of attaching the first set of adaptors to the solid surface (1), ligating unrepaired, dA-tailed DNA to the surface-bound adaptors (2), and ligating the second set of adaptors to the surface-bound DNA (3) can be performed as described above for the two-step method. In the one-step method, however, the surface-bound adaptor-ligated DNA can be amplified while still attached to the solid surface (4b in Fig. 2). Subsequently, the resulting library of adaptor-ligated DNA produced on the solid surface is detached and purified (5 in Fig. 2), and then subjected to massively parallel sequencing as described for adaptor-ligated DNA produced in solution. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation.
Thus, in some embodiments, a method is provided for preparing a sequencing library for NGS sequencing, the method comprising: obtaining DNA molecules from a sample; and performing the consecutive steps of dA-tailing, adaptor ligation, and amplification on the DNA molecules, wherein the adaptor ligation is performed on a solid surface. As described for the two-step method, in various embodiments the adaptors can comprise index sequences to allow multiplexed sequencing of multiple samples within a single reaction vessel, e.g., a lane of a flow cell.
In some embodiments, the DNA can be repaired. The DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, or amplicons of DNA molecules. Adaptor ligation is performed as described above. Excess unligated adaptors can be washed away from the immobilized adaptor-ligated DNA; the reagents required for amplification are then added to the immobilized adaptor-ligated DNA, which is subjected to multiple rounds of amplification, e.g., PCR amplification, as is known in the art. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptor-ligated DNA can be removed from the solid surface by chemical or physical means, e.g., heat or UV light. In the absence of amplification, the adaptors ligated to the DNA can comprise sequences that hybridize to oligonucleotides present on the flow cell of a sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]).
In various embodiments, the sample can be a biological fluid sample (e.g., blood, plasma, serum, urine, cerebrospinal fluid, amniotic fluid, saliva, and the like). In some embodiments, the method for preparing a sequencing library on a solid surface from unrepaired cfDNA is included as a step in a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy.
Thus, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from said sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing, adaptor-ligating, and amplifying the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of the fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. In some embodiments, the DNA is end-repaired. In other embodiments, preparing the library does not comprise end-repairing the cfDNA. This method is exemplified in Examples 5 and 6.
The processes for preparing sequencing libraries described above are applicable to methods of sample analysis including, without limitation, methods for determining copy number variations (CNV), and methods for determining the presence or absence of polymorphisms of any sequence of interest in samples containing a single genome and in samples containing mixtures of at least two genomes that are known or suspected to differ in one or more sequences of interest.
Amplification of the adaptor-ligated product prepared on a solid phase or in solution may be required to introduce into the adaptor-ligated template molecules the oligonucleotide sequences needed for hybridization to the flow cell or other surface present in some NGS platforms. The contents of an amplification reaction are known to those of ordinary skill in the art and include appropriate substrates (e.g., dNTPs), enzymes (e.g., a DNA polymerase), and the buffer components required for an amplification reaction. Optionally, amplification of the adaptor-ligated polynucleotides can be omitted. Generally, amplification reactions require at least two amplification primers, i.e., primer oligonucleotides, which can be identical or different and can include an "adaptor-specific portion" capable of annealing, during the annealing step, to a primer-binding sequence in the polynucleotide molecule to be amplified (or to its complement, if the template is viewed as a single strand).
Once formed, the library of templates prepared according to the methods described above can be used for the solid-phase nucleic acid amplification that may be required by some NGS platforms. The term "solid-phase amplification" as used herein refers to any nucleic acid amplification reaction carried out on, or in association with, a solid support such that all or a portion of the amplified products are immobilized on the solid support as they are formed. In particular embodiments, the term encompasses solid-phase polymerase chain reaction (solid-phase PCR) and solid-phase isothermal amplification, which are reactions analogous to standard solution-phase amplification except that one or both of the forward and reverse amplification primers are immobilized on the solid support. Solid-phase PCR also covers systems such as emulsions, in which one primer is anchored to a bead and the other is in free solution, and colony formation in solid-phase gel matrices, in which one primer is anchored to the surface and the other is in free solution.
In various embodiments, after amplification, the sequencing library can be analyzed by microfluidic capillary electrophoresis to ensure that the library is free of adaptor dimers or single-stranded DNA. The library of template polynucleotide molecules is particularly suitable for use in solid-phase sequencing methods. In addition to providing templates for solid-phase sequencing and solid-phase PCR, library templates also provide templates for whole genome amplification.
Marker Nucleic Acids for Tracking and Verifying Sample Integrity
In various embodiments, sample integrity can be verified and samples can be tracked by sequencing mixtures of sample genomic nucleic acids, e.g., cfDNA, and accompanying marker nucleic acids that have been introduced into the samples, e.g., prior to processing.
Marker nucleic acids can be combined with the test sample (e.g., a biological source sample) and subjected to processes that include, for example, one or more of the following steps: fractionating the biological source sample, e.g., obtaining an essentially cell-free plasma fraction from a whole blood sample; purifying nucleic acids from a fractionated biological source sample (e.g., plasma) or an unfractionated biological source sample (e.g., a tissue sample); and sequencing. In some embodiments, sequencing comprises preparing a sequencing library. The sequence, or combination of sequences, of the marker molecules combined with a source sample is chosen to be unique to that source sample. In some embodiments, the unique marker molecules in a sample all have the same sequence. In other embodiments, the unique marker molecules in a sample are a plurality of sequences, e.g., a combination of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, or more different sequences.
In one embodiment, the integrity of a sample can be verified using a plurality of marker nucleic acid molecules having identical sequences. Alternatively, the identity of a sample can be verified using a plurality of marker nucleic acid molecules having at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, or more different sequences. Verifying the integrity of a plurality of biological samples (i.e., two or more biological samples) requires that each of the two or more samples be marked with marker nucleic acids whose sequences are unique to that sample among the samples being marked. For example, a first sample can be marked with a marker nucleic acid having sequence A, and a second sample can be marked with a marker nucleic acid having sequence B. Alternatively, a first sample can be marked with marker nucleic acid molecules all having sequence A, and a second sample can be marked with a mixture of sequences B and C, wherein sequences A, B, and C are marker molecules having different sequences.
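The marking scheme in this example (sequence A for a first sample, a mixture of sequences B and C for a second sample) can be sketched as a check run on sequencing reads. This is only an illustration under simplifying assumptions: marker sequences are detected by exact substring matching in the reads, and a sample passes when all of its own markers and none of the other samples' markers are found. The marker sequences, sample names, and function name are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical marker sequences standing in for sequences A, B and C.
SEQ_A = "ACGTACGTTAGC"
SEQ_B = "TTGACCGATAGG"
SEQ_C = "GGCATTCAGTCA"

# Marker assignment: sample 1 carries A; sample 2 carries a mixture of B and C.
MARKERS: Dict[str, Set[str]] = {
    "sample_1": {SEQ_A},
    "sample_2": {SEQ_B, SEQ_C},
}


def verify_sample(reads: Iterable[str],
                  expected: Set[str],
                  all_markers: Set[str]) -> dict:
    """Report which expected markers are missing and which foreign markers appear."""
    reads = list(reads)
    found = {m for m in all_markers if any(m in read for read in reads)}
    missing = expected - found
    unexpected = found - expected
    return {"ok": not missing and not unexpected,
            "missing": missing,
            "unexpected": unexpected}


all_markers = set().union(*MARKERS.values())

# Reads from sample 1 containing only its own marker: integrity verified.
good = verify_sample(["NN" + SEQ_A + "NN", "GATTACA"], MARKERS["sample_1"], all_markers)

# Reads labeled sample 1 but carrying sample 2's marker B: a mix-up is flagged.
bad = verify_sample(["NN" + SEQ_B + "NN"], MARKERS["sample_1"], all_markers)
```

A foreign marker in the reads suggests a sample swap or cross-contamination, while a missing expected marker suggests that the wrong sample was processed or that the marker was lost during preparation.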
The marker nucleic acid can be added to the sample at any stage of sample preparation that occurs prior to library preparation (if a library is prepared) and sequencing. In one embodiment, marker molecules can be combined with an unprocessed source sample. For example, the marker nucleic acid can be provided in the collection tube used to collect a blood sample. Alternatively, the marker nucleic acid can be added to the blood sample after the blood draw. In one embodiment, the marker nucleic acid is added to the vessel used to collect a biological fluid sample, e.g., to the blood collection tube used to collect a blood sample. In another embodiment, the marker nucleic acid is added to a fraction of the biological fluid sample. For example, the marker nucleic acid is added to the plasma and/or serum fraction of a blood sample, e.g., a maternal plasma sample. In yet another embodiment, the marker molecules are added to a purified sample, e.g., a sample of nucleic acids that has been purified from a biological sample. For example, the marker nucleic acid is added to a sample of purified maternal and fetal cfDNA. Similarly, the marker nucleic acid can be added to a biopsy specimen before the specimen is processed. In some embodiments, the marker nucleic acid can be combined with a carrier that delivers the marker molecules into the cells of the biological sample. Cell delivery carriers include pH-sensitive liposomes and cationic liposomes.
In various embodiments, the marker molecules have antigenomic sequences, i.e., sequences that are absent from the genome of the biological source sample. In an exemplary embodiment, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome. In an alternative embodiment, the marker molecules have sequences that are absent from the source sample and from any one or more other known genomes. For example, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome and from the mouse genome. This alternative allows the integrity of a test sample that comprises two or more genomes to be verified. For example, the integrity of a human cell-free DNA sample obtained from a subject affected by a pathogen, e.g., a bacterium, can be verified using marker molecules having sequences that are absent from both the human genome and the genome of the affecting bacterium. The genome sequences of numerous pathogens (e.g., bacteria, viruses, yeasts, fungi, protozoa) are publicly available on the World Wide Web at ncbi.nlm.nih.gov/genomes. In another embodiment, the marker molecules are nucleic acids having sequences that are absent from any known genome. The sequences of marker molecules can be randomly generated by an algorithm.
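One way such an algorithm might work is sketched below: draw random sequences and reject any candidate that shares a short subsequence (k-mer) with a reference genome or with its reverse complement. This is an assumed approach shown for illustration, not the algorithm of this disclosure. The toy reference strings stand in for real genomes, and the marker length, k-mer size, seed, and function names are hypothetical; screening against an actual genome would require an indexed search rather than an in-memory set.

```python
import random
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def random_marker(genomes: Iterable[str], length: int = 30, k: int = 8,
                  seed: int = 0, max_tries: int = 1000) -> str:
    """Draw random sequences until one shares no k-mer with any genome strand."""
    rng = random.Random(seed)
    forbidden: Set[str] = set()
    for genome in genomes:
        forbidden |= kmers(genome, k) | kmers(reverse_complement(genome), k)
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if kmers(candidate, k).isdisjoint(forbidden):
            return candidate
    raise RuntimeError("no marker found; try a larger k or more attempts")


# Toy stand-ins for two reference genomes (e.g., a host and a pathogen).
toy_genomes = ["ACGT" * 20, "TTTTGGGGCCCCAAAA" * 5]
marker = random_marker(toy_genomes)
```

Requiring that no k-mer is shared is stricter than requiring only that the full marker sequence is absent, which helps keep marker reads from aligning to the sample genome.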
In various embodiments, the marker molecules can be naturally occurring deoxyribonucleic acids (DNA), ribonucleic acids, or artificial nucleic acid analogs (nucleic acid mimics), including peptide nucleic acids (PNA), morpholino nucleic acids, locked nucleic acids, glycol nucleic acids, and threose nucleic acids, which are distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule, or DNA mimics that do not have a phosphodiester backbone. The deoxyribonucleic acids can come from naturally occurring genomes or can be generated in a laboratory through the use of enzymes or by solid-phase chemical synthesis. Chemical methods can also be used to generate DNA mimics that are not found in nature. Available DNA derivatives in which the phosphodiester linkage has been replaced but the deoxyribose is retained include, but are not limited to, DNA mimics having backbones formed by thioformacetal or carboxamide linkages, which have been shown to be good structural DNA mimics. Other DNA mimics include morpholino derivatives and the peptide nucleic acids (PNA), which contain an N-(2-aminoethyl)glycine-based pseudopeptide backbone (Ann Rev Biophys Biomol Struct 24:167-183 [1995]). PNA is an extremely good structural mimic of DNA (or of ribonucleic acid [RNA]), and PNA oligomers are able to form very stable duplex structures with Watson-Crick complementary DNA and RNA (or PNA) oligomers; they can also bind to targets in duplex DNA by helix invasion (Mol Biotechnol 26:233-248 [2004]). Another good structural mimic/analog of DNA that can be used as a marker molecule is phosphorothioate DNA, in which one of the non-bridging oxygens is replaced by sulfur. This modification reduces the action of endonucleases and exonucleases, including 5' to 3' and 3' to 5' DNA POL 1 exonuclease, nucleases S1 and P1, RNases, serum nucleases, and snake venom phosphodiesterase.
The length of the marker molecules can be distinct or indistinct from that of the sample nucleic acids, i.e., the length of the marker molecules can be similar to that of the sample genomic molecules, or it can be greater or smaller than that of the sample genomic molecules. The length of a marker molecule is measured by the number of nucleotide or nucleotide analog bases that constitute the marker molecule. Marker molecules whose lengths differ from those of the sample genomic molecules can be distinguished from source nucleic acids using separation methods known in the art. For example, differences in the length of the marker and sample nucleic acid molecules can be determined by electrophoretic separation, e.g., capillary electrophoresis. Size differentiation can be advantageous for quantifying and assessing the quality of the marker and sample nucleic acids. Preferably, the marker nucleic acids are shorter than the genomic nucleic acids, and of sufficient length to exclude them from being mapped to the genome of the sample. For example, a human sequence of 30 bases is needed to map it uniquely to the human genome. Accordingly, in certain embodiments, marker molecules used in sequencing bioassays of human samples should be at least 30 bp in length.
The choice of marker molecule length is determined primarily by the sequencing technology used to verify the integrity of the source sample. The length of the sample genomic nucleic acids being sequenced can also be considered. For example, some sequencing technologies employ clonal amplification of polynucleotides, which can require that the genomic polynucleotides to be clonally amplified be of a minimum length. For example, sequencing using the Illumina GAII sequence analyzer includes in vitro clonal amplification by bridge PCR (also known as cluster amplification) of polynucleotides that have a minimum length of 110 bp, to which adaptors are ligated to provide a nucleic acid of at least 200 bp and less than 600 bp that can be clonally amplified and sequenced. In some embodiments, the length of the adaptor-ligated marker molecule is between about 200 bp and about 600 bp, between about 250 bp and 550 bp, between about 300 bp and 500 bp, or between about 350 bp and 450 bp. In other embodiments, the length of the adaptor-ligated marker molecule is about 200 bp. For example, when sequencing fetal cfDNA that is present in a maternal sample, the length of the marker molecule can be chosen to be similar to that of fetal cfDNA molecules. Thus, in one embodiment, the length of the marker molecule used in an assay that comprises massively parallel sequencing of cfDNA in a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy can be about 150 bp, about 160 bp, about 170 bp, about 180 bp, about 190 bp, or about 200 bp; preferably, the marker molecule is about 170 bp. Other sequencing approaches, e.g., SOLiD sequencing, Polony Sequencing, and 454 sequencing, use emulsion PCR to clonally amplify DNA molecules for sequencing, and each technology dictates the minimum and the maximum length of the molecules to be amplified. The length of marker molecules to be sequenced as clonally amplified nucleic acids can be up to about 600 bp. In some embodiments, the length of marker molecules to be sequenced can be greater than 600 bp.
Single-molecule sequencing technologies, which do not employ clonal amplification of molecules and are capable of sequencing nucleic acids over a very broad range of template lengths, in most situations do not require that the molecules to be sequenced be of any specific length. However, the yield of sequences per unit mass depends on the number of 3' end hydroxyl groups, and thus relatively short templates are sequenced more efficiently than long templates. If starting with nucleic acids longer than 1000 nt, it is generally advisable to shear the nucleic acids to an average length of 100 to 200 nt so that more sequence information can be generated from the same mass of nucleic acids. Thus, the length of the marker molecule can range from tens of bases to thousands of bases. The length of marker molecules used for single-molecule sequencing can be up to about 25 bp, up to about 50 bp, up to about 75 bp, up to about 100 bp, up to about 200 bp, up to about 300 bp, up to about 400 bp, up to about 500 bp, up to about 600 bp, up to about 700 bp, up to about 800 bp, up to about 900 bp, up to about 1000 bp, or more.
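The dependence of sequence yield on the number of 3' ends can be illustrated with a rough calculation. The sketch below assumes single-stranded templates with one 3' hydroxyl each and an average mass of about 330 g/mol per nucleotide; that figure and the function name are assumptions introduced for this example, not values from the present disclosure.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
AVG_NT_MASS = 330.0        # g/mol per nucleotide; rough assumption for ssDNA

def three_prime_ends_per_ng(template_length_nt):
    """Approximate number of templates (one 3' end each) in 1 ng of ssDNA."""
    moles = 1e-9 / (template_length_nt * AVG_NT_MASS)
    return moles * AVOGADRO

short_ends = three_prime_ends_per_ng(150)    # sheared templates
long_ends = three_prime_ends_per_ng(1500)    # unsheared templates
ratio = short_ends / long_ends
print(f"{short_ends:.3e} ends/ng at 150 nt, {ratio:.1f}x more than at 1500 nt")
```

Under these assumptions the number of available 3' ends scales inversely with template length, which is why shearing long nucleic acids to 100 to 200 nt increases the number of reads obtainable from the same mass.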
The length chosen for a marker molecule is also determined by the length of the genomic nucleic acid being sequenced. For example, cfDNA circulates in the human bloodstream as genomic fragments of cellular genomic DNA. Fetal cfDNA molecules found in the plasma of pregnant women are generally shorter than maternal cfDNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, while maternal DNA has been estimated to be between about 0.5 Kb and 1 Kb (Li et al., Clin Chem 50:1002-1011 [2004]). These findings are consistent with those of Fan et al., who determined using NGS that fetal cfDNA is rarely >340 bp (Fan et al., Clin Chem 56:1279-1286 [2010]). DNA isolated from urine with a standard silica-based method consists of two fractions: high molecular weight DNA, which originates from shed cells, and a low molecular weight (150-250 base pair) fraction of transrenal DNA (Tr-DNA) (Botezatu et al., Clin Chem 46:1078-1084, 2000; and Su et al., J Mol Diagn 6:101-107, 2004). The application of a newly developed technique for isolating cell-free nucleic acids from body fluids to the isolation of transrenal nucleic acids has revealed the presence in urine of DNA and RNA fragments much shorter than 150 base pairs (U.S. Patent Application Publication No. 20080139801). In embodiments in which cfDNA is the genomic nucleic acid that is sequenced, the marker molecules chosen can be up to about the length of the cfDNA. For example, the length of marker molecules used in maternal cfDNA samples to be sequenced as single nucleic acid molecules or as clonally amplified nucleic acids can be between about 100 bp and 600 bp. In other embodiments, the sample genomic nucleic acids are fragments of larger molecules. For example, a sample genomic nucleic acid that is sequenced is fragmented cellular DNA. In embodiments in which fragmented cellular DNA is sequenced, the length of the marker molecules can be up to the length of the DNA fragments. In some embodiments, the length of the marker molecules is at least the minimum length required for mapping the sequence read uniquely to the appropriate reference genome. In other embodiments, the length of the marker molecule is the minimum length required to exclude the marker molecule from being mapped to the sample reference genome.
In addition, marker molecules can be used to verify samples that are not assayed by nucleic acid sequencing and that can be verified by common biotechniques other than sequencing, e.g., real-time PCR.
Sample controls (e.g., in-process positive controls for sequencing and/or analysis)
In various embodiments, marker sequences introduced into the samples, e.g., as described above, can function as positive controls to verify the accuracy and efficacy of sequencing and of subsequent processing and analysis.
Accordingly, compositions and methods for providing an in-process positive control (IPC) for sequencing DNA in a sample are provided. In certain embodiments, positive controls are provided for sequencing cfDNA in a sample comprising a mixture of genomes. An IPC can be used to relate baseline shifts in sequence information obtained from different sets of samples, e.g., samples that are sequenced at different times on different sequencing runs. Thus, for example, an IPC can relate the sequence information obtained for a maternal test sample to the sequence information obtained from a set of qualified samples that were sequenced at a different time.
Similarly, in the case of segment analysis, an IPC can relate the sequence information obtained from a subject for particular segment(s) to the sequence information obtained from a set of qualified samples (of similar sequences) that were sequenced at a different time. In certain embodiments, an IPC can relate the sequence information obtained from a subject for particular cancer-related loci to the sequence information obtained from a set of qualified samples (e.g., from a known amplification/deletion, and the like).
In addition, IPCs can be used as markers to track samples through the sequencing process. IPCs can also provide a qualitative positive sequence dose value (e.g., NCV) for one or more aneuploidies of chromosomes of interest (e.g., trisomy 21, trisomy 13, trisomy 18) to provide proper interpretation and to ensure the dependability and accuracy of the data. In certain embodiments, IPCs can be created to comprise nucleic acids from male and female genomes to provide doses for chromosomes X and Y in a maternal sample, thereby determining whether the fetus is male.
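For illustration, a positive dose value for an IPC could be expressed as a z-score-like quantity relative to a set of qualified samples. The sketch below assumes that a normalized chromosome value is computed as the test dose minus the mean dose of the qualified samples, divided by their standard deviation; the function name and the toy dose values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev

def normalized_chromosome_value(test_dose, qualified_doses):
    """Z-score-like value of a chromosome dose against qualified samples."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)

# Hypothetical chromosome doses from qualified (unaffected) samples.
qualified = [0.98, 1.00, 1.02]
ipc_dose = 1.06   # hypothetical dose measured for an IPC carrying a trisomy

ncv = normalized_chromosome_value(ipc_dose, qualified)
print(round(ncv, 2))
```

An IPC sequenced on every run would be expected to give a consistently elevated value of this kind; a drop toward the qualified-sample range could flag a problem with that run.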
The type and number of in-process controls depend on the type or nature of the test needed. For example, for a test requiring the sequencing of DNA from a sample comprising a mixture of genomes to determine whether a chromosomal aneuploidy exists, the in-process control can comprise DNA obtained from a sample known to comprise the same chromosomal aneuploidy that is being tested. In some embodiments, the IPC includes DNA from a sample known to comprise an aneuploidy of a chromosome of interest. For example, the IPC for a test to determine the presence or absence of a fetal trisomy (e.g., trisomy 21) in a maternal sample comprises DNA obtained from an individual with trisomy 21. In some embodiments, the IPC comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the IPC comprises a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
An IPC that serves as the control for detecting a single aneuploidy can be created using a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, an IPC that serves as a control for a test to determine a fetal trisomy (e.g., trisomy 21) can be created by combining genomic DNA from a male or female subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. Genomic DNA can be extracted from cells of both subjects and sheared to provide fragments of between about 100 bp and 400 bp, between about 150 bp and 350 bp, or between about 200 bp and 300 bp to simulate the circulating cfDNA fragments in maternal samples. The proportion of fragmented DNA from the subject carrying the aneuploidy (e.g., trisomy 21) is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, providing an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of the DNA is from the subject carrying the aneuploidy. The IPC can comprise DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects carrying, respectively, a trisomic chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18. The mixture of fragmented DNA is prepared for sequencing. Processing of the mixture of fragmented DNA can comprise preparing a sequencing library, which can be sequenced using any massively parallel method in singleplex or multiplex fashion. Stock solutions of the genomic IPC can be stored and used in multiple diagnostic tests.
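The mixing proportions above translate into a small expected excess of the trisomic chromosome, which can be sketched as follows. The example assumes that each donor contributes equal genome mass per cell, that the 20% aneuploid portion is split equally among the three donors, and a hypothetical 100 ng batch; these assumptions, the function names, and the dictionary keys are introduced here for illustration only.

```python
def expected_dose_ratio(trisomic_fraction):
    """Expected copies of the trisomic chromosome relative to a euploid sample.

    A fraction f of the DNA carries three copies and (1 - f) carries two,
    so the ratio is ((1 - f) * 2 + f * 3) / 2 = 1 + f / 2.
    """
    return ((1.0 - trisomic_fraction) * 2 + trisomic_fraction * 3) / 2

def mixing_masses_ng(total_ng, fractions):
    """Mass of fragmented DNA to take from each donor for a given total mass."""
    return {name: total_ng * frac for name, frac in fractions.items()}

# About 80% unaffected female DNA; the remaining 20% split equally (assumption)
# among donors carrying trisomy 21, trisomy 13, and trisomy 18.
fractions = {
    "unaffected_female": 0.80,
    "trisomy_21": 0.20 / 3,
    "trisomy_13": 0.20 / 3,
    "trisomy_18": 0.20 / 3,
}
masses = mixing_masses_ng(100.0, fractions)           # hypothetical 100 ng batch
chr21_ratio = expected_dose_ratio(fractions["trisomy_21"])
print(masses, round(chr21_ratio, 4))
```

Under these assumptions each trisomic chromosome is over-represented by only about 3%, comparable to what a modest fetal fraction produces in a maternal sample, which is the effect the IPC is meant to simulate.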
Alternatively, the IPC can be created using cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, cfDNA can be obtained from a pregnant woman carrying a fetus with trisomy 21. The cfDNA is extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. The DNA can be extracted from the bacterial vector using restriction enzymes. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR. The IPC DNA can be processed for sequencing in the same runs as the cfDNA from the test samples that are to be analyzed for the presence or absence of chromosomal aneuploidies.
While the creation of IPCs is described above with respect to trisomies, it will be appreciated that IPCs can be created to reflect other partial aneuploidies including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications (e.g., breast cancer associated with 20Q13), IPCs can be created that incorporate those known amplifications.
Sequencing methods
As indicated above, the prepared samples (e.g., sequencing libraries) are sequenced as part of the procedure for identifying copy number variation(s). Any of a number of sequencing technologies can be utilized.
Some sequencing technologies are available commercially, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, CA), the sequencing-by-synthesis platforms from 454 Life Sciences (Bradford, CT), Illumina/Solexa (Hayward, CA), and Helicos Biosciences (Cambridge, MA), and the sequencing-by-ligation platform from Applied Biosystems (Foster City, CA), as described below. In addition to the single-molecule sequencing performed using the sequencing-by-synthesis of Helicos Biosciences, other single-molecule sequencing technologies include, but are not limited to, the SMRT™ technology of Pacific Biosciences, the ION TORRENT™ technology, and nanopore sequencing developed, for example, by Oxford Nanopore Technologies.
While the automated Sanger method is considered a 'first generation' technology, Sanger sequencing, including automated Sanger sequencing, can also be employed in the methods described herein. Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging technologies, e.g., atomic force microscopy (AFM) or transmission electron microscopy (TEM). Illustrative sequencing technologies are described in greater detail below.
In one illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the single-molecule sequencing technology of the Helicos True Single Molecule Sequencing (tSMS) technology (e.g., as described in Harris T.D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites immobilized on the flow cell surface. In certain embodiments, the templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single-molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries, and the methods allow for direct measurement of the sample, rather than measurement of copies of that sample.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using 454 sequencing (Roche) (e.g., as described in Margulies, M. et al., Nature 437:376-380 [2005]). 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of approximately 300 to 800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads, using, e.g., Adaptor B, which contains a 5'-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in the sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
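Because the light signal is proportional to the number of nucleotides incorporated, a series of per-nucleotide signals can in principle be turned back into a sequence. The following is a simplified sketch of that idea; the flow order "TACG", the signal values, and the rounding rule are assumptions chosen for the example and do not describe the actual instrument software.

```python
def decode_flowgram(flow_order, signals):
    """Convert per-flow light intensities into a base sequence.

    Each signal is rounded to the nearest whole number of incorporations,
    and the nucleotide offered in that flow is emitted that many times.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        count = int(signal + 0.5)   # nearest integer for non-negative signals
        bases.append(base * count)
    return "".join(bases)

# Hypothetical normalized intensities for two cycles of the flows T, A, C, G.
signals = [1.02, 0.05, 1.96, 0.10, 0.03, 0.97, 0.00, 2.90]
read = decode_flowgram("TACG", signals)
print(read)
```

The example also shows why homopolymer runs are the hard case for this chemistry: a run of three G residues is read from a single signal near 3.0, so noise in the intensity maps directly onto errors in the run length.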
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragments to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and the beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand. A ZMW detector comprises a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase against a background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (e.g., in microseconds). It typically takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated to provide a sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using nanopore sequencing (e.g., as described in Soni GV and Meller A., Clin Chem 53:1996-2001 [2007]). Nanopore sequencing DNA analysis techniques are being developed by a number of companies, including, for example, Oxford Nanopore Technologies (Oxford, United Kingdom), Sequenom, NABsys, and the like. Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, typically on the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current that flows is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree. Thus, this change in the current as the DNA molecule passes through the nanopore provides a read of the DNA sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using a chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 2009/0026082). In one example of this technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned as a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, the nucleic acids can be amplified on the beads, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using Halcyon Molecular's technology, which uses transmission electron microscopy (TEM). The method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises utilizing single-atom-resolution transmission electron microscope imaging of high-molecular-weight (150 kb or greater) DNA selectively labeled with heavy atom markers, and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the position of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows complete human genomes to be sequenced in less than ten minutes.
In another embodiment, the DNA sequencing technology is Ion Torrent single-molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer, and beneath that an ion sensor. When a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion is released. The charge from that ion changes the pH of the solution, which can be detected by Ion Torrent's ion sensor. The sequencer (essentially the world's smallest solid-state pH meter) calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change is recorded and no base is called. If there are two identical bases on the DNA strand, the voltage is doubled, and the chip records two identical bases called. Direct detection allows nucleotide incorporation to be recorded in seconds.
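The flow-by-flow behavior described above (no signal on a mismatch, a doubled signal for two identical bases) can be mimicked with a toy simulation. The sketch below is an idealized model, assuming a noise-free signal equal to the number of bases incorporated and a hypothetical flow order "TACG"; it does not represent the instrument's actual base caller.

```python
def simulate_flows(read, flow_order, n_flows):
    """Return (nucleotide, relative signal) for each flow over a growing strand.

    `read` is the sequence of the strand being synthesized. In each flow the
    offered nucleotide is incorporated as many times as it matches the next
    positions of the read; the signal is the number of incorporations.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
    return signals

# Strand to be synthesized: C, C, A, T (two identical bases at the start).
flows = simulate_flows("CCAT", "TACG", 9)
for base, signal in flows:
    print(base, signal)
```

In this idealized run, the third flow (C) yields a signal of 2 for the two consecutive C residues, and every flow whose nucleotide does not match the next template position yields 0.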
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using sequencing by hybridization. Sequencing by hybridization comprises contacting the plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can optionally be tethered to a substrate. The substrate can be a flat surface comprising an array of known nucleotide sequences. The pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to identify the plurality of polynucleotide sequences within the sample.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, by massively parallel sequencing of millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g., as described in Bentley et al., Nature 6:53-59 [2009]). Template DNA can be genomic DNA, e.g., cfDNA. In some embodiments, genomic DNA from isolated cells is used as the template, and it is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as the template, and fragmentation is not required because cfDNA exists as short fragments. For example, fetal cfDNA circulates in the bloodstream as fragments approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and no fragmentation of the DNA is required prior to sequencing. Illumina's sequencing technology relies on the attachment of fragmented genomic DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. Template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA fragments. This addition prepares the DNA fragments for ligation to oligonucleotide adapters, which have an overhang of a single T base at their 3' end to increase ligation efficiency. The adapter oligonucleotides are complementary to the flow-cell anchors. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. Attached DNA fragments are extended and bridge-amplified to create an ultra-high-density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA, e.g., cfDNA, is amplified using PCR before it is subjected to cluster amplification. Alternatively, an amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA, e.g., cfDNA, is enriched using the cluster amplification alone (Kozarewa et al., Nature Methods 6:291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about 20 bp to 40 bp, e.g., 36 bp, are aligned against a repeat-masked reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. Non-repeat-masked reference genomes can also be used. Whether repeat-masked or non-repeat-masked reference genomes are used, only reads that map uniquely to the reference genome are counted. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used. Partial sequencing of the DNA fragments present in the sample is performed, and sequence tags comprising reads of predetermined length, e.g., 36 bp, that map to a known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and DDBJ (the DNA Databank of Japan). A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In some embodiments of the methods described herein, the mapped sequence tags comprise sequence reads of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, enabling reads of greater than about 1000 bp when paired-end reads are generated. In one embodiment, the mapped sequence tags comprise sequence reads that are 36 bp. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule; specific genetic sequence information is not needed. A small degree of mismatch (0 to 2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
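The mapping rule described here (a tag is assigned a chromosomal origin, up to 2 mismatches are tolerated, and only uniquely mapping reads are kept) can be illustrated with a deliberately naive sketch. This is not ELAND or BOWTIE; the function names, the brute-force scan, and the 8 bp toy reads are assumptions made only for illustration, and a real pipeline would use an indexed aligner on 36 bp reads.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(
    read: str,
    reference: Dict[str, str],
    max_mismatches: int = 2,
) -> Optional[Tuple[str, int]]:
    """Return (chromosome, position) when the read aligns to exactly one
    location with at most max_mismatches mismatches; otherwise None.

    Hypothetical helper: scans every window of every reference sequence,
    which is only practical for toy inputs.
    """
    hits: List[Tuple[str, int]] = []
    for chrom, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            if hamming(read, seq[pos:pos + len(read)]) <= max_mismatches:
                hits.append((chrom, pos))
    # Reads with no hit or with several candidate locations are not counted.
    return hits[0] if len(hits) == 1 else None
```

A read that returns a location becomes a sequence tag for that chromosome; a read that returns None (unmappable or mapping to a repeat) is simply left out of the counts.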
A plurality of sequence tags is typically obtained per sample. In some embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between 20 bp and 40 bp, e.g., 36 bp, are obtained per sample from mapping the reads to the reference genome. In one embodiment, all the sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions, e.g., all chromosomes, of the reference genome are counted, and the CNV, i.e., the over- or under-representation of a sequence of interest, e.g., a chromosome or portion thereof, in the mixed DNA sample is determined. The method does not require differentiation between the two genomes.
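Counting the mapped tags per chromosome is the bookkeeping step on which the later dose calculations rest. A minimal sketch follows, assuming each mapped tag is represented as a (chromosome, position) pair and each uncounted read as None; that representation and the function names are illustrative choices, not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_tags(mapped: Iterable[Optional[Tuple[str, int]]]) -> Counter:
    """Count uniquely mapped tags per chromosome; None entries are skipped."""
    return Counter(hit[0] for hit in mapped if hit is not None)


def tag_fractions(counts: Counter) -> Dict[str, float]:
    """Fraction of all counted tags that fall on each chromosome."""
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}
```

Because maternal and fetal reads are counted together, nothing in this step needs to know which genome a tag came from.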
The accuracy required for correctly determining whether a CNV, e.g., aneuploidy, is present or absent in a sample is predicated on the variation in the number of sequence tags that map to the reference genome among samples within a sequencing run (inter-chromosomal variability), and the variation in the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, the variations can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variations can result from using different protocols for the extraction and purification of the nucleic acids, the preparation of the sequencing libraries, and the use of different sequencing platforms. The present method uses sequence doses (chromosome doses or segment doses) based on the knowledge of normalizing sequences (normalizing chromosome sequences or normalizing segment sequences) to intrinsically account for the accrued variability stemming from inter-chromosomal (intra-run), inter-sequencing (inter-run), and platform-dependent variability. Chromosome doses are based on the knowledge of a normalizing chromosome sequence, which can be composed of a single chromosome, or of two or more chromosomes selected from chromosomes 1 to 22, X, and Y. Alternatively, normalizing chromosome sequences can be composed of a single chromosome segment, or of two or more segments of one chromosome or of two or more chromosomes. Segment doses are based on the knowledge of a normalizing segment sequence, which can be composed of a single segment of any one chromosome, or of two or more segments of any two or more of chromosomes 1 to 22, X, and Y.
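The dose described here can be written compactly. The notation below is not from the source and assumes the dose is the plain ratio of tag counts: T_j is the number of sequence tags mapped to the chromosome or segment of interest j, and N_j is the set of chromosomes or segments that make up its normalizing sequence.

```latex
R_j = \frac{T_j}{\sum_{k \in N_j} T_k}
```

When the normalizing sequence is a single chromosome, the sum has one term. Because numerator and denominator come from the same sample and the same run, run-level and platform-level effects that scale both counts alike largely cancel in the ratio.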
Singleplex Sequencing
FIG. 4 illustrates a flowchart of an embodiment of the method whereby marker nucleic acids are combined with source sample nucleic acids of a single sample to assay for a genetic abnormality while determining the integrity of the biological source sample. In step 410, a biological source sample comprising genomic nucleic acids is obtained. In step 420, marker nucleic acids are combined with the biological source sample to provide a marked sample. A sequencing library of a mixture of clonally amplified source sample genomic and marker nucleic acids is prepared in step 430, and the library is sequenced in a massively parallel fashion in step 440 to provide sequencing information pertaining to the source genomic and marker nucleic acids of the sample. Massively parallel sequencing methods provide sequencing information as sequence reads, which are mapped to one or more reference genomes to generate sequence tags that can be analyzed. In step 450, all sequencing information is analyzed, and based on the sequencing information pertaining to the marker molecules, the integrity of the source sample is verified in step 460. Verification of source sample integrity is accomplished by determining a correspondence between the sequencing information obtained for the marker molecule at step 450 and the known sequence of the marker molecule that was added to the original source sample at step 420. The same process can be applied to multiple samples that are sequenced separately, with each sample comprising molecules having sequences unique to the sample, i.e., one sample is marked with a unique marker molecule and is sequenced separately from other samples in a flow cell or slide of a sequencer. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide information, e.g., about the status of the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. If the integrity of the sample is not verified, the sequencing information is disregarded.
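The verification in steps 450 and 460 reduces to a gate: genomic sequencing information is analyzed only if the marker reads correspond to the marker that was added in step 420. The sketch below is one possible reading of "correspondence", assuming that a marker read corresponds when it is an exact substring of the known marker and that a hypothetical 90% of marker reads must do so; neither the criterion nor the cutoff is specified in the source.

```python
from typing import Iterable, List, Optional


def verify_integrity(
    marker_reads: Iterable[str],
    known_marker: str,
    min_fraction: float = 0.9,  # hypothetical cutoff, not from the source
) -> bool:
    """True when enough marker reads match the marker added to the sample."""
    reads = list(marker_reads)
    if not reads:
        return False  # no marker reads recovered: integrity cannot be verified
    matching = sum(1 for read in reads if read in known_marker)
    return matching / len(reads) >= min_fraction


def analyze_if_verified(
    marker_reads: Iterable[str],
    known_marker: str,
    genomic_tags: Iterable[str],
) -> Optional[List[str]]:
    """Pass genomic tags on for analysis only when integrity is verified;
    return None when the sequencing information is to be disregarded."""
    if verify_integrity(marker_reads, known_marker):
        return list(genomic_tags)
    return None
```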
The method depicted in FIG. 4 is also applicable to bioassays that comprise singleplex sequencing of single molecules, e.g., tSMS by Helicos, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies, such as that suggested by IBM, which do not require preparation of libraries.
Multiplex Sequencing
The large number of sequence reads that can be obtained per sequencing run permits the analysis of pooled samples, i.e., multiplexing, which maximizes sequencing capacity and reduces workflow. For example, the massively parallel sequencing of eight libraries performed using the eight-lane flow cell of the Illumina Genome Analyzer can be multiplexed to sequence two or more samples in each lane, such that 16, 24, 32, etc., or more samples can be sequenced in a single run. Parallelizing sequencing for multiple samples, i.e., multiplex sequencing, requires the incorporation of sample-specific index sequences, also known as barcodes, during the preparation of sequencing libraries. Sequencing indexes are distinct base sequences of about 5, about 10, about 15, about 20, about 25, or more bases that are added at the 3' end of the genomic and marker nucleic acids. The multiplexing system enables sequencing of hundreds of biological samples within a single sequencing run. The preparation of indexed sequencing libraries for sequencing of clonally amplified sequences can be performed by incorporating the index sequence into one of the PCR primers used for cluster amplification. Alternatively, the index sequence can be incorporated into the adapter, which is ligated to the cfDNA prior to the PCR amplification. Indexed libraries for single-molecule sequencing can be created by incorporating the index sequence at the 3' end of the marker and genomic molecules, or 5' to the added sequence needed for hybridization to the flow-cell anchors, e.g., the polyA tail added for single-molecule sequencing using tSMS. Sequencing of the uniquely marked, indexed nucleic acids provides index sequence information that identifies samples in the pooled sample libraries, and the sequence information of the marker molecules correlates the sequencing information of the genomic nucleic acids with the sample source. In embodiments wherein the multiple samples are sequenced individually, i.e., singleplex sequencing, the marker and genomic nucleic acid molecules of each sample need only be modified to contain the adapter sequences required by the sequencing platform, and the indexing sequences are omitted.
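How an index sequence lets pooled reads be assigned back to their samples can be sketched as below. The sketch assumes, as a simplification, that the index is simply the last index_len bases of each read string (the passage places the index at the 3' end); how a given platform actually reports the index is not covered here, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(
    reads: Iterable[str],
    index_to_sample: Dict[str, str],
    index_len: int,
) -> Dict[str, List[str]]:
    """Split pooled reads by their 3' index and strip the index off.

    Reads whose index is not in index_to_sample are dropped.
    """
    if index_len <= 0:
        raise ValueError("index_len must be positive")
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        insert, index = read[:-index_len], read[-index_len:]
        sample = index_to_sample.get(index)
        if sample is not None:
            by_sample[sample].append(insert)
    return dict(by_sample)


def samples_per_run(lanes: int, samples_per_lane: int) -> int:
    """Samples sequenced in one run when every lane carries the same pool size."""
    return lanes * samples_per_lane
```

With an eight-lane flow cell, samples_per_run(8, 2), samples_per_run(8, 3) and samples_per_run(8, 4) give the 16, 24 and 32 samples mentioned in the text.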
FIG. 5 provides a flowchart of an embodiment 500 of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay, i.e., nucleic acids from individual samples are combined and sequenced as a complex mixture. In step 510, a plurality of biological source samples, each comprising genomic nucleic acids, is obtained. In step 520, unique marker nucleic acids are combined with each of the biological source samples to provide a plurality of uniquely marked samples. A sequencing library of sample genomic and marker nucleic acids is prepared in step 530 for each of the uniquely marked samples. Library preparation of samples that are destined to undergo multiplexed sequencing comprises the incorporation of distinct indexing tags into the sample and marker nucleic acids of each of the uniquely marked samples, to provide samples whose source nucleic acid sequences can be correlated with the corresponding marker nucleic acid sequences and identified in complex solutions. In embodiments of the method comprising marker molecules that can be enzymatically modified, e.g., DNA, indexing molecules can be incorporated at the 3' end of the sample and marker molecules by ligating sequenceable adapter sequences comprising the indexing sequences. In embodiments of the method comprising marker molecules that cannot be enzymatically modified, e.g., DNA analogs that do not have a phosphate backbone, indexing sequences are incorporated at the 3' end of the analog marker molecules during synthesis. Sequencing libraries of two or more samples are pooled and loaded on the flow cell of the sequencer, where they are sequenced in a massively parallel fashion in step 540. In step 550, all sequencing information is analyzed, and based on the sequencing information pertaining to the marker molecules, the integrity of the source sample is verified in step 560. Verification of the integrity of each of the plurality of source samples is accomplished by first grouping sequence tags associated with identical index sequences, to associate the genomic and marker sequences and to distinguish sequences belonging to each of the libraries made from genomic molecules of a plurality of samples. Analysis of the grouped marker and genomic sequences is then performed to verify that the sequence obtained for the marker molecules corresponds to the known unique sequence added to the corresponding source sample. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide genetic information about the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. The absence of a correspondence between the sequencing information obtained for the marker molecule and the known sequence is indicative of a sample mix-up, and the accompanying sequencing information pertaining to the genomic cfDNA molecules is disregarded.
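The two-stage check for pooled samples (group tags by index, then compare each group's marker sequences with the marker known to have been added to that sample) can be sketched as follows. The Tag record, the "marker"/"genomic" labels, and the exact-substring test for correspondence are assumptions for illustration; the source does not prescribe a data model or a matching criterion.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Tag(NamedTuple):
    index: str     # sample-specific index (barcode) read with the tag
    kind: str      # "marker" or "genomic"; assumed to be decided upstream
    sequence: str


def verified_genomic_tags(
    tags: Iterable[Tag],
    expected_marker: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return genomic sequences per index, keeping only verified samples.

    A sample is treated as verified when it has marker tags and every one
    of them is a substring of the marker expected for its index. Samples
    that fail (a possible mix-up) are left out, i.e. disregarded.
    """
    groups: Dict[str, List[Tag]] = defaultdict(list)
    for tag in tags:
        groups[tag.index].append(tag)

    kept: Dict[str, List[str]] = {}
    for index, group in groups.items():
        markers = [t.sequence for t in group if t.kind == "marker"]
        known = expected_marker.get(index)
        ok = known is not None and bool(markers) and all(m in known for m in markers)
        if ok:
            kept[index] = [t.sequence for t in group if t.kind == "genomic"]
    return kept
```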
Determination of CNV for Prenatal Diagnoses
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83:563-566 [2007]), and which are known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream, originating from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers novel opportunities for NIPD.
The present method is a polymorphism-independent method for use in NIPD that does not require that the fetal cfDNA be distinguished from the maternal cfDNA to enable the determination of a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by the loss or gain of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions, and insertions. By far the most common known aneuploidy compatible with life is trisomy 21, i.e., Down syndrome (DS), which is caused by the presence of an extra copy of part or all of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties, and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies with known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g., Turner syndrome (XO), and triple X syndrome (XXX) in female births, and Klinefelter syndrome (XXY) and XYY syndrome in male births, all of which are associated with various phenotypes including sterility and reduction in intellectual skills. Monosomy X [45,X] is a common cause of early pregnancy loss, accounting for about 7% of spontaneous abortions. Based on the liveborn frequency of 45,X (also called Turner syndrome) of 1-2/10,000, it is estimated that less than 1% of 45,X conceptuses survive to term. About 30% of Turner syndrome patients are mosaic, with both a 45,X cell line and either a 46,XX cell line or one containing a rearranged X chromosome (Hook and Warburton, 1983). The phenotype in a liveborn infant is relatively mild considering the high embryonic lethality, and it has been hypothesized that possibly all liveborn females with Turner syndrome carry a cell line containing two sex chromosomes. Monosomy X can occur in females as 45,X or as 45,X/46,XX, and in males as 45,X/46,XY. Autosomal monosomies in humans are generally considered to be incompatible with life; however, a considerable number of cytogenetic reports describe full monosomy of one chromosome 21 in liveborn children (Vosranova et al., Molecular Cytogen. 1:13 [2008]; Joosten et al., Prenatal Diagn. 17:271-5 [1997]). The method described herein can be used to diagnose these and other chromosomal abnormalities prenatally.
According to some embodiments, the methods disclosed herein can determine the presence or absence of chromosomal trisomies of any one of chromosomes 1 to 22, X, and Y. Examples of chromosomal trisomies that can be detected according to the present method include without limitation trisomy 21 (T21; Down syndrome), trisomy 18 (T18; Edwards syndrome), trisomy 16 (T16), trisomy 20 (T20), trisomy 22 (T22; Cat Eye syndrome), trisomy 15 (T15; Prader-Willi syndrome), trisomy 13 (T13; Patau syndrome), trisomy 8 (T8; Warkany syndrome), trisomy 9, and the XXY (Klinefelter syndrome), XYY, or XXX trisomies. Complete trisomies of other autosomes are lethal when present in a non-mosaic state, but can be compatible with life when present in a mosaic state. It will be appreciated that various complete trisomies, whether existing in a mosaic or non-mosaic state, and partial trisomies can be determined in fetal cfDNA according to the teachings provided herein.
Non-limiting examples of partial trisomies that can be determined by the present method include partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The methods disclosed herein can also be used to determine chromosomal monosomy X, chromosomal monosomy 21, and partial monosomies such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in pregnancy miscarriage. Partial monosomy of chromosomes typically involved in complete aneuploidy can also be determined by the method described herein. Non-limiting examples of deletion syndromes that can be determined according to the present method include syndromes caused by partial deletions of chromosomes. Examples of partial deletions that can be determined according to the methods described herein include without limitation partial deletions of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22, and 10, which are described in the following.
1q21.1 deletion syndrome, or 1q21.1 (recurrent) microdeletion, is a rare aberration of chromosome 1. In addition to the deletion syndrome, there is also a 1q21.1 duplication syndrome. Whereas in the deletion syndrome a part of the DNA is missing at a particular locus, in the duplication syndrome two or three copies of a similar part of the DNA are present at the same locus. The literature refers to both the deletion and the duplication as 1q21.1 copy number variations (CNV). The 1q21.1 deletion can be associated with TAR syndrome (thrombocytopenia with absent radius).
Wolf-Hirschhorn syndrome (WHS) (OMIM #194190) is a contiguous gene deletion syndrome associated with a hemizygous deletion of chromosome 4p16.3. Wolf-Hirschhorn syndrome is a congenital malformation syndrome characterized by pre- and postnatal growth deficiency, developmental disability of variable degree, characteristic craniofacial features ('Greek warrior helmet' appearance of the nose, high forehead, prominent glabella, hypertelorism, high-arched eyebrows, protruding eyes, epicanthal folds, short philtrum, distinct mouth with downturned corners, and micrognathia), and a seizure disorder.
Partial deletion of chromosome 5, also known as 5p- or 5p minus, and named Cri du Chat syndrome (OMIM #123450), is caused by a deletion of the short arm (p arm) of chromosome 5 (5p15.3-p15.2). Infants with this condition often have a high-pitched cry that sounds like that of a cat. The disorder is characterized by intellectual disability and delayed development, small head size (microcephaly), low birth weight, weak muscle tone (hypotonia) in infancy, distinctive facial features, and possibly heart defects.
Williams-Beuren syndrome, also known as chromosome 7q11.23 deletion syndrome (OMIM #194050), is a contiguous gene deletion syndrome resulting in a multisystem disorder. It is caused by a hemizygous deletion of 1.5 Mb to 1.8 Mb on chromosome 7q11.23, which contains approximately 28 genes.
Jacobsen syndrome, also known as 11q deletion disorder, is a rare congenital disorder resulting from deletion of a terminal region of chromosome 11 that includes band 11q24.1. It can cause intellectual disabilities, a distinctive facial appearance, and a variety of physical problems including heart defects and a bleeding disorder.
Partial monosomy of chromosome 18, known as monosomy 18p, is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delays, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. Associated craniofacial defects may vary greatly in range and severity from case to case.
Conditions caused by changes in the structure or number of copies of chromosome 15 include Angelman syndrome and Prader-Willi syndrome, which involve a loss of gene activity in the same part of chromosome 15, the 15q11-q13 region. It will be appreciated that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the methods, apparatus, and systems described herein can be used to identify such a partial deletion and other deletions in the fetus.
Partial monosomy 13q is a rare chromosomal disorder that results when a piece of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q may exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder. Almost all cases of partial monosomy 13q occur randomly for no apparent reason (sporadic).
Smith-Magenis syndrome (SMS; OMIM #182290) is caused by a deletion, or loss of genetic material, on one copy of chromosome 17. This well-known syndrome is associated with developmental delay, mental retardation, congenital anomalies such as heart and kidney defects, and neurobehavioral abnormalities such as severe sleep disturbances and self-injurious behavior. Smith-Magenis syndrome is caused in most cases (90%) by a 3.7-Mb interstitial deletion in chromosome 17p11.2.
22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome, on the long arm of one of the pair of chromosomes. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms may include birth defects such as congenital heart disease; defects in the palate, most commonly related to neuromuscular problems with closure (velopharyngeal insufficiency); learning disabilities; mild differences in facial features; and recurrent infections. Microdeletions in chromosomal region 22q11.2 are associated with a 20- to 30-fold increased risk of schizophrenia.
Deletions on the short arm of chromosome 10 are associated with a DiGeorge syndrome-like phenotype. Partial monosomy of chromosome 10p is rare, but has been observed in a portion of patients showing features of DiGeorge syndrome.
In one embodiment, the methods, apparatus, and systems described herein are used to determine partial monosomies including, but not limited to, partial monosomy of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22, and 10. For example, partial monosomy 1q21.11, partial monosomy 4p16.3, partial monosomy 5p15.3-p15.2, partial monosomy 7q11.23, partial monosomy 11q24.1, partial monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, partial monosomy 17p11.2, partial monosomy of chromosome 22 (22q11.2), and partial monosomy 10p can also be determined using the method.
Other partial monosomies that can be determined according to the methods described herein include unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion [del(22)(q11.2q11.23)]; 7q11.23 and 7q36 deletions; 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion); Yq deletion; 4p16.3 microdeletion; 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi syndrome (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); and 2q37 microdeletion. Partial deletions can be small deletions of part of a chromosome, or they can be microdeletions of a chromosome in which the deletion of a single gene can occur.
Several duplication syndromes caused by the duplication of part of a chromosome arm have been identified (see OMIM [Online Mendelian Inheritance in Man], viewed online at ncbi.nlm.nih.gov/omim). In one embodiment, the present method can be used to determine the presence or absence of duplications and/or multiplications of segments of any one of chromosomes 1 to 22, X, and Y. Non-limiting examples of duplication syndromes that can be determined according to the present method include duplications of part of chromosomes 8, 15, 12, and 17, which are described in the following.
8p23.1 duplication syndrome is a rare genetic disorder caused by a duplication of a region of human chromosome 8. This duplication syndrome has an estimated prevalence of 1 in 64,000 births and is the reciprocal of the 8p23.1 deletion syndrome. The 8p23.1 duplication is associated with a variable phenotype including one or more of speech delay, developmental delay, mild dysmorphism with prominent forehead and arched eyebrows, and congenital heart disease (CHD).
Chromosome 15q duplication syndrome (Dup15q) is a clinically identifiable syndrome that results from duplications of chromosome 15q11-13.1. Babies with Dup15q usually have hypotonia (poor muscle tone) and growth retardation; they may be born with a cleft lip and/or palate or with malformations of the heart, kidneys, or other organs; and they show some degree of cognitive delay/disability (mental retardation), speech and language delays, and sensory processing disorders.
Pallister-Killian syndrome is a result of extra chromosome 12 material. There is usually a mixture of cells (mosaicism), some with extra chromosome 12 material and some that are normal (46 chromosomes without the extra chromosome 12 material). Babies with this syndrome have many problems, including severe mental retardation, poor muscle tone, "coarse" facial features, and a prominent forehead. They tend to have a very thin upper lip with a thicker lower lip, and a short nose. Other health problems include seizures, poor feeding, stiff joints, cataracts in adulthood, hearing loss, and heart defects. Persons with Pallister-Killian syndrome have a shortened lifespan.
Individuals with the genetic condition designated dup(17)(p11.2p11.2) or dup 17p carry extra genetic information (known as a duplication) on the short arm of chromosome 17. Duplication of chromosome 17p11.2 underlies Potocki-Lupski syndrome (PTLS), a newly recognized genetic condition with only a few dozen cases reported in the medical literature. Patients who have this duplication often have low muscle tone, poor feeding and failure to thrive during infancy, and also present with delayed development of motor and verbal milestones. Many individuals with PTLS have difficulty with articulation and language processing. In addition, patients may have behavioral characteristics similar to those seen in persons with autism or autism-spectrum disorders. Individuals with PTLS may have heart defects and sleep apnea. A duplication of a larger region in chromosome 17p12 that includes the gene PMP22 is known to cause Charcot-Marie-Tooth disease.
CNV have been associated with stillbirths. However, because of the inherent limitations of conventional cytogenetics, the contribution of CNV to stillbirth is thought to be underrepresented (Harris et al., Prenatal Diagn 31:932-944 [2011]). As shown in the examples and described elsewhere herein, the present method is capable of determining the presence of partial aneuploidies, e.g. deletions and multiplications of chromosome segments, and can be used to identify and determine the presence or absence of CNV that are associated with stillbirths.
Determination of complete fetal chromosomal aneuploidies
In one embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different complete fetal chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X and Y. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of any one or more complete different fetal chromosomal aneuploidies in the maternal test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing chromosome sequence for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing chromosome for each of said chromosomes of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of all chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
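The two ways of computing a chromosome dose in step (c) can be sketched as follows. This is a minimal illustration and not the method as claimed: the function names, tag counts and sequence lengths are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chromosome_dose(tags_interest: int, tags_normalizing: int) -> float:
    """Dose as the plain ratio of sequence tag counts."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tags_interest / tags_normalizing


def chromosome_dose_by_density(
    tags_interest: int,
    length_interest: int,
    tags_normalizing: int,
    length_normalizing: int,
) -> float:
    """Dose as the ratio of sequence tag densities (tags per base)."""
    density_interest = tags_interest / length_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_interest / density_normalizing


# Hypothetical counts and lengths, for illustration only.
dose = chromosome_dose(20_000, 80_000)
density_dose = chromosome_dose_by_density(
    20_000, 50_000_000, 80_000, 100_000_000
)
print(dose, density_dose)
```

With these made-up inputs the count ratio is 0.25, while the density ratio is 0.5 because the chromosome of interest is half as long as the normalizing sequence.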
An example of this embodiment, whereby four or more complete fetal chromosomal aneuploidies are determined in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of said twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of said twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said twenty or more chromosomes of interest to a threshold value for each of said twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the test sample.
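Steps (b) through (d) for several chromosomes of interest might be organized as in the sketch below. Everything specific here is an assumption made for illustration: the pairing of each chromosome of interest with a normalizing chromosome, the tag counts, the lower and upper threshold values, and the three result labels are hypothetical, and the text does not prescribe this particular comparison rule.

```python
from typing import Dict, Tuple


def compare_doses_to_thresholds(
    tag_counts: Dict[str, int],
    normalizing_for: Dict[str, str],
    thresholds: Dict[str, Tuple[float, float]],
) -> Dict[str, str]:
    """Compute one dose per chromosome of interest and compare it
    with that chromosome's own (lower, upper) threshold pair."""
    results = {}
    for chrom, norm in normalizing_for.items():
        dose = tag_counts[chrom] / tag_counts[norm]  # step (c)
        lower, upper = thresholds[chrom]             # step (d)
        if dose > upper:
            results[chrom] = "above threshold"
        elif dose < lower:
            results[chrom] = "below threshold"
        else:
            results[chrom] = "within thresholds"
    return results


# Hypothetical inputs: counts, pairings and thresholds are made up.
tag_counts = {"chr21": 1300, "chr18": 3000, "chr9": 5000, "chr8": 6000}
normalizing_for = {"chr21": "chr9", "chr18": "chr8"}
thresholds = {"chr21": (0.23, 0.25), "chr18": (0.48, 0.52)}

results = compare_doses_to_thresholds(tag_counts, normalizing_for, thresholds)
print(results)
```

Keeping the thresholds in a per-chromosome mapping mirrors the requirement that each chromosome of interest is compared with its own threshold value.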
In another embodiment, the method for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for determining the dose of the chromosome of interest. In this case, the method comprises: (a) obtaining sequence information for the fetal and maternal nucleic acids in said sample; and (b) using said sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in said sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing segment sequence for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a chromosome dose for said chromosome of interest as the ratio of the sequence tag density of said chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
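When the normalizing segment sequence is a group of segments from different chromosomes, one plausible reading is that the tags of all segments in the group are pooled before the ratio is taken. The sketch below assumes exactly that; the pooling rule, the `(chromosome, start, end)` segment representation, the coordinates and the counts are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation of a segment: (chromosome, start, end).
Segment = Tuple[str, int, int]


def dose_with_normalizing_segments(
    tags_interest: int,
    segment_tags: Dict[Segment, int],
    normalizing_segments: Iterable[Segment],
) -> float:
    """Chromosome dose against a group of normalizing segments,
    pooling the tags of every segment in the group (an assumption)."""
    pooled = sum(segment_tags[seg] for seg in normalizing_segments)
    if pooled <= 0:
        raise ValueError("normalizing segment sequence has no tags")
    return tags_interest / pooled


seg_a: Segment = ("chr9", 0, 50_000_000)
seg_b: Segment = ("chr11", 60_000_000, 100_000_000)
seg_c: Segment = ("chr14", 20_000_000, 40_000_000)  # not in the group

# Made-up tag counts per segment.
segment_tags = {seg_a: 30_000, seg_b: 20_000, seg_c: 15_000}

dose_chr21 = dose_with_normalizing_segments(12_500, segment_tags, [seg_a, seg_b])
print(dose_chr21)
```

Only the segments named in the group contribute to the denominator, so the same table of per-segment counts can serve different chromosomes of interest with different normalizing groups.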
Means for comparing chromosome doses across different sample sets are provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
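The NCV calculation can be sketched in a few lines. This is an illustration under stated assumptions: the qualified-sample doses and the test dose are invented numbers, and the use of the sample (n - 1) standard deviation as the estimate is an assumption, since the text only says that the mean and standard deviation are estimated from a set of qualified samples.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(
    test_dose: float, qualified_doses: Sequence[float]
) -> float:
    """NCV = (x_ij - mu_hat_j) / sigma_hat_j for one chromosome j."""
    mu_hat = mean(qualified_doses)
    sigma_hat = stdev(qualified_doses)  # sample estimate (assumption)
    if sigma_hat == 0:
        raise ValueError("qualified doses have zero spread")
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical doses for one chromosome in five qualified samples.
qualified = [0.248, 0.250, 0.252, 0.249, 0.251]
ncv = normalized_chromosome_value(0.260, qualified)
print(round(ncv, 2))
```

Because the NCV expresses the test dose in units of the qualified-set standard deviation, values from different sample sets become directly comparable.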
In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies of sex chromosomes can comprise tetrasomies, pentasomies and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method may be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 complete chromosomal aneuploidies. Thus, the number of different complete chromosomal aneuploidies that are determined is related to the number of chromosomes of interest that are selected for analysis.
In one embodiment, determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for one chromosome of interest, which is selected from chromosomes 1-22, X and Y. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. Complete different fetal chromosomal aneuploidies that can be determined include complete chromosomal trisomies, complete chromosomal monosomies and complete chromosomal polysomies. Examples of complete fetal chromosomal aneuploidies include, but are not limited to, trisomies of any one or more of the autosomes, e.g. trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22; trisomies of the sex chromosomes, e.g. 47,XXY, 47,XXX and 47,XYY; tetrasomies of the sex chromosomes, e.g. 48,XXYY, 48,XXXY, 48,XXXX and 48,XYYY; pentasomies of the sex chromosomes, e.g. 49,XXXYY, 49,XXXXY, 49,XXXXX and 49,XYYYY; and monosomy X. Other complete fetal chromosomal aneuploidies that can be determined according to the present method are described below.
Determination of partial fetal chromosomal aneuploidies
In another embodiment, a method is provided for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in said sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of said any one or more segments of any one or more chromosomes of interest; and (d) comparing each single segment dose for each of said any one or more segments of any one or more chromosomes of interest to a threshold value for each of said any one or more chromosome segments of any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of the any one or more segments of any one or more chromosomes of interest to the number of sequence tags identified for the normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all segments of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
Means for comparing segment doses across different sample sets are provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
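The same normalization can be applied to several segments of one test sample at once. The sketch below is an assumption-laden illustration: the segment labels are borrowed from the duplication syndromes discussed earlier purely as dictionary keys, every dose is an invented number, and the sample (n - 1) standard deviation is assumed as the estimator.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def normalized_segment_values(
    test_doses: Dict[str, float],
    qualified_doses: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """NSV for every segment j of one test sample i."""
    values = {}
    for segment, x_ij in test_doses.items():
        doses = qualified_doses[segment]
        mu_hat = mean(doses)
        sigma_hat = stdev(doses)  # sample estimate (assumption)
        values[segment] = (x_ij - mu_hat) / sigma_hat
    return values


# Invented segment doses in four qualified samples and one test sample.
qualified_doses = {
    "17p11.2": [0.10, 0.11, 0.09, 0.10],
    "8p23.1": [0.20, 0.21, 0.19, 0.20],
}
test_doses = {"17p11.2": 0.13, "8p23.1": 0.20}

nsv = normalized_segment_values(test_doses, qualified_doses)
print({segment: round(value, 2) for segment, value in nsv.items()})
```

In this made-up example the first segment lies well above the qualified mean while the second sits at the mean, which is the kind of contrast a per-segment threshold in step (d) would act on.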
In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial fetal chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X and Y. In another embodiment, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more segments of interest selected from chromosomes 1-22, X and Y comprise at least one, five, ten, fifteen, twenty, twenty-five, or more segments selected from chromosomes 1-22, X and Y, and the presence or absence of at least one, five, ten, fifteen, twenty, or twenty-five different partial fetal chromosomal aneuploidies is determined. Different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of autosomes. Partial monosomies of autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the present method are described below.
In any one of the embodiments described above, the test sample is a maternal sample selected from blood, plasma, serum, urine and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS) as described elsewhere herein. In some embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, sequencing is sequencing-by-ligation. In yet other embodiments, sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
Determination of CNV of clinical disorders
In addition to the early determination of birth defects, the methods described herein can be applied to the determination of any abnormality in the representation of genetic sequences within the genome. A number of abnormalities in the representation of genetic sequences within the genome have been associated with various pathologies. Such pathologies include, but are not limited to, cancer, infectious and autoimmune diseases, diseases of the nervous system, metabolic and/or cardiovascular diseases, and the like.
Accordingly, in various embodiments, use of the methods described herein in the diagnosis, and/or monitoring, and/or treatment of such pathologies is contemplated. For example, the methods can be applied to determining the presence or absence of a disease, to monitoring the progression of a disease and/or the efficacy of a treatment regimen, to determining the presence or absence of nucleic acids of a pathogen (e.g. a virus), to determining chromosomal abnormalities associated with graft versus host disease (GVHD), and to determining the contribution of individuals in forensic analyses.
CNVs in cancer
It has been shown that blood plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and that tumors are characterized by aneuploidy, or inappropriate numbers of gene sequences or even entire chromosomes. The determination of a difference in the amount of a given sequence (i.e. a sequence of interest) in a sample from an individual can thus be used in the prognosis or diagnosis of a medical condition. In some embodiments, the present method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to be suffering from cancer.
In certain embodiments, the aneuploidy is characteristic of the genome of the subject and results in a generally increased predisposition to cancer. In certain embodiments, the aneuploidy is characteristic of particular cells (e.g. tumor cells, proto-tumor neoplastic cells, etc.) that are neoplastic or have an increased predisposition to neoplasia. Particular aneuploidies are associated with particular cancers or with predispositions to particular cancers, as described below.
Accordingly, various embodiments of the methods described herein provide a determination of copy number variation of a sequence of interest (e.g. a clinically relevant sequence) in a test sample from a subject, where certain variations in copy number provide an indicator of the presence of a cancer and/or of a predisposition to a cancer. In certain embodiments, the sample comprises a mixture of nucleic acids derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal and cancerous cells, the cancerous cells being derived from a subject suffering from a medical condition, e.g. cancer.
The development of cancer is often accompanied by an alteration in the number of whole chromosomes, i.e. complete chromosomal aneuploidy, and/or an alteration in the number of segments of chromosomes, i.e. partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50:7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). Such genetic aberrations, as they accumulate, may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis and metastasis. The genetic aberrations may affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA has been found in the circulation of patients diagnosed with malignancies including, but not limited to, lung cancer (Pathak et al., Clin Chem 52:1833-1842 [2006]), prostate cancer (Schwartzenbach et al., Clin Cancer Res 15:1032-8 [2009]) and breast cancer (Schwartzenbach et al., available online at breast-cancer-research.com/content/11/5/R71 [2009]). Identification of genomic instabilities associated with cancers, which can be determined in the circulating cfDNA of cancer patients, is a potential diagnostic and prognostic tool. In one embodiment, the methods described herein are used to determine the CNV of one or more sequences of interest in a sample, e.g. a sample comprising a mixture of nucleic acids derived from a subject that is suspected or known to have cancer, e.g. carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors and blastoma. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample in which the presence of a CNV is to be determined is derived from cells of other biological tissues that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells; such other biological tissues include, but are not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples, or tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The methods described herein are not limited to the analysis of cfDNA. It will be recognized that similar analyses can be performed on cellular DNA samples.
In various embodiments, the sequence of interest comprises a nucleic acid sequence that is known or suspected to play a role in the development and/or progression of cancer. Examples of sequences of interest include nucleic acid sequences, e.g. complete chromosomes and/or segments of chromosomes, that are amplified or deleted in cancerous cells as described below.
Total CNV number and risk of cancer.
Common cancer SNPs, and by analogy common cancer CNVs, may each confer only a minor increase in disease risk. Collectively, however, they may cause a substantially elevated risk of cancer. In this regard, it is noted that germline gains and losses of large DNA segments have been reported as factors predisposing individuals to neuroblastoma, prostate and colorectal cancer, breast cancer, and BRCA1-associated ovarian cancer (see, e.g., Krepischi et al., Breast Cancer Res., 14:R24 [2012]; Diskin et al., Nature 2009, 459:987-991; Liu et al., Cancer Res 2009, 69:2176-2179; Lucito et al., Cancer Biol Ther 2007, 6:1592-1599; Thean et al., Genes Chromosomes Cancer 2010, 49:99-106; Venkatachalam et al., Int J Cancer 2011, 129:1635-1642; and Yoshihara et al., Genes Chromosomes Cancer 2011, 50:167-177). It is noted that CNVs frequently found in the healthy population (common CNVs) are believed to have a role in cancer etiology (see, e.g., Shlien and Malkin (2009) Genome Medicine, 1(6):62). In one study testing the hypothesis that common CNVs are associated with malignancy (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), a map was created of every known CNV whose locus coincides with that of a bona fide cancer-related gene (as catalogued by Higgins et al., Nucleic Acids Res 2007, 35:D721-726). These CNVs were termed "cancer CNVs". In an initial analysis (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), 770 healthy genomes were evaluated using the Affymetrix 500K array set, which has an average inter-probe distance of 5.8 kb. Because CNVs are generally thought to be depleted in gene regions (Redon et al., Nature 2006, 444:444-454), it was surprising to find 49 cancer genes that were directly encompassed or overlapped by a CNV in more than one person in a large reference population. In the top ten genes, cancer CNVs could be found in four or more people.
It is thus believed that CNV frequency can be used as a measure of risk of cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1). The CNV frequency can be determined simply from the constitutive genome of the organism, or it can represent a fraction derived from one or more tumors (neoplastic cells), if such are present.
In certain embodiments, the number of CNVs in a test sample (e.g. a sample comprising a constitutional (germline) nucleic acid) or in a mixture of nucleic acids (e.g. a germline nucleic acid and nucleic acids derived from neoplastic cells) is determined using the methods described herein for copy number variations. Identification of an increased number of CNVs in the test sample, e.g. in comparison to a reference value, indicates that the subject is at risk of, or has a predisposition to, cancer. It will be appreciated that the reference value can vary with a given population. It will also be appreciated that the absolute value of the increase in CNV frequency will vary depending on the resolution of the method used to determine CNV frequency and on other parameters. Typically, an increase in CNV frequency of at least about 1.2 times the reference value has been determined to indicate risk of cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1); for example, an increase in CNV frequency of at least or about 1.5 times the reference value or greater, such as 2 to 4 times the reference value, is an indicator of an increased risk of cancer (e.g. as compared to a normal healthy reference population).
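The fold comparison against a reference value can be sketched as below. The CNV counts and the reference value are made-up numbers; the default cutoff of 1.2 comes from the "at least about 1.2 times" language above, and treating it as a hard cutoff is a simplification made for illustration.

```python
def cnv_frequency_fold(cnv_count: int, reference_count: float) -> float:
    """CNV frequency of the test sample as a multiple of the reference."""
    if reference_count <= 0:
        raise ValueError("reference value must be positive")
    return cnv_count / reference_count


def indicates_cancer_risk(fold: float, min_fold: float = 1.2) -> bool:
    """True when the fold increase reaches the chosen cutoff."""
    return fold >= min_fold


# Hypothetical counts: reference population value of 12 CNVs.
fold_a = cnv_frequency_fold(18, 12)  # 1.5 times the reference
fold_b = cnv_frequency_fold(13, 12)  # about 1.08 times the reference
print(fold_a, indicates_cancer_risk(fold_a))
print(round(fold_b, 2), indicates_cancer_risk(fold_b))
```

Since the reference value depends on the population and on the resolution of the CNV detection method, both the reference and the cutoff are left as parameters.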
A determination of structural variation in the genome of a mammal in comparison to a reference value is also believed to be indicative of risk of cancer. In this context, in one embodiment, the term "structural variation" can be defined as the CNV frequency in a mammal multiplied by the average CNV size (in bp) in the mammal. Thus, a high structural variation score will result from an increased CNV frequency and/or from the occurrence of large genomic nucleic acid deletions or duplications. Accordingly, in certain embodiments, the number of CNVs in a test sample (e.g. a sample comprising a constitutional (germline) nucleic acid) is determined using the methods described herein to determine the size and number of copy number variations. In certain embodiments, a total structural variation score within genomic DNA of greater than about 1 megabase, or greater than about 1.1 megabases, or greater than about 1.2 megabases, or greater than about 1.3 megabases, or greater than about 1.4 megabases, or greater than about 1.5 megabases, or greater than about 1.8 megabases, or greater than about 2 megabases of DNA is indicative of risk of cancer.
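The structural variation score defined above can be illustrated with a short sketch. It assumes that "CNV frequency" means the number of CNVs detected in the genome; the CNV sizes are invented, and the 1 megabase cutoff is the lowest of the thresholds listed above.

```python
from typing import Sequence


def structural_variation_score(cnv_sizes_bp: Sequence[int]) -> float:
    """CNV frequency (taken here as the CNV count, an assumption)
    multiplied by the average CNV size in bp."""
    count = len(cnv_sizes_bp)
    if count == 0:
        return 0.0
    average_size = sum(cnv_sizes_bp) / count
    return count * average_size


# Invented CNV sizes in base pairs for one genome.
cnv_sizes = [250_000, 400_000, 600_000]
score = structural_variation_score(cnv_sizes)
exceeds_one_megabase = score > 1_000_000
print(score, exceeds_one_megabase)
```

Under this reading, count times average size is simply the total number of base pairs covered by CNVs, which is why the score rises both with more CNVs and with larger ones.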
These methods are believed to provide a measure of the risk of any cancer, including but not limited to acute and chronic leukemias, lymphomas, numerous solid tumors of mesenchymal or epithelial tissue, brain cancer, breast cancer, liver cancer, stomach cancer, colon cancer, B cell lymphoma, lung cancer, bronchus cancer, colorectal cancer, prostate cancer, breast cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, melanoma, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, liposarcoma, testicular cancer, and malignant fibrous histiocytoma, and other cancers.
Complete chromosomal aneuploidy.
As noted above, aneuploidy occurs at high frequency in cancer. In certain studies examining the prevalence of somatic copy number alterations (SCNAs) in cancer, it has been found that one quarter of the genome of a typical cancer cell is affected either by whole-arm SCNAs or by whole-chromosome SCNAs of aneuploidy (see, e.g., Beroukhim et al., Nature 463:899-905 [2010]). Whole-chromosome alterations are recurrently observed in several cancer types. For example, gain of chromosome 8 is seen in 10% to 20% of cases of acute myeloid leukaemia (AML), as well as in some solid tumors, including Ewing's sarcoma and desmoid tumors (see, e.g., Barnard et al., Leukemia 10:5-12 [1996]; Maurici et al., Cancer Genet. Cytogenet. 100:106-110 [1998]; Qi et al., Cancer Genet. Cytogenet. 92:147-149 [1996]; Barnard, D.R. et al., Blood 100:427-434 [2002]; etc.). An illustrative but non-limiting list of chromosome gains and losses in human cancers is shown in Table 1.
Table 1: Illustrative specific, recurrent chromosome gains and losses in human cancers (see, e.g., Gordon et al. (2012) Nature Rev. Genetics, 13:189-203).
Figure BDA00002366924901291
In various embodiments, the methods described herein can be used to detect and/or quantify whole-chromosome aneuploidies that are associated with cancer generally and/or with particular cancers. Thus, for example, in certain embodiments, detection and/or quantification of whole-chromosome aneuploidies characterized by the gains or losses shown in Table 1 is contemplated.
Arm-level chromosomal segment copy number variations.
Multiple studies have reported patterns of arm-level copy number variations across large numbers of cancer specimens (Lin et al., Cancer Res 68, 664-673 (2008); George et al., PLoS ONE 2, e255 (2007); Demichelis et al., Genes Chromosomes Cancer 48:366-380 (2009); Beroukhim et al., Nature 463(7283):899-905 [2010]). It has additionally been observed that the frequency of arm-level copy number variations decreases with the length of the chromosome arm. Adjusted for this trend, the majority of chromosome arms exhibit strong evidence of preferential gain or loss, but rarely both, across multiple cancer lineages (see, e.g., Beroukhim et al., Nature 463(7283):899-905 [2010]).
Accordingly, in one embodiment, the methods described herein are used to determine arm-level CNVs (CNVs comprising one chromosomal arm or substantially one chromosomal arm) in a sample. The CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the arm-level CNVs can be identified in those constitutional nucleic acids. In certain embodiments, arm-level CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject that is suspected or known to have cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, blastoma, and the like). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample used to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues including, but not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukophoresis samples, or in tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or an increased risk for a cancer include, but are not limited to, the arm-level CNVs listed in Table 2. As illustrated in Table 2, certain CNVs that comprise a substantial arm-level gain are indicative of the presence of a cancer or an increased risk for certain cancers. Thus, for example, a gain in 1q is indicative of the presence of or increased risk for acute lymphoblastic leukemia (ALL), breast cancer, GIST, HCC, lung NSC, medulloblastoma, melanoma, MPD, ovarian cancer, and/or prostate cancer. A gain in 3q is indicative of the presence of or increased risk for esophageal squamous cancer, lung SC, and/or MPD. A gain in 7q is indicative of the presence of or increased risk for colorectal cancer, glioma, HCC, lung NSC, medulloblastoma, melanoma, prostate cancer, and/or renal cancer. A gain in 7p is indicative of the presence of or increased risk for breast cancer, colorectal cancer, esophageal adenocarcinoma, glioma, HCC, lung NSC, medulloblastoma, melanoma, and/or renal cancer. A gain in 20q is indicative of the presence of or increased risk for breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cancer, glioma, HCC, lung NSC, melanoma, ovarian cancer, and/or renal cancer, and so forth.
Similarly, as illustrated in Table 2, certain CNVs that comprise a substantial arm-level loss are indicative of the presence of and/or an increased risk for certain cancers. Thus, for example, a loss in 1p is indicative of the presence of or increased risk for gastrointestinal stromal tumor. A loss in 4q is indicative of the presence of or increased risk for colorectal cancer, esophageal adenocarcinoma, lung SC, melanoma, ovarian cancer, and/or renal cancer. A loss in 17p is indicative of the presence of or increased risk for breast cancer, colorectal cancer, esophageal adenocarcinoma, HCC, lung NSC, lung SC, and/or ovarian cancer, and so forth.
Table 2: Significant arm-level chromosomal segment copy number alterations in each of 16 cancer subtypes (breast, colorectal, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous, GIST (gastrointestinal stromal tumor), glioma, HCC (hepatocellular carcinoma), lung NSC, lung SC, medulloblastoma, melanoma, MPD (myeloproliferative disorder), ovarian, prostate, acute lymphoblastic leukemia (ALL), and renal) (see, e.g., Beroukhim et al., Nature (2010) 463(7283):899-905).
Figure BDA00002366924901331
Figure BDA00002366924901341
Figure BDA00002366924901351
The examples of associations between arm-level copy number variations and cancers are intended to be illustrative and not limiting. Other arm-level copy number variations and their cancer associations are known to those of skill in the art.
Smaller (e.g., focal) copy number variations.
As noted above, in certain embodiments, the methods described herein can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more entire chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more segments of a chromosome. In yet other embodiments, the chromosomal amplification is the gain of two or more segments of two or more chromosomes. In various embodiments, the chromosomal amplification can involve the gain of one or more oncogenes.
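As a hedged illustration of how the arm-level associations stated in the text above might be consulted once an arm-level CNV has been called, the following sketch stores only the examples given in prose (it does not reproduce Table 2). The dictionary name, the function name, and the "gain"/"loss" labels are hypothetical conventions chosen for this example.

```python
# Arm-level associations transcribed from the examples in the text only;
# Table 2 itself is not reproduced here.
ARM_LEVEL_ASSOCIATIONS = {
    ("1q", "gain"): ["ALL", "breast", "GIST", "HCC", "lung NSC",
                     "medulloblastoma", "melanoma", "MPD", "ovarian", "prostate"],
    ("3q", "gain"): ["esophageal squamous", "lung SC", "MPD"],
    ("7q", "gain"): ["colorectal", "glioma", "HCC", "lung NSC",
                     "medulloblastoma", "melanoma", "prostate", "renal"],
    ("7p", "gain"): ["breast", "colorectal", "esophageal adenocarcinoma", "glioma",
                     "HCC", "lung NSC", "medulloblastoma", "melanoma", "renal"],
    ("20q", "gain"): ["breast", "colorectal", "dedifferentiated liposarcoma",
                      "esophageal adenocarcinoma", "esophageal squamous", "glioma",
                      "HCC", "lung NSC", "melanoma", "ovarian", "renal"],
    ("1p", "loss"): ["GIST"],
    ("4q", "loss"): ["colorectal", "esophageal adenocarcinoma", "lung SC",
                     "melanoma", "ovarian", "renal"],
    ("17p", "loss"): ["breast", "colorectal", "esophageal adenocarcinoma",
                      "HCC", "lung NSC", "lung SC", "ovarian"],
}


def cancers_associated_with(arm, change):
    """Return the cancer types listed for an arm-level gain or loss, or [] if none is listed."""
    return ARM_LEVEL_ASSOCIATIONS.get((arm, change), [])
```

An arm that is absent from the dictionary simply returns an empty list, which reflects only that this sketch is incomplete, not that no association exists.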
Dominantly acting genes associated with human solid tumors typically exert their effect by overexpression or altered expression. Gene amplification is a common mechanism leading to upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in over 50% of human breast cancers. Most notably, amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17(17q21-q22)), results in overexpression of HER2 receptors on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al., Clinical Breast Cancer 8:392-401 [2008]). A variety of oncogenes have been found to be amplified in other human malignancies. Examples of the amplification of cellular oncogenes in human tumors include amplifications of: c-myc in the promyelocytic leukemia cell line HL60 and in small cell lung carcinoma; N-myc in primary neuroblastomas (stages I and IV), neuroblastoma cell lines, retinoblastoma cell lines and primary tumors, and small cell lung carcinoma cell lines and tumors; L-myc in small cell lung carcinoma cell lines and tumors; c-myb in acute myeloid leukemia and in colon carcinoma cell lines; c-erbb in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary carcinomas of the lung, colon, bladder, and rectum; and N-ras in a mammary carcinoma cell line (Varmus H., Ann Rev Genetics 18:553-612 (1984) [cited in Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing Co. 1987)]).
Duplication of oncogenes is a common cause of many types of cancer, as is the case with P70-S6 kinase 1 amplification and breast cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Other examples of oncogenes that are amplified in human cancers include MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1 and FGFR2 in breast cancer; MYC and ERBB2 in cervical cancer; HRAS, KRAS and MYB in colorectal cancer; MYC, CCND1 and MDM2 in esophageal cancer; CCNE, KRAS and MET in gastric cancer; ERBB1 and CDK4 in glioblastoma; CCND1, ERBB1 and MYC in head and neck cancer; CCND1 in hepatocellular cancer; MYCB in neuroblastoma; MYC, ERBB2 and AKT2 in ovarian cancer; MDM2 and CDK4 in sarcoma; and MYC in small cell lung cancer. In one embodiment, the present method can be used to determine the presence or absence of amplification of an oncogene associated with a cancer. In certain embodiments, the amplified oncogene is associated with breast cancer, cervical cancer, colorectal cancer, esophageal cancer, gastric cancer, glioblastoma, head and neck cancer, hepatocellular cancer, neuroblastoma, ovarian cancer, sarcoma, or small cell lung cancer.
In one embodiment, the present method can be used to determine the presence or absence of a chromosomal deletion. In some embodiments, the chromosomal deletion is the loss of one or more entire chromosomes. In other embodiments, the chromosomal deletion is the loss of one or more segments of a chromosome. In yet other embodiments, the chromosomal deletion is the loss of two or more segments of two or more chromosomes. The chromosomal deletion can involve the loss of one or more tumor suppressor genes.
Chromosomal deletions involving tumor suppressor genes are believed to play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located in chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell cycle regulation (Howe et al., Proc Natl Acad Sci (USA) 87:5883-5887 [1990]). Altered or lost expression of the Rb protein is caused by inactivation of both gene alleles, either through a point mutation or through a chromosomal deletion. Rb-1 gene alterations have been found not only in retinoblastomas but also in other malignancies such as osteosarcomas, small cell lung cancer (Rygaard et al., Cancer Res 50:5312-5317 [1990]) and breast cancer. Restriction fragment length polymorphism (RFLP) studies have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost due to a gross chromosomal deletion (Bowcock et al., Am J Hum Genet 46:12 [1990]). Chromosome 1 abnormalities, including duplications, deletions and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, might harbor oncogenes or tumor suppressor genes that are pathogenetically relevant to both the chronic and the advanced phases of myeloproliferative neoplasms (Caramazza et al., Eur J Hematol 84:191-200 [2010]).
Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletion of chromosome 5 is the most common karyotypic abnormality in myelodysplastic syndromes (MDSs). Patients with isolated del(5q)/5q- MDS have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPNs) and acute myeloid leukemia. The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor suppressor genes that have fundamental roles in the growth control of hematopoietic stem/progenitor cells (HSCs/HPCs). Cytogenetic mapping of commonly deleted regions (CDRs) centered on 5q31 and 5q32 identified candidate tumor suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20 and the cytoskeletal remodeling protein alpha-catenin (Eisenmann, Oncogene 28:3429-3441 [2009]). Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have shown that allelic loss from several distinct regions on chromosome 3p, including 3p25, 3p21-22, 3p21.3, 3p12-13 and 3p14, is the earliest and most frequent genomic abnormality involved in a wide spectrum of major epithelial cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and it is thought that interstitial deletions or promoter hypermethylation precede the loss of 3p or of the entire chromosome 3 in the development of carcinomas (Angeloni D., Briefings Functional Genomics 6:19-39 [2007]).
Newborns and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, harboring about 300 genes, may be involved in numerous structural aberrations, e.g., translocations, deletions, and amplifications, in leukemias, lymphomas, and solid tumors. Moreover, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Somatic numerical as well as structural aberrations of chromosome 21 are associated with leukemias, and specific genes including RUNX1, TMPRSS2, and TFF, which are located in 21q, play a role in tumorigenesis (Fonatsch C, Gene Chromosomes Cancer 49:497-508 [2010]).
In view of the foregoing, in various embodiments the methods described herein can be used to determine CNVs of segments that are known to comprise one or more oncogenes or tumor suppressor genes, and/or that are known to be associated with a cancer or an increased risk of cancer. In certain embodiments, the CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the segments can be identified in those constitutional nucleic acids. In certain embodiments, segment CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject that is suspected or known to have cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, blastoma, etc.). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample used to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues including, but not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukophoresis samples, or in tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The CNVs used to determine the presence of a cancer and/or an increased risk for a cancer can comprise amplifications or deletions.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or an increased risk for a cancer include one or more of the amplifications shown in Table 3.
Table 3: Illustrative, but non-limiting, chromosomal segments characterized by amplifications that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
Figure BDA00002366924901391
Figure BDA00002366924901401
Figure BDA00002366924901421
In certain embodiments, in combination with the amplifications described above (herein), or separately, the CNVs identified as indicative of the presence of a cancer or an increased risk for a cancer include one or more of the deletions shown in Table 4.
Table 4: Illustrative, but non-limiting, chromosomal segments characterized by deletions that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
Figure BDA00002366924901422
Figure BDA00002366924901441
Figure BDA00002366924901451
Figure BDA00002366924901461
The aneuploidies identified as characteristic of various cancers (e.g., the aneuploidies identified in Tables 3 and 4) may contain genes known to be implicated in cancer etiologies (e.g., tumor suppressors, oncogenes, etc.). These aneuploidies can also be probed to identify relevant but previously unknown genes.
For example, Beroukhim et al., supra, assessed potential cancer-causing genes in the copy number alterations using GRAIL (Gene Relationships Among Implicated Loci), an algorithm that searches for functional relationships among genomic regions. GRAIL scores each gene in a collection of genomic regions for its 'relatedness' to genes in the other regions, based on the textual similarity between published abstracts of all papers citing the genes, on the notion that some target genes will function in common pathways. These methods permit identification and characterization of genes not previously associated with the particular cancers at issue. Table 5 illustrates target genes known to be within the identified amplified segments, together with predicted genes, and Table 6 illustrates target genes known to be within the identified deleted segments, together with predicted genes.
Table 5: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by amplification in various cancers (see, e.g., Beroukhim et al., supra).
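The idea of scoring a gene's 'relatedness' by the textual similarity of abstracts can be sketched in a few lines. This is not the GRAIL algorithm itself; it is a much simplified stand-in that uses bag-of-words cosine similarity, and the gene names, region labels and abstract texts below are hypothetical placeholders rather than real literature.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Bag-of-words counts for a piece of text (lower-cased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def relatedness(gene, abstracts_by_gene, region_of):
    """Highest text similarity between `gene` and any gene located in a different region."""
    own = word_vector(" ".join(abstracts_by_gene[gene]))
    scores = [
        cosine_similarity(own, word_vector(" ".join(texts)))
        for other, texts in abstracts_by_gene.items()
        if region_of[other] != region_of[gene]
    ]
    return max(scores, default=0.0)


# Hypothetical genes, regions and abstract snippets, for illustration only.
abstracts = {
    "GENE_A": ["kinase signaling pathway growth"],
    "GENE_B": ["growth factor kinase signaling"],
    "GENE_C": ["membrane lipid transport"],
}
regions = {"GENE_A": "region_1", "GENE_B": "region_2", "GENE_C": "region_2"}
score_a = relatedness("GENE_A", abstracts, regions)
```

In this toy input GENE_A shares three of its four words with GENE_B, which lies in a different region, so it receives a high score, while GENE_C shares no vocabulary with it.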
Figure BDA00002366924901472
Figure BDA00002366924901481
Figure BDA00002366924901491
Table 6: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by deletion in various cancers (see, e.g., Beroukhim et al., supra).
Figure BDA00002366924901492
Figure BDA00002366924901501
Figure BDA00002366924901511
Figure BDA00002366924901531
Figure BDA00002366924901551
In various embodiments, it is contemplated to use the methods identified herein to identify CNVs of segments comprising the amplified regions or genes identified in Table 5, and/or to use the methods identified herein to identify CNVs of segments comprising the deleted regions or genes identified in Table 6.
In one embodiment, the methods described herein provide a means to assess the association between gene amplification and the extent of tumor evolution. The correlation between amplification and/or deletion and the stage or grade of a cancer may be prognostically important, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease, with more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events may be useful in associating those events as predictors of subsequent disease progression.
Gene amplifications and deletions identified by the method can be associated with other known parameters, such as tumor grade, medical history, Brd/Urd labeling index, hormonal status, nodal involvement, tumor size, survival duration, and other tumor properties available from epidemiological and biostatistical studies. For example, the tumor DNA to be tested by the method could include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer and metastatic lymph nodes, in order to permit the identification of associations between amplifications and deletions and stage. The associations made may make effective therapeutic intervention possible. For example, consistently amplified regions may contain an overexpressed gene, the product of which may be amenable to therapeutic attack (for example, the growth factor receptor tyrosine kinase p185HER2).
In various embodiments, the methods described herein can be used to identify amplification and/or deletion events that are associated with drug resistance by comparing the copy number variation of nucleic acid sequences from primary cancers with that of cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows the rapid development of drug resistance, more amplifications and/or deletions would be expected in primary tumors from chemoresistant patients than in tumors from chemosensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, the regions surrounding those genes would be expected to be consistently amplified in tumor cells from chemoresistant patients but not in the primary tumors. The discovery of associations between gene amplification and/or deletion and the development of drug resistance may allow the identification of patients that will or will not benefit from adjuvant therapy.
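The two comparisons described above (event burden in chemoresistant versus chemosensitive patients, and events found in resistant tumor cells but not in the primary tumor) can be sketched as simple set operations. The function names, the (region, type) tuple representation and the region labels are hypothetical and the inputs are made-up; this is an illustration rather than a prescribed analysis.

```python
from statistics import mean


def mean_event_count(samples):
    """Average number of amplification/deletion events per tumor sample."""
    return mean([len(events) for events in samples]) if samples else 0.0


def resistance_candidates(primary_events, resistant_events):
    """Events present in resistant tumor cells but absent from the primary tumor."""
    return sorted(set(resistant_events) - set(primary_events))


# Hypothetical CNV calls, each a (region, type) pair, for illustration only.
chemoresistant_primaries = [
    {("region_1", "amp"), ("region_2", "del"), ("region_3", "amp")},
    {("region_1", "amp"), ("region_3", "amp")},
]
chemosensitive_primaries = [{("region_2", "del")}, set()]

burden_resistant = mean_event_count(chemoresistant_primaries)
burden_sensitive = mean_event_count(chemosensitive_primaries)
candidates = resistance_candidates(
    primary_events={("region_2", "del")},
    resistant_events={("region_2", "del"), ("region_3", "amp")},
)
```

A real study would compare many patients and test whether a difference in burden, or a recurrent candidate event, is statistically meaningful; the sketch only shows the bookkeeping.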
In a manner similar to that described for determining the presence or absence of complete and/or partial fetal chromosomal aneuploidies in a maternal sample, the methods, apparatus, and systems described herein can be used to determine the presence or absence of complete and/or partial chromosomal aneuploidies in any patient sample comprising nucleic acids, e.g., DNA or cfDNA (including patient samples that are not maternal samples). The patient sample can be any biological sample type described elsewhere herein. Preferably, the sample is obtained by a non-invasive procedure. For example, the sample can be a blood sample, or the serum and plasma fractions thereof. Alternatively, the sample can be a urine sample or a fecal sample. In yet other embodiments, the sample is a tissue biopsy sample. In all cases, the sample comprises nucleic acids, e.g., cfDNA or genomic DNA, which are purified and sequenced using any of the NGS sequencing methods described above.
Both complete and partial chromosomal aneuploidies associated with the formation and progression of cancer can be determined according to the present method.
In various embodiments, when the methods described herein are used to determine the presence of a cancer and/or an increased risk for a cancer, the data can be normalized with respect to the chromosome(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the chromosome arm(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the particular segment(s) for which the CNV is determined.
In addition to the role of CNV in cancer, CNVs have been associated with a growing number of common complex diseases, including human immunodeficiency virus (HIV), autoimmune diseases and a spectrum of neuropsychiatric disorders.
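One way to picture normalization at the chromosome, arm, or segment level is sketched below. It assumes, as an illustration only, that coverage is expressed as a dose (sequence tag count for the sequence of interest divided by the tag count for one or more normalizing sequences) and that the dose is then compared with doses from unaffected reference samples as a z-score-like value. The function names, keys, and numbers are hypothetical and the counts are made-up.

```python
from statistics import mean, stdev


def dose(tag_counts, target, normalizers):
    """Tag count for the sequence of interest divided by the summed tag count
    of the normalizing sequences (chromosomes, arms, or segments)."""
    denominator = sum(tag_counts[name] for name in normalizers)
    if denominator == 0:
        raise ValueError("normalizing sequences have no mapped tags")
    return tag_counts[target] / denominator


def normalized_value(test_dose, reference_doses):
    """Distance of the test dose from the reference mean, in standard deviations.

    Requires at least two reference doses with non-zero spread.
    """
    return (test_dose - mean(reference_doses)) / stdev(reference_doses)


# Hypothetical tag counts for one test sample; keys may name chromosomes,
# arms (e.g. "chr8q"), or segments, depending on the CNV being determined.
counts = {"chr8": 120, "chr2": 200, "chr3": 200}
test_dose = dose(counts, target="chr8", normalizers=["chr2", "chr3"])

# Hypothetical doses for the same target in unaffected reference samples.
value = normalized_value(test_dose, [0.24, 0.25, 0.26])
```

Choosing the normalizing sequences per chromosome, arm, or segment of interest keeps the dose comparable across samples that differ in total sequencing depth.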
CNV in communicable disease and the autoimmune disorder
To date, a number of studies have reported associations between CNVs in genes involved in inflammation and the immune response and HIV, asthma, Crohn's disease and other autoimmune disorders (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, CNV in CCL3L1 has been implicated in HIV/AIDS susceptibility (CCL3L1, 17q11.2 deletion), rheumatoid arthritis (CCL3L1, 17q11.2 deletion), and Kawasaki disease (CCL3L1, 17q11.2 duplication); CNV in HBD-2 has been reported to predispose to colonic Crohn's disease (HBD-2, 8p23.1 deletion) and psoriasis (HBD-2, 8p23.1 deletion); and CNV in FCGR3B has been shown to predispose to glomerulonephritis in systemic lupus erythematosus (FCGR3B, 1q23 deletion, 1q23 duplication) and to anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (FCGR3B, 1q23 deletion), and to increase the risk of developing rheumatoid arthritis. At least two inflammatory or autoimmune diseases have been shown to be associated with CNVs at different gene loci. For example, Crohn's disease is associated not only with low copy number at HBD-2, but also with a common deletion polymorphism upstream of the IRGM gene, which encodes a member of the p47 immunity-related GTPase family. In addition to the association with FCGR3B copy number, SLE susceptibility has also been reported to be significantly increased among subjects with a lower copy number of complement component C4.
Associations between genomic deletions at the GSTM1 (GSTM1, 1q23 deletion) and GSTT1 (GSTT1, 22q11.2 deletion) loci and an increased risk of atopic asthma have been reported in a number of independent studies. In some embodiments, the methods described herein can be used to determine the presence or absence of a CNV associated with inflammation and/or autoimmune disease. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from HIV, asthma, or Crohn's disease. Examples of CNVs associated with such diseases include, without limitation, deletions at 17q11.2, 8p23.1, 1q23, and 22q11.2, and duplications at 17q11.2 and 1q23. In some embodiments, the present method can be used to determine the presence of CNVs in genes including, but not limited to, CCL3L1, HBD-2, FCGR3B, GSTM1, GSTT1, C4, and IRGM.
CNV diseases of the nervous system
Associations between de novo and inherited CNVs and several common neurological and psychiatric diseases have been reported in autism, schizophrenia and epilepsy, and in some cases of neurodegenerative diseases such as Parkinson's disease, amyotrophic lateral sclerosis (ALS) and autosomal dominant Alzheimer's disease (Fanciulli et al., Clin Genet 77:201-213 [2010]). Cytogenetic abnormalities with duplications at 15q11-q13 have been observed in patients with autism and autism spectrum disorders (ASDs). According to the Autism Genome Project Consortium, 154 CNVs, including several recurrent CNVs, are located either on chromosome 15q11-q13 or at new genomic locations, including chromosome 2p16, 1q21, and 17p12 in a region associated with Smith-Magenis syndrome that overlaps with ASD. Recurrent microdeletions or microduplications on chromosome 16p11.2 have highlighted the observation that de novo CNVs are detected at loci for genes that are known to regulate synaptic differentiation and glutamatergic neurotransmitter release, such as SHANK3 (22q13.3 deletion), neurexin 1 (NRXN1, 2p16.3 deletion) and the neuroligins (NLGN4, Xp22.33 deletion). Schizophrenia has also been associated with multiple de novo CNVs. Microdeletions and microduplications associated with schizophrenia contain an overrepresentation of genes belonging to neurodevelopmental and glutamatergic pathways, suggesting that multiple CNVs affecting these genes may contribute directly to the pathogenesis of schizophrenia, e.g., ERBB4, 2q34 deletion; SLC1A3, 5p13.3 deletion; RAPGEF4, 2q31.1 deletion; CIT, 12.24 deletion; and multiple genes with de novo CNVs. CNVs have also been associated with other neurological disorders, including epilepsy (CHRNA7, 15q13.3 deletion), Parkinson's disease (SNCA, 4q22 duplication) and ALS (SMN1, 5q12.2-q13.3 deletion; and SMN2 deletion).
In some embodiments, the methods described herein can be used to determine the presence or absence of a CNV associated with diseases of the nervous system. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from autism, schizophrenia, epilepsy, a neurodegenerative disease such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), or autosomal dominant Alzheimer's disease. The methods can be used to determine CNVs of genes associated with diseases of the nervous system (including, without limitation, any of autism spectrum disorders (ASD), schizophrenia, and epilepsy) and CNVs of genes associated with neurodegenerative disorders such as Parkinson's disease. Examples of CNVs associated with such diseases include, without limitation, duplications at 15q11-q13, 2p16, 1q21, 17p12, 16p11.2, and 4q22, and deletions at 22q13.3, 2p16.3, Xp22.33, 2q34, 5p13.3, 2q31.1, 12.24, 15q13.3, and 5q12.2. In some embodiments, the methods can be used to determine the presence of CNVs in genes including, but not limited to, SHANK3, NLGN4, NRXN1, ERBB4, SLC1A3, RAPGEF4, CIT, CHRNA7, SNCA, SMN1, and SMN2.
CNV and metabolic or cardiovascular disease
Associations between CNVs and metabolic and cardiovascular traits, such as familial hypercholesterolemia (FH), atherosclerosis and coronary artery disease, have been reported in a number of studies (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, germline rearrangements, mainly deletions, have been observed at the LDLR gene (LDLR, 19p13.2 deletion/duplication) in some FH patients who carry no other LDLR mutation. Another example is the LPA gene, which encodes apolipoprotein(a) (apo(a)), whose plasma concentration is associated with the risk of coronary artery disease, myocardial infarction (MI) and stroke. Plasma concentrations of the apo(a)-containing lipoprotein Lp(a) vary more than 1000-fold between individuals, and 90% of this variability is genetically determined at the LPA locus, with plasma concentration and Lp(a) isoform size being proportional to a highly variable number of 'kringle 4' repeat sequences (range 5 to 50). These data indicate that CNVs in at least two genes can be associated with cardiovascular risk. The methods described herein can be used in large studies to search specifically for associations between CNVs and cardiovascular disorders. In some embodiments, the present methods can be used to determine the presence or absence of a CNV associated with a metabolic or cardiovascular disease. For example, the present methods can be used to determine the presence of a CNV in a patient suspected of suffering from familial hypercholesterolemia. The methods described herein can be used to determine CNVs of genes associated with metabolic or cardiovascular disease (for example, hypercholesterolemia). Examples of CNVs associated with such diseases include, without limitation, the 19p13.2 deletion/duplication of the LDLR gene and multiplications in the LPA gene.
Determination of complete chromosomal aneuploidies in patient samples
In one embodiment, methods are provided for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. In some embodiments, the method determines the presence or absence of any one or more different complete chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in the patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X and Y. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete patient chromosomal aneuploidies in the patient test sample.
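Steps (b) through (d) above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the toy tag list, the choice of normalizing chromosomes and the two-sided threshold rule are all hypothetical and are not taken from the specification, which states only that each dose is compared to a threshold value.

```python
from collections import Counter


def count_tags(tag_chromosomes):
    """Step (b): count sequence tags per chromosome.

    tag_chromosomes is an iterable holding one chromosome name per mapped tag.
    """
    return Counter(tag_chromosomes)


def chromosome_dose(counts, chrom_of_interest, normalizing_chroms):
    """Step (c): tags on the chromosome of interest divided by tags on its
    normalizing chromosome sequence (a single chromosome or a group)."""
    norm_tags = sum(counts[c] for c in normalizing_chroms)
    if norm_tags == 0:
        raise ValueError("no tags mapped to the normalizing chromosome sequence")
    return counts[chrom_of_interest] / norm_tags


def call_aneuploidy(dose, lower, upper):
    """Step (d): compare the dose with thresholds.

    The (lower, upper) rule is an assumption made for illustration only.
    """
    if dose > upper:
        return "gain"
    if dose < lower:
        return "loss"
    return "unaffected"


# Toy data: invented tag assignments, not real sequencing output.
tags = ["chr21"] * 30 + ["chr9"] * 60 + ["chr11"] * 40 + ["chr1"] * 200
counts = count_tags(tags)
dose_21 = chromosome_dose(counts, "chr21", ["chr9", "chr11"])
print(dose_21, call_aneuploidy(dose_21, lower=0.20, upper=0.28))
```

In practice the thresholds would be set per chromosome of interest, and the calculation would be repeated for every chromosome of interest in the sample.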
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing chromosome sequence for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing chromosome for each of said chromosomes of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence for the chromosome of interest to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of all chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
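The length-normalized variant of step (c) can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the tag counts and sequence lengths are invented round numbers chosen only to make the arithmetic easy to follow.

```python
def tag_density(tag_count, sequence_length):
    """Relate a tag count to the length (in bases) of the sequence it maps to."""
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / sequence_length


def density_dose(tags_interest, len_interest, tags_norm, len_norm):
    """Chromosome dose as the ratio of the tag density of the chromosome of
    interest to the tag density of the normalizing chromosome sequence."""
    return tag_density(tags_interest, len_interest) / tag_density(tags_norm, len_norm)


# Invented example: equal tag densities on both sequences give a dose of about 1.
dose = density_dose(tags_interest=1_000, len_interest=50_000_000,
                    tags_norm=3_000, len_norm=150_000_000)
print(dose)
```

Dividing by length makes doses comparable when the chromosome of interest and the normalizing sequence differ greatly in size; a dose computed from raw counts alone would scale with the length ratio.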
An example of this embodiment, in which one or more complete chromosomal aneuploidies are determined in a cancer patient test sample comprising cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of said twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete chromosomal aneuploidies in the patient test sample.
In another embodiment, the method for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this instance, the method comprises: (a) obtaining sequence information for the nucleic acids in said sample; and (b) using said sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete chromosomal aneuploidies in the patient sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing segment sequence for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence for the chromosome of interest to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
A means for comparing chromosome doses of different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
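The NCV calculation can be sketched in Python as below. The function name and the dose values are hypothetical and chosen for illustration; using the sample standard deviation (n - 1 denominator) as the estimate of the standard deviation is an assumption, since the text specifies only an estimated mean and standard deviation from the qualified samples.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (x_ij - mu_j) / sigma_j for one chromosome of interest.

    qualified_doses: doses of the same chromosome in the qualified samples.
    stdev() is the sample standard deviation (an assumption, see lead-in).
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma


# Invented doses for one chromosome across five qualified samples.
qualified = [0.10, 0.11, 0.09, 0.10, 0.10]
ncv = normalized_chromosome_value(0.13, qualified)
print(round(ncv, 2))
```

Because the NCV is expressed in units of standard deviations from the qualified-sample mean, the same numerical threshold can be applied across chromosomes and across sample sets whose raw doses differ in scale.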
In some embodiments, the presence or absence of one complete chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three or twenty-four complete chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies can comprise trisomies, tetrasomies, pentasomies and other polysomies, and because the number of complete chromosomal aneuploidies varies in different diseases and in different stages of the same disease, the number of complete chromosomal aneuploidies determined according to the present method is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100 or more chromosomal aneuploidies. Systematic karyotyping of tumors has revealed that the chromosome number in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer, J Cell Sci 121:3859-3866 [2008]). In some embodiments, the method comprises determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected or known to be suffering from a cancer, for example colon cancer. The chromosomal aneuploidies include losses of one or more complete chromosomes (hypodiploidies) and gains of complete chromosomes, including trisomies, tetrasomies, pentasomies and other polysomies. Gains and/or losses of chromosome segments can also be determined, as described elsewhere in this application. The method is applicable to determining the presence or absence of different aneuploidies in samples from patients suspected or known to be suffering from a cancer as described elsewhere in this application.
In some embodiments, any one of chromosomes 1-22, X and Y can be the chromosome of interest in determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X or Y. In one embodiment, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X and Y, and the presence or absence of at least twenty different complete chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from chromosomes 1-22, X and Y are all of chromosomes 1-22, X and Y, and the presence or absence of complete chromosomal aneuploidies of all of chromosomes 1-22, X and Y is determined. The different complete chromosomal aneuploidies that can be determined include complete chromosomal monosomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal trisomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal tetrasomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal pentasomies of any one or more of chromosomes 1-22, X and Y; and other complete chromosomal polysomies of any one or more of chromosomes 1-22, X and Y.
Determination of partial chromosomal aneuploidies in patient samples
In another embodiment, methods are provided for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in said sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises, in step (c), using the number of sequence tags identified for each of said any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of said any one or more segments of any one or more chromosomes of interest; and (d) comparing each single segment dose for each of said any one or more segments of any one or more chromosomes of interest to a threshold value for each of said any one or more chromosome segments of any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial chromosomal aneuploidies in said sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of the any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of the any one or more segments of any one or more chromosomes of interest to the number of sequence tags identified for the normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence for the segment of interest to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all segments of interest. Steps (a)-(d) can be repeated for test samples from different patients.
A means for comparing segment doses of different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
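For several segments of interest at once, the segment dose and NSV might be computed as in the following sketch. The function names, the segment label (borrowed from a locus mentioned earlier in the text purely as a dictionary key) and every number are hypothetical, and the sample standard deviation is again an assumed estimator.

```python
from statistics import mean, stdev


def segment_dose(segment_tags, normalizing_segment_tags):
    """Segment dose as the ratio of tags on the segment of interest to tags
    on its normalizing segment sequence."""
    if normalizing_segment_tags <= 0:
        raise ValueError("normalizing segment sequence has no tags")
    return segment_tags / normalizing_segment_tags


def normalized_segment_values(test_doses, qualified_doses):
    """Return {segment: NSV} for one test sample.

    test_doses:      {segment: dose observed in the test sample}
    qualified_doses: {segment: [doses of that segment in qualified samples]}
    """
    values = {}
    for segment, x in test_doses.items():
        reference = qualified_doses[segment]
        values[segment] = (x - mean(reference)) / stdev(reference)
    return values


# Invented numbers for a single illustrative segment.
test = {"16p11.2": segment_dose(120, 10_000)}
qualified = {"16p11.2": [0.020, 0.022, 0.018]}
nsv = normalized_segment_values(test, qualified)
print({seg: round(v, 2) for seg, v in nsv.items()})
```

A strongly negative NSV, as in this toy case, would point toward a possible partial deletion and a strongly positive one toward a possible partial duplication, subject to whatever threshold is chosen for the segment.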
In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest selected from any one of chromosomes 1-22, X and Y is selected from chromosomes 1-22, X and Y. In other embodiments, two or more segments of interest selected from chromosomes 1-22, X and Y are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X or Y. In one embodiment, the any one or more segments of interest selected from chromosomes 1-22, X and Y comprise at least one, five, ten, 15, 20, 25, 50, 75, 100 or more segments selected from chromosomes 1-22, X and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100 or more different partial chromosomal aneuploidies is determined. The different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions and partial deletions.
The sample used to determine the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any of the biological samples described elsewhere in this application. The type or types of sample that can be used to determine aneuploidy in a patient will depend on the type of disease from which the patient is known or suspected to be suffering. For example, a stool sample can be chosen as the source of DNA to determine the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to the tissue samples described herein. Preferably, the sample is a biological sample obtained by non-invasive means, for example a plasma sample. As described elsewhere in this application, sequencing of the nucleic acids in the patient sample can be performed using next generation sequencing (NGS) as described elsewhere in this application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In some embodiments, the presence or absence of an aneuploidy is determined in a patient suspected of suffering from a cancer as described elsewhere in this application, for example cancer of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs, and blood cancers. Blood cancers include cancers of the bone marrow, blood and lymphatic system; the lymphatic system includes lymph nodes, lymph vessels, tonsils, thymus, spleen and digestive tract lymphoid tissue. Leukemia and myeloma, which start in the bone marrow, and lymphoma, which starts in the lymphatic system, are the most common types of blood cancer.
The determination of the presence or absence of one or more chromosomal aneuploidies in a patient sample can be made, without limitation, to determine the predisposition of the patient to a particular cancer, to determine the presence or absence of a cancer as part of routine screening in patients known or not known to be predisposed to the cancer in question, to provide a prognosis for the disease, to assess the need for adjuvant therapy, and to determine the progression or regression of the disease.
Genetic counseling
Fetal chromosomal abnormalities are a major cause of miscarriage, congenital anomalies and perinatal death (Wellesley et al., Europ. J. Human Genet., 20:521-526 [2012]; Nagaoka et al., Nature Rev. Genetics 13:493-504 [2012]). Since the introduction of amniocentesis, followed by the introduction of chorionic villus sampling (CVS), pregnant women have had options for obtaining information about fetal chromosomal status (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]). When adequate tissue is obtained, cytogenetic karyotyping of fetal cells or chorionic villi obtained from these procedures leads to a diagnosis in the great majority of cases with very high sensitivity and specificity (about 99%) (Hahnemann and Vejerslev, Prenat Diagn., 17:801-820 [1997]; NICHD National Registry for Amniocentesis Study, JAMA 236:1471-1476 [1976]). However, these procedures also pose risks to the fetus and the pregnant woman (Odibo et al., Obstet Gynecol 112:813-819 [2008]; Odibo et al., Obstet Gynecol 111:589-595 [2008]).
To mitigate these risks, a series of prenatal screening algorithms has been developed to stratify women by their likelihood of carrying a fetus with the most common fetal trisomies: T21 (Down syndrome) and trisomy 18 (T18, Edwards syndrome), and to a lesser extent trisomy 13 (T13, Patau syndrome). The screens typically involve measurement of multiple biochemical analytes in maternal serum at different time points, combined with ultrasonographic measurement of the fetal nuchal translucency (NT) and incorporation of other maternal factors (for example age) to generate a risk score. As the screens have been developed and refined over the years, and depending on when the screen is administered (first or second trimester only, sequential, or fully integrated) and how it is administered (serum only, or serum in combination with NT), a menu of options has evolved with varying detection rates (65% to 90%) and high screen positive rates (5%) (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]).
For patients, after going through this multi-step process, the resulting information or "risk score" can be confusing and anxiety provoking, particularly in the absence of comprehensive counseling. Ultimately, when the woman makes her decision, the results are weighed against the risk of miscarriage from an invasive procedure. Better non-invasive means of obtaining more definitive information about fetal chromosomal status would facilitate decision making in this context. Such improved non-invasive means of obtaining more definitive information about fetal chromosomal status are believed to be provided by the methods described herein.
In various embodiments, genetic counseling is contemplated as a component of the use of the assays described herein, particularly in a clinical setting. Conversely, the aneuploidy detection methods described herein can constitute one option offered in the context of prenatal care and associated genetic counseling.
Accordingly, in various embodiments, the methods described herein can be offered as a primary screen (for example, for women with an a priori pregnancy risk) or as a secondary screen for those women with a positive "conventional" screen. In certain embodiments, it is contemplated that the non-invasive prenatal testing (NIPT) methods described herein additionally include a genetic counseling component, and/or that genetic counseling and pregnancy "management" optionally or explicitly incorporate the NIPT methods described herein.
For example, in certain embodiments, the woman presents with one or more a priori pregnancy risks. Such risks include, but are not limited to, one or more of the following:
1) Maternal age over 35, although it is noted that about 80% of children born with Down syndrome are born to women under 35.
2) A previous fetus/child with an autosomal trisomy. The recurrence rate is believed to be about 1.6 times to about 8.2 times the maternal age risk, depending on the type of trisomy, whether the previous pregnancy was spontaneously aborted, the maternal age at the first occurrence, and the maternal age at subsequent prenatal diagnosis.
3) A previous fetus/child with a sex chromosome abnormality. Not all sex chromosome abnormalities are of maternal origin, and not all carry a recurrence risk. When they do, the recurrence rate is about 1.6 times to about 1.5 times the maternal age risk.
4) Parental carrier of a chromosome translocation.
5) Parental carrier of a chromosome inversion.
6) Parental aneuploidy or mosaicism.
7) Use of certain assisted reproductive technologies.
In such cases, subject to the various considerations described herein, the mother, for example in consultation with a physician, genetic counselor or the like, may be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (for example trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.). In this regard, it is noted that the methods described herein are believed to be effective even in the first trimester of pregnancy. Accordingly, in certain embodiments, use of the NIPT methods described herein is contemplated as early as 8 weeks, and in various embodiments at about 10 weeks or later.
In certain embodiments, the methods described herein can be offered as a secondary screen to those women with a positive "conventional" screen. For example, in certain embodiments, the pregnancy may present with a structural abnormality, such as fetal cystic hygroma or increased nuchal translucency, for example as detected using ultrasonography. Typically, ultrasound examination for structural defects is performed at 18 to 22 weeks and, particularly when an irregularity is observed, can be coupled with fetal echocardiography. It is contemplated that when an abnormality is observed (for example, a positive "conventional" screen), the mother, for example in consultation with a physician, genetic counselor or the like, may be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (for example trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.).
Accordingly, in various embodiments, genetic counseling is contemplated in which the (NIPT) assays described herein are offered as a component of prenatal care, pregnancy management and/or the development of a birth plan. Offering NIPT as a secondary screen to those women with a positive conventional screen (or other a priori risk) is expected to reduce the number of unnecessary amniocentesis and CVS procedures. However, because informed consent is an important component of NIPT, the need for genetic counseling increases.
Because a positive NIPT result (using the methods described herein) is more similar to a positive result from amniocentesis or CVS, women should be given the opportunity, during genetic counseling before the test, to decide whether they want this degree of information. Pre-test genetic counseling for NIPT should also include discussion of, and recommendations for, confirming abnormal test results via CVS, amniocentesis, cord blood sampling, etc. (depending on gestational age), so that the expected timing of results can be given due consideration for post-test planning. Consistent with the statements of the National Society of Genetic Counselors (NSGC, USA) on this subject (see, for example, Devers et al., Noninvasive Prenatal Testing/Noninvasive Prenatal Diagnosis: the position of the National Society of Genetic Counselors (by NSGC Public Policy Committee), NSGC Position Statements 2012; Benn et al., Prenat Diagn, 31:519-522 [2011]), NIPT does not currently screen for all chromosomal or genetic conditions, so it may not replace standard risk assessment and prenatal diagnosis. It is contemplated herein that patients with other factors suggestive of a chromosomal abnormality (for example, certain abnormal ultrasound findings) should receive genetic counseling in which they are offered the option of conventional confirmatory diagnostic testing, regardless of the NIPT result. Women should also be made aware during genetic counseling that NIPT results may be uninformative for some patients.
In contrast to amniocentesis, in which a detected aneuploidy typically represents the chromosomal make-up of the fetus, NIPT using the methods described above may be more similar to CVS in that it may, in some cases, reflect confined placental aneuploidy or confined placental mosaicism (CPM). CPM is present in about 1% to 2% of CVS results today, and some of these women undergo amniocentesis at a later gestational age, after CVS, to distinguish placental aneuploidy from fetal aneuploidy. As NIPT is implemented more widely, CPM cases are therefore expected to produce a certain number of positive NIPT results that may not subsequently be confirmed by an invasive procedure, particularly amniocentesis. Again, in various embodiments, it is contemplated that this information is presented to the patient in the context of genetic counseling (for example by a physician, genetic counselor, etc.).
It will be appreciated that, in various embodiments, genetic counseling may include recommending confirmatory modalities and advising on timing appropriate to the level of risk; the timing of the different confirmatory modalities can be used to provide input on the value of the information those confirmation methods provide, particularly in the context of the time available for pregnancy options. In various embodiments, genetic counseling can also establish a plan for monitoring the pregnancy (for example, follow-up ultrasound examinations, additional physician visits, etc.) and for setting up a series of decision points where appropriate. In addition, genetic counseling can advise on and assist in developing a birth plan, which can include, for example, decisions about the place of delivery (for example home, hospital, specialized facility, etc.), the personnel involved at the place of delivery, available third-party care for the infant, and the like.
Although above discussion concentrates on method described herein as an integral part (and perhaps being the second instrument) of antenatal diagnosis, if but along with clinical experience accumulation and the success of the result from the comparative studies to the routine screening, NIPT method so described herein may replace existing examination scheme and may be used as main tool.
Considered that also method described herein will be for the pregnancy discovery purposes of polycyesis.
Typically, it is expected that genetic counseling (e.g., as described above) will be provided by a physician (e.g., a primary care physician, obstetrician, etc.) and/or by a genetic counselor or other qualified medical professional. In certain embodiments the counseling is provided face to face; however, it will be appreciated that in some cases counseling can be provided remotely (e.g., by text, cell phone, cell phone app, tablet app, Internet, etc.).
It will also be appreciated that in certain embodiments genetic counseling, or a component of it, can be delivered by a computer system. For example, a "smart advice" system can be provided that supplies genetic counseling information (e.g., as described above) in response to test results, in response to instructions from a health care provider, and/or in response to queries (e.g., from a patient). In certain embodiments the information will be specific to clinical information provided by the physician, the health care system, and/or the patient. In certain embodiments the information can be provided iteratively. Thus, for example, the patient can pose "what if" queries and the system can return information such as diagnostic options, risk factors, timing, and the implications of different outcomes.
In certain embodiments the information can be provided in a transitory manner (e.g., presented on a computer screen). In certain embodiments the information can be provided in a non-transitory manner. Thus, for example, the information can be printed (e.g., as a list of options and/or recommendations, optionally with associated timing, etc.) and/or stored on computer-readable media (e.g., magnetic media such as a local hard drive or server; optical media; flash memory; etc.).
It will be appreciated that such systems are typically configured to provide adequate security to maintain patient privacy, e.g., in accordance with prevailing industry standards.
The foregoing discussion of genetic counseling is intended to be illustrative and not limiting. Genetic counseling is a well-established branch of medicine, and incorporating a counseling component for the assays described herein is within the skill of the practitioner. Moreover, it will be appreciated that the nature of genetic counseling and of the associated information and recommendations is likely to change as the field develops.
Determining fetal fraction
Methods for determining fetal fraction are disclosed in U.S. Patent Application Publication 2010-0010085 (117.201), U.S. Patent Application Publication 2011-0201507 (120.201), U.S. Patent Application No. 13/365,240 (filed February 2, 2012), and U.S. Patent Application No. 13/445,778 (filed April 12, 2012). A full account of techniques for determining fetal fraction can be found in these documents.
The methods described herein enable determination of the fetal fraction in a sample comprising a mixture of fetal and maternal nucleic acids or, more generally, a mixture of nucleic acids derived from two different genomes. For purposes of this discussion, maternal and fetal nucleic acids will be described, but it should be understood that any two genomes can be substituted. In some embodiments, the fetal fraction is determined concurrently with determining the presence or absence of a copy number variation (e.g., aneuploidy). As described more fully below, a single set of tags from a test sample can be used to determine both fetal fraction and copy number variation.
Methods for quantifying fetal fraction rely on differences between the fetal and maternal genomes. In some embodiments described herein, determining the fetal fraction of sample DNA relies on multiple DNA sequence reads at sequence sites known to harbor one or more polymorphisms. In some embodiments, polymorphic sites or target nucleic acid sequences are discovered when sequence tags are aligned to one another and/or to a reference sequence. In certain embodiments, the fetal fraction of sample DNA is determined by considering copy number information for a particular chromosome or chromosome segment, where a copy number difference exists between the maternal and fetal chromosomes. In such embodiments, the fetal fraction is determined by considering the relative amounts of maternal and fetal sample DNA for a chromosome or segment that has been determined, or is known, to have a copy number variation. In such embodiments, the fetal fraction can be calculated using the copy number variation between the maternal and fetal chromosomes. For this purpose, the methods and apparatus can calculate a normalized chromosome value (NCV), as described below, or a similar metric.
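As a rough illustration of how a copy number difference can yield a fetal fraction, the sketch below assumes a euploid mother and a fetus carrying a full trisomy or monosomy of the chromosome of interest. Under that assumption the relative amount of the affected chromosome shifts by a factor of (1 + ff/2) for a trisomy or (1 - ff/2) for a monosomy, so ff is approximately 2 x |ratio - 1|. The function name and the use of a plain dose ratio (rather than an NCV) are illustrative assumptions and not the calculation defined elsewhere in this document.

```python
def fetal_fraction_from_dose_ratio(sample_dose: float, euploid_dose: float) -> float:
    """Estimate fetal fraction from a chromosome dose ratio (hypothetical helper).

    sample_dose:  dose of the chromosome of interest in the test sample.
    euploid_dose: expected dose of the same chromosome in unaffected samples.
    Assumes a euploid mother and a fetus with one extra or one missing copy.
    """
    if euploid_dose <= 0:
        raise ValueError("euploid_dose must be positive")
    ratio = sample_dose / euploid_dose
    # One extra (or missing) fetal copy changes the mixture by ff/2.
    return 2.0 * abs(ratio - 1.0)
```

For example, a dose that is 5% above the euploid expectation corresponds to a fetal fraction of about 10% under these assumptions.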
Some methods are limited by fetal gender, for example, methods that quantify fetal fraction by relying on the presence of sequences specific to the Y chromosome, or by determining the chromosome dose of the X chromosome in male fetuses. In certain embodiments, fetal DNA is quantified using fetal targets that have no maternal counterpart, such as Y chromosome sequences (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; U.S. Patent Application Publication No. 2010/0112590, filed November 6, 2009, Lo et al.) or the RHD1 gene in an RhD-negative mother, or targets that differ from the maternal background by multiple DNA base pairs. Other methods are independent of fetal gender and rely on polymorphic differences between the fetal and maternal genomes.
Allelic imbalance at a polymorphism can be detected and quantified by various techniques. In some embodiments, digital PCR is used to determine allelic imbalance at a polymorphism, e.g., a SNP on mRNA. Alternatively, capillary gel electrophoresis is used to detect differences in the size of polymorphic regions, e.g., in the case of STRs.
In some embodiments, epigenetic differences, e.g., differential methylation of promoter regions, can be detected and used, alone or in combination with digital PCR, to determine differences between the fetal and maternal genomes and to quantify fetal fraction (Tong et al., Clin Chem 56:90-98 [2010]). Variations of epigenetic methods are also included, such as methylation-based DNA discrimination (Erich et al., AJOG 204:205.e1-205.e11 [2011]). In some embodiments, fetal fraction is estimated by sequencing a preselected panel of one or more polymorphic sequences, as described elsewhere in this application.
In addition to the methods, described elsewhere in this application, that sequence preselected panels of polymorphic sequences, methods for quantifying fetal DNA in maternal plasma include, but are not limited to, real-time qPCR, mass spectrometry, digital PCR (including microfluidic digital PCR), and capillary gel electrophoresis.
This section begins by considering fetal fraction as determined from one or more polymorphisms, or other information, from chromosomes or chromosome segments that do not have (or have been determined not to have) a copy number variation. Fetal fraction determined by such techniques is referred to herein as non-CNV fetal fraction or "NCNFF". Later parts of this section describe techniques for calculating fetal fraction from chromosomes or chromosome segments determined to have a copy number variation. Fetal fraction determined by such techniques is referred to herein as CNV fetal fraction or "CNFF".
In some embodiments, fetal fraction is assessed by determining the relative contribution of a polymorphic allele derived from the fetal genome and the contribution of the corresponding polymorphic allele derived from the maternal genome. In some embodiments, fetal fraction is assessed by determining the relative contribution of a polymorphic allele derived from the fetal genome compared with the total contribution of the corresponding polymorphic allele derived from both the fetal and maternal genomes.
A polymorphism can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA ("cfDNA") in a maternal sample. Informative polymorphisms (e.g., informative SNPs) yield information about the fetus, for example, the presence or absence of a disease, a genetic abnormality, or any other biological information such as the stage of gestation or gender. In this context, informative polymorphisms are those that identify sequence differences between mother and fetus, and they are used in the methods disclosed herein. In other words, an informative polymorphism is a polymorphism in a nucleic acid sample that has different sequences (i.e., different alleles), with the sequences present in different amounts. In some of the methods herein, the differing amounts of the sequences/alleles are used to determine fetal fraction, particularly NCNFF.
Polymorphic sites include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (IN-DELs, also called deletion insertion polymorphisms (DIPs)), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or any polymorphism having any other allelic sequence variation in a chromosome. In some embodiments, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (e.g., as a short haplotype) and are provided herein as sets of two SNPs.
In some embodiments, fetal fraction is determined by statistical and approximation techniques that use polymorphic sites to assess the relative contributions of zygosity from the fetal and maternal genomes. Fetal fraction can also be determined by electrophoretic methods, in which certain types of polymorphic sites are separated electrophoretically and used to identify the relative contribution of a polymorphic allele from the fetal genome and the relative contribution of the corresponding polymorphic allele from the maternal genome.
In one embodiment, shown in the process flow diagram of Fig. 6, fetal fraction is determined by method 600, which comprises first obtaining a test sample comprising a mixture of fetal and maternal nucleic acids in operation 610, enriching the nucleic acid mixture for polymorphic target nucleic acids in operation 620, sequencing the enriched nucleic acid mixture in operation 630, and simultaneously determining the fetal fraction and aneuploidy in the sample in operation 640.
Fig. 7 shows a process flow diagram for some embodiments. Fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) sequencing the mixture using massively parallel sequencing methods in operation 740, and (v) calculating the fetal fraction in operation 760. In another embodiment, fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) separating the nucleic acids by size using electrophoretic methods in operation 750, and (v) calculating the fetal fraction in operation 770.
In one embodiment, shown in the process flow diagram of Fig. 8, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 810, (ii) amplifying the sample in operation 820, (iii) enriching the sample in operation 830 by combining the amplified sample with an unamplified sample of the original mixture, (iv) purifying the sample in operation 840, and (v) sequencing the sample in operation 850 to determine the fetal fraction using various methods, with the fetal fraction and the presence or absence of aneuploidy determined simultaneously in operation 860.
In another embodiment, shown in the process flow diagram of Fig. 9, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 910, (ii) purifying the sample in operation 920, (iii) amplifying a portion of the sample in operation 930, (iv) enriching the sample in operation 940 by combining the amplified sample with the purified but unamplified portion of the original mixture, and (v) sequencing the sample in operation 950 to determine the fetal fraction using various methods, with the fetal fraction and the presence or absence of aneuploidy determined simultaneously in operation 960.
In another embodiment, shown in the process flow diagram of Fig. 10, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 1010, (ii) purifying the sample in operation 1020, (iii) amplifying a first portion of the sample in operation 1040, (iv) preparing a sequencing library of the amplified portion of the sample in operation 1050, (v) preparing a sequencing library of a second, purified but unamplified portion of the sample in operation 1030, (vi) enriching the mixture in operation 1060 by combining the two sequencing libraries, and (vii) sequencing the mixture in operation 1070, with the fetal fraction and the presence or absence of aneuploidy determined simultaneously, using various methods, in operation 1080.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) amplifying the sample using labeled primers, and (iv) sequencing the sample using electrophoretic methods, to determine the fetal fraction using various methods.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) optionally enriching the sample by amplifying a portion of the sample, and (iv) sequencing the sample, to determine the fetal fraction using various methods.
Purifying the initially obtained sample, the amplified sample, the amplified and enriched sample, or other nucleic acid samples relevant to the methods disclosed herein (e.g., in operations 720, 840, 920, and 1020) can be accomplished by any conventional technique. To separate cfDNA from cells, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods can be used. Optionally, the resulting sample can be fragmented before purification or amplification. If the sample used comprises cfDNA, fragmentation may not be required, because cfDNA is fragmented in nature, with fragment sizes often of about 150 bp to 200 bp.
In some of the procedures above, selective amplification and enrichment are used to increase the relative amount of nucleic acid from regions where polymorphisms are located. Similar results can be obtained by deep sequencing of selected regions of the genome, particularly regions where polymorphisms are located.
Amplification
After the sample is obtained and purified, a portion of the purified mixture of fetal and maternal nucleic acids (e.g., cfDNA) is used to amplify a plurality of polymorphic target nucleic acids, each comprising a polymorphic site. In some implementations, amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids is accomplished by any method that uses PCR (polymerase chain reaction) or variations of it, including but not limited to asymmetric PCR, helicase-dependent amplification, hot-start PCR, qPCR, solid-phase PCR, and touchdown PCR. In some embodiments, the sample can be partially amplified to aid in determining the fetal fraction. In some embodiments, no amplification is performed. The disclosed amplification methods and other amplification techniques can be used in operations 730, 820, 930, and 1040.
Amplification of SNPs
A number of nucleic acid primers are already available for amplifying DNA fragments containing SNPs, and their sequences can be obtained, for example, from databases known to those of ordinary skill in the art. Additional primers can also be designed, for example, using a method similar to that published by Vieux, E.F., Kwok, P-Y and Miller, R.D. in BioTechniques (June 2002), Vol. 32, Supplement: "SNPs: Discovery of Marker Disease", pp. 28-32.
Sequence-specific primers are selected to amplify the target nucleic acids. In one embodiment, target nucleic acids comprising a polymorphic site are amplified as amplicons. In another embodiment, target nucleic acids comprising two or more polymorphic sites (e.g., two tandem SNPs) are amplified as amplicons. Amplified target nucleic acid amplicons of at least about 100 bp comprise a single SNP or tandem SNPs. Primers for amplifying target sequences comprising tandem SNPs are designed to encompass both SNP sites.
Amplification of STRs
A number of nucleic acid primers are available for amplifying DNA fragments containing STRs, and such sequences can be obtained from databases known to those skilled in the art.
In some embodiments, a portion of the mixture of fetal and maternal nucleic acids is used as a template for amplifying target nucleic acids that have at least one STR. A comprehensive catalog of references, facts, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which can be accessed via the Internet at cstl.nist.gov/strbase. Sequence information from GenBank, at ncbi.nlm.nih.gov/genbank, for commonly used STR loci is also accessible through STRBase.
STR multiplex systems allow simultaneous amplification of multiple non-overlapping loci in a single reaction, substantially increasing throughput. Because STRs are highly polymorphic, most individuals are heterozygous. STRs can be used in electrophoretic analysis, as described further below.
Amplification can also be performed using miniSTRs to generate reduced-size amplicons, to discern STR alleles that are shorter in length. The method of the disclosed embodiments encompasses determining the fraction of fetal nucleic acid in a maternal sample that has been enriched for target nucleic acids each comprising a miniSTR. The method comprises quantifying at least one fetal and one maternal allele at a polymorphic miniSTR, which can be amplified to generate amplicons whose length is about the size of circulating fetal DNA fragments. Any one pair of miniSTR primers, or a combination of two or more pairs of miniSTR primers, can be used to amplify at least one miniSTR.
Enrichment
The sample that is enriched can also include: a plasma fraction of a blood sample; a sample of purified cfDNA extracted from plasma; a sequencing library sample prepared from a purified mixture of fetal and maternal nucleic acids; and others.
In certain embodiments, a sample comprising a mixture of DNA molecules is non-specifically enriched for the whole genome prior to whole genome sequencing, i.e., whole genome amplification is performed prior to sequencing. Non-specific enrichment of the nucleic acid mixture refers to whole genome amplification of the genomic DNA fragments of the DNA sample, which can be used to increase the level of sample DNA prior to identifying polymorphisms by sequencing. Non-specific enrichment can be the selective enrichment of one of the two genomes (fetal and maternal) present in the sample.
In other embodiments, the cfDNA in the sample is specifically enriched. Specific enrichment refers to enrichment of a genomic sample for specific sequences (e.g., polymorphic target sequences), accomplished by methods that comprise specifically amplifying target nucleic acid sequences comprising the polymorphic site.
In other embodiments, the mixture of nucleic acids present in the sample is enriched for polymorphic target nucleic acids each comprising a polymorphic site. Such enrichment can be used in operation 620. Enriching the mixture of fetal and maternal nucleic acids comprises amplifying target sequences from a portion of the nucleic acids contained in the original maternal sample, and combining some or all of the amplified product with the remainder of the original maternal sample, e.g., in operations 830 and 940.
In another embodiment, the sample that is enriched is a sequencing library sample prepared from a purified mixture of fetal and maternal nucleic acids. The amount of amplified product used to enrich the original sample is selected to obtain sufficient sequencing information for determining the fetal fraction. At least about 3%, at least about 5%, at least about 7%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, or more of the total number of sequence tags obtained from sequencing is mapped to determine the fetal fraction.
In one embodiment, shown in Fig. 10, enrichment comprises amplifying, in operation 1040, the target nucleic acids contained in a portion of an original sample of a purified mixture of fetal and maternal nucleic acids (e.g., cfDNA purified from a maternal plasma sample). Similarly, in operation 1050, a portion of the purified but unamplified cfDNA is used to prepare a primary sequencing library. In operation 1060, a portion of the target library is combined with the primary library generated from the unamplified mixture of nucleic acids, and the mixture of fetal and maternal nucleic acids contained in the two libraries is sequenced in operation 1070. The enriched library can comprise at least about 5%, at least about 10%, at least about 15%, at least about 20%, or at least about 25% of the target library. In operation 1080, the data from the sequencing run is analyzed and, as described for operation 640 of the embodiment depicted in Fig. 6, the fetal fraction and the presence or absence of aneuploidy are determined simultaneously.
Sequencing technologies
The enriched mixture of fetal and maternal nucleic acids is sequenced. The sequence information needed to determine fetal fraction can be obtained using any known DNA sequencing method, many of which are described elsewhere in this application. Such sequencing methods include next-generation sequencing (NGS), Sanger sequencing, Helicos True Single Molecule Sequencing (tSMS(TM)), 454 sequencing (Roche), SOLiD technology (Applied Biosystems), Single Molecule Real-Time (SMRT(TM)) sequencing technology (Pacific Biosciences), nanopore sequencing, chemical-sensitive field effect transistor (chemFET) arrays, Halcyon Molecular's method using transmission electron microscopy (TEM), Ion Torrent single-molecule sequencing, sequencing by hybridization, and the like. In certain embodiments, massively parallel sequencing is employed. In one embodiment, Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry is used. In certain embodiments, partial sequencing is used.
The sequenced DNA is mapped to a reference genome. The reference genome can be an artificial genome or a human reference genome. Such reference genomes include: an artificial target sequences genome comprising polymorphic target nucleic acid sequences; an artificial SNP reference genome; an artificial STR reference genome; an artificial tandem-STR reference genome; the human reference genome NCBI36/hg18 sequence, which is available on the Internet at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105; and the human reference genome NCBI36/hg18 sequence together with an artificial target sequences genome comprising the target polymorphic sequences, e.g., a SNP genome. Some mismatches are allowed in the mapping process.
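To make the idea of mapping with a small mismatch allowance concrete, the following is a minimal sketch that places a read on an artificial SNP reference by brute-force Hamming distance. It is only an illustration: the reference layout (one short sequence per allele), the names `snp1_A` and `snp1_G`, and the default mismatch limit are assumptions, and a production pipeline would use an indexed aligner rather than this exhaustive scan.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: dict, max_mismatches: int = 2):
    """Return (sequence_name, offset) of the best placement, or None.

    reference maps a hypothetical sequence name to its bases. The placement
    with the fewest mismatches wins; ties keep the first placement found.
    """
    best = None
    best_mm = max_mismatches + 1
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if mm < best_mm:
                best, best_mm = (name, offset), mm
    return best


# Two alleles of one hypothetical SNP, differing at position 4.
artificial_snp_reference = {
    "snp1_A": "ACGTACGTAC",
    "snp1_G": "ACGTGCGTAC",
}
```

Because the two allele sequences differ by a single base, a read covering the SNP matches one allele exactly and the other with one mismatch, so the fewest-mismatch rule assigns it to the correct allele.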
In one embodiment, the sequencing information obtained in operation 630 is analyzed to simultaneously determine the fetal fraction and the presence or absence of aneuploidy.
As explained above, a plurality of sequence tags is obtained per sample. In certain embodiments, at least about 3 x 10^6 sequence tags, at least about 5 x 10^6 sequence tags, at least about 8 x 10^6 sequence tags, at least about 10 x 10^6 sequence tags, at least about 15 x 10^6 sequence tags, at least about 20 x 10^6 sequence tags, at least about 30 x 10^6 sequence tags, at least about 40 x 10^6 sequence tags, or at least about 50 x 10^6 sequence tags comprising reads of between 20 bp and 40 bp are obtained per sample from mapping the reads to the reference genome. In one embodiment, all sequence reads are mapped to all regions of the reference genome. In one embodiment, tags comprising reads mapped to all regions (e.g., all chromosomes) of the human reference genome are counted, and the fetal aneuploidy, i.e., the over- or under-representation of a sequence of interest (e.g., a chromosome or portion thereof), is determined in the mixed DNA sample; tags comprising reads mapped to the artificial target sequences genome are counted to determine the fetal fraction. The method does not require differentiation between the maternal and fetal genomes.
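The two counting tasks described above can be sketched as a single pass over the mapped tags. This is a minimal illustration only: it assumes each tag has already been reduced to the name of the reference sequence it mapped to, and that artificial target sequences are distinguishable by a hypothetical name prefix (`snp_` here).

```python
from collections import Counter


def count_tags(mapped_refs, target_prefix: str = "snp_"):
    """Split mapped tags into chromosome counts and artificial-target counts.

    mapped_refs:   iterable of reference names, one per mapped tag.
    target_prefix: hypothetical naming convention for artificial targets.
    Returns (chromosome_counts, target_counts) as Counter objects.
    """
    chromosome_counts = Counter()
    target_counts = Counter()
    for ref in mapped_refs:
        if ref.startswith(target_prefix):
            target_counts[ref] += 1      # used for fetal fraction
        else:
            chromosome_counts[ref] += 1  # used for aneuploidy assessment
    return chromosome_counts, target_counts
```

The chromosome counts feed the copy number analysis, while the per-allele target counts feed the fetal fraction calculation, so one sequencing run supports both determinations.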
In one embodiment, the data from the sequencing run is analyzed to simultaneously determine the fetal fraction and the presence or absence of aneuploidy.
Sequencing library
In some embodiments, some or all of the amplified polymorphic sequences are used to prepare a sequencing library for sequencing in the parallel fashion described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry. The library can be prepared from purified cfDNA and comprise at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, or at least about 50% amplified product.
Sequencing a library generated by any of the methods depicted in Fig. 11 provides sequence tags derived from the amplified target nucleic acids and tags derived from the original unamplified maternal sample. Fetal fraction is calculated from the number of tags mapped to the artificial reference genome.
Calculating fetal fraction
As explained above, after the relevant DNA is sequenced, computational methods can be used to map or align the sequences to particular genes, chromosomes, alleles, or other structures. A number of computer algorithms exist for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In some embodiments, the sequences of the bins are found in nucleic acid databases known to those skilled in the art, including GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Databank of Japan). BLAST or similar tools can be used to search the identified sequences against the sequence databases, and the search hits can be used to sort the identified sequences into the appropriate bins. Alternatively, a Bloom filter or similar set membership tester can be employed to align reads to a reference genome. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety.
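For readers unfamiliar with set membership testers, the sketch below shows a generic Bloom filter loaded with reference k-mers and queried with the k-mers of a read. It is a textbook construction offered for illustration, not the scheme of the application cited above; the class and function names, the bit-array size, the number of hash functions, and the choice of SHA-256 are all assumptions. A Bloom filter never reports a stored item as absent, but it can report false positives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative parameters)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting one hash function.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def fraction_of_kmers_found(read: str, bloom: BloomFilter, k: int) -> float:
    """Share of the read's k-mers that the filter reports as present."""
    ks = list(kmers(read, k))
    return sum(km in bloom for km in ks) / len(ks) if ks else 0.0
```

A read whose k-mers are all reported present is a candidate for the reference region that populated the filter; candidates would still need confirmation because of the false-positive rate.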
As mentioned, according to some embodiments (particularly NCNFF techniques), determination of fetal fraction is based on the total number of tags that map to a first allele and the total number of tags that map to a second allele at an informative polymorphic site (e.g., a SNP) contained in a reference genome. Informative polymorphic sites are identified by the difference in the allelic sequences and the amount of each of the possible alleles. Fetal cfDNA is often present at a concentration that is <10% of the maternal cfDNA. Thus, the fetal contribution to the mixture of fetal and maternal nucleic acids appears as a minor contribution of an allele relative to the major contribution of the maternal allele. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles that are represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs derived from a maternal plasma sample are shown in Fig. 12.
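The distinction between informative sites and sites where both alleles are maternal can be sketched as a simple threshold test on the minor allele's share of the tags. The cutoff values and the category labels below are hypothetical and chosen only for illustration; a real analysis would set them from sequencing error rates and the expected range of fetal fraction.

```python
def classify_site(count_a: int, count_b: int,
                  min_minor_fraction: float = 0.01,
                  max_minor_fraction: float = 0.25) -> str:
    """Classify a biallelic site from its two allele tag counts.

    Thresholds are illustrative assumptions:
      below min_minor_fraction -> one allele only (or sequencing noise)
      up to max_minor_fraction -> informative (minor allele likely fetal)
      above max_minor_fraction -> similar levels (both alleles maternal)
    """
    total = count_a + count_b
    if total == 0:
        return "no_coverage"
    minor_fraction = min(count_a, count_b) / total
    if minor_fraction < min_minor_fraction:
        return "homozygous_or_noise"
    if minor_fraction <= max_minor_fraction:
        return "informative"
    return "maternal_heterozygous"
```

Only sites classified as informative would contribute to an allele-based fetal fraction estimate.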
As used herein, the terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy. The terms "partial aneuploidy" and "partial chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (e.g., partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
Estimating fetal fraction using allele ratios
For in two allelotrope at predetermined polymorphic site place each, the relative abundance of fetus cfDNA in maternal sample can be determined, as the parameter that is mapped to reference to the sum of the unique sequences label of the target nucleic acid sequence on the genome.In one embodiment, for the mark of fetal nucleic acid in the following calculating fetus of each informedness allelotrope (allelotrope x) and the parent nucleic acid mixture:
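$$\%\ \text{fetal fraction}_{\text{allele } x} = \frac{\sum \text{fetal sequence tags for allele } x}{\sum \text{maternal sequence tags for allele } x} \times 100$$
Equation 1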
and the fetal fraction for the sample is calculated as the average of the fetal fractions of all informative alleles. Optionally, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:
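$$\%\ \text{fetal fraction}_{\text{allele } x} = \frac{2 \times \sum \text{fetal sequence tags for allele } x}{\sum \text{maternal sequence tags for allele } x} \times 100$$
Equation 2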
to compensate for the presence of two fetal alleles, one of which is masked by the maternal background.
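The per-allele calculation and the averaging over informative alleles can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that Equation 1 is the percent ratio of fetal to maternal sequence tags for an allele and that Equation 2 applies a factor of two to the fetal tags. The function names and the tag counts are hypothetical.

```python
from statistics import mean


def fetal_fraction_per_allele(fetal_tags, maternal_tags, compensate=False):
    """Percent fetal fraction for one informative allele.

    compensate=False follows the assumed form of Equation 1;
    compensate=True applies the assumed factor of 2 of Equation 2.
    """
    if maternal_tags <= 0:
        raise ValueError("maternal tag count must be positive")
    factor = 2 if compensate else 1
    return factor * fetal_tags / maternal_tags * 100.0


def sample_fetal_fraction(allele_counts, compensate=False):
    """Average the per-allele fetal fractions over all informative alleles.

    allele_counts is a list of (fetal_tags, maternal_tags) pairs.
    """
    return mean(
        fetal_fraction_per_allele(f, m, compensate) for f, m in allele_counts
    )


# Hypothetical tag counts for three informative alleles.
counts = [(50, 1000), (30, 600), (45, 900)]
print(sample_fetal_fraction(counts))
print(sample_fetal_fraction(counts, compensate=True))
```

Averaging per-allele percentages, rather than pooling all tags before dividing, gives each informative allele equal weight regardless of its sequencing depth.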
Determining fetal fraction by sequencing predetermined polymorphic sequences
Further details on determining fetal fraction by sequencing predetermined polymorphic sequences are provided below.
Referring to Fig. 7, operations 720, 730, 740 and 760 illustrate a process for determining the fraction of fetal nucleic acids in a maternal biological sample by massively parallel sequencing of PCR-amplified polymorphic target nucleic acids. In step 720, a maternal sample comprising a mixture of fetal and maternal nucleic acids is obtained from a subject. The sample is a maternal sample obtained from a pregnant female, for example a pregnant woman. Other maternal samples can come from mammals such as cows, horses, dogs or cats. If the subject is human, the sample can be obtained in the first or second trimester of pregnancy. Any maternal biological sample can be used as a source of cellular or cell-free fetal and maternal nucleic acids. In some embodiments, it is advantageous to obtain a maternal sample that comprises cell-free nucleic acids (cfDNA). Preferably, the maternal biological sample is a biological fluid sample. Preferably, the maternal sample is a sample from a pregnant woman selected from blood, plasma, serum, urine and saliva. In some embodiments, the maternal sample is a plasma sample.
In step 720, the mixture of fetal and maternal nucleic acids is further processed from a sample fraction, such as plasma, to obtain a sample comprising a purified mixture of fetal and maternal nucleic acids (for example cfDNA). Methods for processing maternal samples are described elsewhere herein.
In step 730, a portion of the purified mixture of fetal and maternal cfDNA is used to amplify a plurality of polymorphic target nucleic acids, each comprising a polymorphic site. In some embodiments, the target nucleic acids each comprise a SNP. In other embodiments, the target nucleic acids each comprise a pair of tandem SNPs. In yet other embodiments, each target nucleic acid comprises an STR. Polymorphic sites contained in the target nucleic acids include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (called IN-DELS, also known as deletion insertion polymorphisms or DIPs), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or polymorphisms comprising any other change of sequence in a chromosome. In some embodiments, the polymorphic sites encompassed by the method are located on autosomes, so that the fetal fraction can be determined independently of the sex of the fetus. Polymorphisms associated with chromosomes other than chromosomes 13, 18, 21 and Y can also be used in the methods described herein.
Polymorphisms can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA in a maternal sample. For example, the more of a particular genetic sequence (for example a SNP) there is, the more readily a method can translate its presence into a particular color intensity, color density, or some other property that is detectable and measurable and that indicates the presence, absence and quantity of a particular DNA fragment and/or a particular polymorphism (for example a fetal SNP). With respect to the present invention, the methods are not performed using all possible SNPs in a genome, but use preselected polymorphisms that are likely to identify sequence differences between the mother and the fetus (i.e., informative polymorphisms). Informative polymorphic sites are identified by the difference in allele sequences and by the amount of each of the possible alleles. Any polymorphic site that can be encompassed by the reads generated by the sequencing methods described herein can be used to determine the fetal fraction.
A portion of the mixture of fetal and maternal nucleic acids (for example cfDNA) in the sample is used as a template for amplifying target nucleic acids that comprise at least one SNP. In some embodiments, each target nucleic acid comprises a single (i.e., one) SNP. Target nucleic acid sequences comprising SNPs are available from publicly accessible databases, including but not limited to the Human SNP Database at web address wi.mit.edu, the NCBI dbSNP home page at web address ncbi.nlm.nih.gov, web address lifesciences.perkinelmer.com, Applied Biosystems by Life Technologies™ (Carlsbad, CA) at web address appliedbiosystems.com, the Celera Human SNP database at web address celera.com, and the SNP database of the Genome Analysis Group (GAN) at web address gan.iarc.fr. In one embodiment, the SNPs chosen for enriching the fetal and maternal cfDNA are selected from the group of 92 individual identification SNPs (IISNPs) described by Pakstis et al. (Pakstis et al., Hum Genet 127:315-324 [2010]), which have been shown to have very little variation in frequency across populations (Fst < 0.06) and to be highly informative around the world, with an average heterozygosity ≥ 0.4. SNPs encompassed by the method of the invention include linked and unlinked SNPs. Other useful SNPs that can be used in or adapted to the methods described herein are disclosed in U.S. Patent Application Nos. 20080070792, 20090280492, 20080113358, 20080026390, 20080050739, 20080220422 and 20080138809, which are incorporated herein by reference in their entirety. Each target nucleic acid comprises at least one polymorphic site, for example a single SNP, that differs from the polymorphic site present on another target nucleic acid, so as to generate a panel of polymorphic sites, for example SNPs, that contains a sufficient number of polymorphic sites of which at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 40 or more are informative. For example, a panel of SNPs can be configured to comprise at least one informative SNP. In one embodiment, the SNPs targeted for amplification are selected from rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005 and rs530022. In one embodiment, the panel of SNPs comprises at least 3, at least 5, at least 10, at least 13, at least 15, at least 20, at least 25, at least 30 or more SNPs. In one embodiment, the panel of SNPs comprises rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261 and rs2567608. Polymorphic nucleic acids comprising SNPs can be amplified using the exemplary primer pairs provided in Example 24 and disclosed as SEQ ID NOs: 63-118.
In other embodiments, each target nucleic acid comprises two or more SNPs, i.e., each target nucleic acid comprises tandem SNPs. Preferably, each target nucleic acid comprises two tandem SNPs. The tandem SNPs are analyzed as a single unit (for example, as short haplotypes) and are provided herein as sets of two SNPs. To identify suitable tandem SNP sequences, the International HapMap Consortium database can be searched (The International HapMap Project, Nature 426:789-796 [2003]). The database is available on the World Wide Web at hapmap.org. In one embodiment, the tandem SNPs targeted for amplification are selected from the following sets of tandem SNP pairs: rs7277033-rs2110153; rs2822654-rs1882882; rs368657-rs376635; rs2822731-rs2822732; rs1475881-rs7275487; rs1735976-rs2827016; rs447340-rs2824097; rs418989-rs13047336; rs987980-rs987981; rs4143392-rs4143391; rs1691324-rs13050434; rs11909758-rs9980111; rs2826842-rs232414; rs1980969-rs1980970; rs9978999-rs9979175; rs1034346-rs12481852; rs7509629-rs2828358; rs4817013-rs7277036; rs9981121-rs2829696; rs455921-rs2898102; rs2898102-rs458848; rs961301-rs2830208; rs2174536-rs458076; rs11088023-rs11088024; rs1011734-rs1011733; rs2831244-rs9789838; rs8132769-rs2831440; rs8134080-rs2831524; rs4817219-rs4817220; rs2250911-rs2250997; rs2831899-rs2831900; rs2831902-rs2831903; rs11088086-rs2251447; rs2832040-rs11088088; rs2832141-rs2246777; rs2832959-rs9980934; rs2833734-rs2833735; rs933121-rs933122; rs2834140-rs12626953; rs2834485-rs3453; rs9974986-rs2834703; rs2776266-rs2835001; rs1984014-rs1984015; rs7281674-rs2835316; rs13047304-rs13047322; rs2835545-rs4816551; rs2835735-rs2835736; rs13047608-rs2835826; rs2836550-rs2212596; rs2836660-rs2836661; rs465612-rs8131220; rs9980072-rs8130031; rs418359-rs2836926; rs7278447-rs7278858; rs385787-rs367001; rs367001-rs386095; rs2837296-rs2837297; and rs2837381-rs4816672.
In one embodiment, a portion of the mixture of fetal and maternal nucleic acids (for example cfDNA) in the sample is used as a template for amplifying target nucleic acids that comprise at least one STR. In some embodiments, each target nucleic acid comprises a single (i.e., one) SNP. STR loci are found on almost every chromosome in the genome and can be amplified using a variety of polymerase chain reaction (PCR) primers. Tetranucleotide repeats are preferred in the forensic science community because of their fidelity in PCR amplification, although some trinucleotide and pentanucleotide repeats are also in use. A comprehensive listing of references, facts and sequence information on STRs, published PCR primers, common multiplex systems and related population data is compiled in STRBase, which can be accessed via the World Wide Web at ibm4.carb.nist.gov:8800/dna/home.htm. Sequence information from GenBank (http://www2.ncbi.nlm.nih.gov/cgi-bin/genbank) for commonly used STR loci is also accessible through STRBase. Commercial kits available for the analysis of STR loci generally provide all the necessary reaction components and the controls required for amplification. STR multiplex systems allow the simultaneous amplification of multiple non-overlapping loci in a single reaction, which substantially increases throughput. With multicolor fluorescent detection, even overlapping loci can be multiplexed. The polymorphic nature of the tandem repeated DNA sequences that are widespread throughout the human genome has made them important genetic markers for gene mapping studies, linkage analysis and human identity testing.

Because of the high polymorphism of STRs, most individuals will be heterozygous, i.e., most people have two alleles (versions), one inherited from each parent, each with a different number of repeats. The PCR products comprising the STRs can be separated and detected using manual, semi-automated or automated methods. Semi-automated systems are gel-based and combine electrophoresis, detection and analysis into one unit. On a semi-automated system, gel assembly and sample loading remain manual processes; however, once the samples are loaded onto the gel, electrophoresis, detection and analysis proceed automatically. Data collection occurs in "real time" as the fluorescently labeled fragments migrate past the detector at a fixed point, and the fragments can be viewed as they are collected. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. The non-maternally inherited fetal STR sequence will therefore differ in the number of repeats from the maternal sequence. Amplification of these STR sequences can yield one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. This technique was first reported in 2000 (Pertl et al., Human Genetics 106:45-49 [2002]) and was subsequently developed using simultaneous identification of multiple different STR regions by real-time PCR (Liu et al., Acta Obstet Gyn Scand 86:535-541 [2007]). PCR amplicons of various sizes have been used to discern the respective size distributions of circulating fetal and maternal DNA species, and have shown that fetal DNA molecules in maternal plasma are generally shorter than maternal DNA molecules (Chan et al., Clin Chem 50:8892 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, while maternal DNA has been estimated to be between about 0.5 Kb and 1 Kb (Li et al., Clin Chem 50:1002-1011 [2004]).

The invention provides a method for determining the fraction of fetal nucleic acids in a maternal sample, the method comprising determining the copy number of at least one fetal and one maternal allele at a polymorphic miniSTR site, where the miniSTR can be amplified to generate amplicons whose length is about the size of circulating fetal DNA fragments (for example less than about 250 base pairs). In one embodiment, the fetal fraction can be determined by a method that comprises sequencing at least a portion of amplified polymorphic target nucleic acids, each comprising a miniSTR. Fetal and maternal alleles at an informative STR site are distinguished by their different lengths, i.e., number of repeats, and the fetal fraction can be calculated as the percent ratio of the amount of fetal to maternal alleles at that site. The method can use one informative miniSTR or any number of informative miniSTRs in combination to determine the fraction of fetal nucleic acid. In one embodiment, the method comprises determining the copy number of at least one fetal and at least one maternal allele at at least one polymorphic miniSTR that is amplified to generate amplicons of less than about 300bp, less than about 250bp, less than about 200bp, less than about 150bp, less than about 100bp or less than about 50bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 300bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 250bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 200bp. Amplification of the informative alleles includes using miniSTR primers, which allow the amplification of reduced-size amplicons to discern STR alleles that are less than about 500bp, less than about 450bp, less than about 400bp, less than about 350bp, less than about 300 base pairs (bp), less than about 250bp, less than about 200bp, less than about 150bp, less than about 100bp or less than about 50bp. The reduced-size amplicons generated using the miniSTR primers are known as miniSTRs, which are identified by the marker name corresponding to the locus to which they map. In one embodiment, the miniSTR primers include miniSTR primers that have permitted the maximum reduction in amplicon size for all 13 CODIS STR loci, in addition to D2S1338, Penta D and Penta E, found in commercially available STR kits (Butler et al., J Forensic Sci 48:1054-1064 [2003]), miniSTR loci that are unlinked to the CODIS markers as described by Coble and Butler (Coble and Butler, J Forensic Sci 50:43-53 [2005]), and other miniSTRs that have been characterized at NIST. Information on the miniSTRs characterized at NIST is accessible via the World Wide Web at cstl.nist.gov/biotech/strbase/newSTRs.htm. Any one pair or a combination of two or more pairs of miniSTR primers can be used to amplify at least one miniSTR.
Amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids (for example cfDNA) is accomplished by any method that uses PCR or variations of the method as described elsewhere in this application. The target sequences are amplified using primer pairs, each capable of amplifying a target nucleic acid sequence comprising a polymorphic site (for example a SNP), in a multiplex PCR reaction. Multiplex PCR reactions include combining at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more sets of primers in the same reaction, in order to quantify amplified target nucleic acids comprising at least two, at least three, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more polymorphic sites in the same sequencing reaction. Any panel of primer sets can be configured to amplify at least one informative polymorphic sequence.
Primers are designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site is included within the length of the read generated by the sequencer. As provided in the Examples, at least one of the two primers in the primer set used to identify any polymorphic site hybridizes sufficiently close to the polymorphic site that the site is encompassed within the 36 bp read generated by massively parallel sequencing on the Illumina Genome Analyzer GII, and generates amplicons of sufficient length to undergo bridge amplification during cluster formation. Thus, primers are designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, yield DNA molecules of at least 200 bp. The SNPs provided in Table 33 were used for the simultaneous amplification of 13 target sequences in a multiplex assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs provided in Table 34. The SNP alleles are shown in bold and are underlined. Other SNPs that can be used to determine the fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429 and rs1490413. These SNPs have been analyzed for determining the fetal fraction by TaqMan PCR, and are disclosed in U.S. Patent Application Publication 2010-0010085.
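The design constraints stated above (SNP inside the 36 bp read, amplicon of at least 110 bp, adaptor-bearing molecule of at least 200 bp) can be checked with a short sketch such as the following. The class and function names are hypothetical, and the combined adaptor length of 90 bp is an assumption inferred from the difference between the 200 bp and 110 bp figures, not a value stated in the text.

```python
from dataclasses import dataclass


@dataclass
class AmpliconDesign:
    name: str          # hypothetical label for the target
    amplicon_len: int  # amplicon length in bp
    snp_offset: int    # 0-based position of the SNP from the sequenced end


def design_problems(design, read_len=36, min_amplicon=110,
                    adaptor_len=90, min_library_len=200):
    """Return a list of the constraints that a design violates.

    adaptor_len is an assumed combined adaptor length, not a stated value.
    """
    problems = []
    if design.snp_offset >= read_len:
        problems.append("SNP falls outside the sequence read")
    if design.amplicon_len < min_amplicon:
        problems.append("amplicon shorter than the minimum length")
    if design.amplicon_len + adaptor_len < min_library_len:
        problems.append("adaptor-bearing molecule too short for cluster formation")
    return problems


print(design_problems(AmpliconDesign("target_1", 111, 30)))
print(design_problems(AmpliconDesign("target_2", 100, 40)))
```

Returning a list of violated constraints, rather than a single pass/fail flag, makes it easier to see which primer needs to be moved.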
The forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for the site to be included in the sequence read generated by the massively parallel sequencing of the amplified preselected polymorphic nucleic acids. The length of the sequence read depends on the particular sequencing technology. Massively parallel sequencing methods provide sequence reads that vary in size from tens to hundreds of base pairs. At least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp or about 500bp. In some embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25bp, about 40bp, about 50bp or about 100bp.
Circulating cell-free DNA is approximately <300 bp in length. Therefore, the primer sets are designed to hybridize to and amplify polymorphic sequences of up to about 300 bp in length on average, where fetal DNA is about 170 bp in length on average. In some embodiments, the primer sets hybridize to DNA to generate amplicons of up to about 300 bp. In other embodiments, the primer sets hybridize to the DNA sequences to generate amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome or on different chromosomes. For example, one or more primer sets can hybridize to sequences present on the same chromosome. Alternatively, two or more primer sets hybridize to sequences present on different chromosomes. In one embodiment, the primer pairs amplify polymorphic sequences present on one or more of chromosomes 1 to 22. In some embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X or Y.
In step 740 (Fig. 7), a portion or all of the amplified polymorphic sequences is used to prepare a sequencing library for sequencing in the parallel fashion described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry.
In step 740, the sequence information needed for determining the fetal fraction is obtained using any known DNA sequencing method. Preferably, the methods described herein employ next generation sequencing (NGS) technology to provide countable sequence tags as described elsewhere in this application. The sequencing can be massively parallel sequencing-by-synthesis. Preferably, the massively parallel sequencing-by-synthesis uses reversible dye terminators. Alternatively, the massively parallel sequencing can be sequencing-by-ligation or single molecule sequencing.
The amplified target polymorphic nucleic acids are partially sequenced, and the sequence tags comprising reads of predetermined length (for example 36 bp) that map to a known reference genome are counted. Only sequence reads that align uniquely to the reference genome are counted as sequence tags. In one embodiment, the reference genome is an artificial target sequences genome comprising the sequences of the polymorphic target nucleic acids (for example SNPs). In one embodiment, the reference genome is an artificial SNP reference genome. In another embodiment, the reference genome is an artificial STR reference genome. In another embodiment, the reference genome is an artificial tandem-STR reference genome. Artificial reference genomes can be compiled using the sequences of the target polymorphic nucleic acids. Artificial reference genomes can comprise polymorphic target sequences that each comprise one or more different types of polymorphic sequences. For example, an artificial reference genome can comprise polymorphic sequences comprising SNP alleles and/or STRs. In one embodiment, the reference genome is the human reference genome NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory) and the DDBJ (the DNA Databank of Japan). In another embodiment, the reference genome comprises the human reference genome NCBI36/hg18 sequence and an artificial target sequences genome that includes the target polymorphic sequences, for example a SNP genome. Mapping of the sequence tags is achieved by comparing the sequence of the mapped tag with the sequence of the reference genome to determine the chromosomal origin of the sequenced nucleic acid (for example cfDNA) molecule; specific genetic sequence information is not needed. A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]) and ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. In embodiments of the method that comprise determining the presence or absence of an aneuploidy and the fetal fraction using NGS sequencing methods, analysis of the sequencing information for determining aneuploidy may allow for a small degree of mismatch (0 to 2 mismatches per sequence tag) to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample. Analysis of the sequencing information for determining the fetal fraction may allow for a small degree of mismatch, depending on the polymorphic sequence. For example, a small degree of mismatch may be allowed if the polymorphic sequence is an STR. Where the polymorphic sequence is a SNP, all sequences that match exactly to either of the two alleles at the SNP site are counted first and filtered out from the remaining reads, for which a small degree of mismatch may be allowed. Quantification of the number of sequence reads aligning to each chromosome for determining chromosomal aneuploidies can be performed as described herein, or by alternative analyses that normalize the median number of sequence tags for a chromosome of interest to the median number of tags for each of the other autosomes (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]), or that compare the number of unique reads aligning to each chromosome with the total number of reads aligning to all chromosomes to derive a percent genomic representation for each chromosome. A "z score" is generated to represent the difference between the percent genomic representation of the chromosome of interest and the mean percent representation of the same chromosome in a euploid control group, divided by the standard deviation (Chiu et al., Clin Chem 56:459-463 [2010]). In another embodiment, the sequencing information can be determined as described in U.S. Provisional Patent Application 32047-768.101, entitled "Normalizing Biological Assays", filed January 19, 2010, which is incorporated herein by reference in its entirety.
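The percent genomic representation and the z score described above can be sketched as follows. This is an illustrative sketch only: the function names and tag counts are hypothetical, and the use of the sample standard deviation of the control group is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def percent_representation(tags_per_chromosome, chromosome):
    """Percent of all uniquely aligned tags that map to one chromosome."""
    total = sum(tags_per_chromosome.values())
    if total == 0:
        raise ValueError("no aligned tags")
    return 100.0 * tags_per_chromosome[chromosome] / total


def z_score(test_percent, control_percents):
    """(test value - control mean) / control standard deviation.

    stdev() is the sample standard deviation; this choice is an assumption.
    """
    return (test_percent - mean(control_percents)) / stdev(control_percents)


# Hypothetical percent representations of one chromosome in euploid controls.
controls = [1.0, 1.2, 1.4]
test_sample = {"chr21": 18, "other": 982}  # hypothetical tag counts
pct = percent_representation(test_sample, "chr21")
print(pct, z_score(pct, controls))
```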
The present method of determining the fetal fraction by sequencing nucleic acids can be used in combination with other methods.
In step 760, the fetal fraction is determined based on the total number of tags mapped to a first allele and the total number of tags mapped to a second allele at an informative polymorphic site (for example a SNP) contained in the reference genome. For example, the reference genome is an artificial target sequences genome that encompasses the polymorphic sequences comprising SNPs rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005 and rs530022. In one embodiment, the artificial reference genome comprises the polymorphic target sequences of SEQ ID NOs: 7 to 62 (see Example 24).
In another embodiment, the artificial genome is an artificial target sequences genome that encompasses polymorphic sequences comprising tandem SNPs. In another embodiment, the artificial target genome encompasses polymorphic sequences comprising STRs. The composition of the artificial target sequences genome will vary depending on the polymorphic sequences that are used for determining the fetal fraction. Accordingly, an artificial target sequences genome is not limited to the SNP, tandem SNP or STR sequences exemplified herein.
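Counting tags against an artificial target sequences genome can be illustrated with the sketch below. It is a deliberate simplification and not the alignment procedure of any named aligner: it assumes each read starts at the first base of a target sequence, counts only exact matches, and discards index keys shared by more than one allele so that only uniquely mapping reads are counted. The identifiers and sequences are invented for illustration, and a short read length is used to keep the example readable.

```python
from collections import Counter


def build_index(targets, read_len):
    """Map the first read_len bases of each allele sequence to (site, allele).

    Keys shared by more than one allele are dropped, so that only reads
    that map uniquely can be counted.
    """
    index = {}
    ambiguous = set()
    for site_id, alleles in targets.items():
        for allele, sequence in alleles.items():
            key = sequence[:read_len]
            if key in index and index[key] != (site_id, allele):
                ambiguous.add(key)
            index[key] = (site_id, allele)
    for key in ambiguous:
        del index[key]
    return index


def count_tags(reads, index):
    """Count reads that match an indexed allele sequence exactly."""
    counts = Counter()
    for read in reads:
        hit = index.get(read)
        if hit is not None:
            counts[hit] += 1
    return counts


# Invented two-allele target; the polymorphic base is the 8th position.
targets = {"site_A": {"A": "ACGTACGATT", "G": "ACGTACGGTT"}}
index = build_index(targets, read_len=8)
reads = ["ACGTACGA"] * 3 + ["ACGTACGG"] + ["TTTTTTTT"]
print(count_tags(reads, index))
```

Reads that do not match exactly are simply ignored here; allowing a small number of mismatches for the remaining reads, as the text describes, would need a real aligner.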
Informative polymorphic sites (for example SNPs) are identified by the difference in allele sequences and by the amount of each of the possible alleles. Fetal cfDNA is present at a concentration of less than 10% of the maternal cfDNA. Thus, relative to the major contribution of the maternal alleles, there is a minor contribution to the mixture of fetal and maternal nucleic acids that can be assigned to fetal alleles. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs and derived from a maternal plasma sample are shown in Figure 12. Informative SNPs are discerned by the single nucleotide change at a polymorphic site, and fetal alleles are discerned by their relatively minor contribution to the mixture of fetal and maternal nucleic acids in the sample compared with the major contribution of the maternal nucleic acids. Accordingly, for each of the two alleles at a predetermined polymorphic site, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on the reference genome. In one embodiment, the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as described elsewhere in this application.
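The distinction between informative and non-informative sites described above can be sketched as a simple classification on the two allele tag counts. The thresholds below (a noise floor of 1% and an upper bound of 25% for the minor-to-major ratio) and the category labels are hypothetical values chosen for illustration; the text specifies only that fetal alleles make a minor contribution and that alleles at similar levels are maternal.

```python
def classify_site(count_a, count_b, noise_floor=0.01, max_minor_ratio=0.25):
    """Classify a site from the tag counts of its two alleles.

    Returns (label, percent_fetal_fraction or None). Both thresholds are
    assumptions for illustration, not values taken from the text.
    """
    major, minor = max(count_a, count_b), min(count_a, count_b)
    if major == 0:
        return "no_coverage", None
    ratio = minor / major
    if ratio < noise_floor:
        # Only one allele is effectively present: no fetal-specific signal.
        return "non_informative", None
    if ratio > max_minor_ratio:
        # Both alleles at similar levels: treated as maternal alleles.
        return "maternal_heterozygous", None
    # Minor allele assigned to the fetus; percent ratio of minor to major tags.
    return "informative", ratio * 100.0


for counts in [(1000, 50), (500, 480), (1000, 2)]:  # hypothetical tag counts
    print(counts, classify_site(*counts))
```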
Estimating fetal fraction using STR sequences and capillary electrophoresis
Because the number of repeats differs, individuals have STRs of different lengths. Because STRs are highly polymorphic, most individuals will be heterozygous; that is, most people have two alleles (versions), one inherited from each parent, each with a different number of repeats. A non-maternally inherited fetal STR sequence will differ from the maternal sequence in the number of repeats. Amplifying these STR sequences can produce one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. When sequenced, the collected samples can be correlated with the corresponding alleles and counted to determine the relative fraction using Equation 3.
PCR is performed on the purified sample using fluorescently labeled primers. The PCR products comprising the STRs can be separated and detected using manual, semi-automated, or automated electrophoresis methods. Semi-automated systems are gel-based and combine electrophoresis, detection, and analysis into one unit. On a semi-automated system, gel assembly and sample loading remain manual procedures; however, once the samples are loaded onto the gel, electrophoresis, detection, and analysis proceed automatically. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer, and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. Data collection occurs in "real time" as the fluorescently labeled fragments migrate past a detector at a fixed point, and the fragments can be viewed as they are collected. The sequences obtained by capillary electrophoresis can be detected by a program that measures the wavelengths of the fluorescent labels. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the preset bin parameters for the STRs that are analyzed.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component(s), and is expressed as a percentage for each informative locus:
$$\text{fetal fraction}_{\text{allele } x}\,(\%) = \frac{\text{peak height of minor allele}}{\sum \text{peak heights of major allele(s)}} \times 100$$
Equation 3
The fetal fraction for a sample comprising two or more informative STRs can be calculated as the average of the fetal fractions calculated for the two or more informative markers.
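The per-marker calculation and the averaging step described above can be sketched as follows. This is a minimal illustration rather than the instrument software's method; the function names and the peak heights are hypothetical values chosen for the example.

```python
from statistics import mean


def str_marker_fraction(minor_peak, major_peaks):
    """Percent minor-allele fraction for one informative STR marker.

    Follows the description of Equation 3: minor peak height divided by
    the sum of the major peak height(s), expressed as a percentage.
    """
    return 100.0 * minor_peak / sum(major_peaks)


def str_fetal_fraction(markers):
    """Average the per-marker fractions over all informative markers."""
    return mean(str_marker_fraction(minor, majors) for minor, majors in markers)


# Hypothetical peak heights (arbitrary fluorescence units).
# Each entry is (minor peak height, [major peak heights]).
markers = [
    (50.0, [500.0, 500.0]),  # heterozygous mother: two major peaks
    (60.0, [1000.0]),        # homozygous mother: one major peak
]
print(str_fetal_fraction(markers))
```

Markers with no detectable minor peak are not informative and would be excluded before averaging.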
Estimating fetal fraction using a mixture model
In the embodiments disclosed herein, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
As shown in Figure 13, cases 1 and 2 are polymorphism cases in which the mother is homozygous at a given allele. In case 1, the baby and the mother are both homozygous. This case is typically not particularly interesting, because the collected data contain only one type of allele at the polymorphic site analyzed. In case 2, the mother is homozygous and the baby is heterozygous, and the fetal fraction f is nominally given by two times the ratio of the minor allele count to the coverage. Coverage is defined as the total number of reads or tags (fetal and maternal) mapped to the particular polymorphism site. In case 2, the fetal fraction is approximated from the reads of the combined fetal and maternal sample as follows:
$$f \approx 2 \times \frac{\text{minor allele count}}{\text{coverage}}$$
Equation 4
In case 3, in which the mother is heterozygous and the baby is homozygous, the fetal fraction is nominally one minus two times the ratio of the minor allele count to the coverage. In case 3, the fetal fraction is approximated from the total reads of the combined fetal and maternal sample as follows:
$$f \approx 1 - 2 \times \frac{\text{minor allele count}}{\text{coverage}}$$
Equation 5
Finally, in case 4, in which both the mother and the fetus are heterozygous, the minor allele fraction is always 0.5 (barring error). The fetal fraction cannot be derived for polymorphisms falling into case 4.
Table 7 summarizes an example of estimating fetal fraction using Equations 4 and 5 when the number of major allele reads is 300 and the number of minor allele reads is 200. The coverage is then 500.
Table 7: Example of estimating fetal fraction using zygosity
Figure BDA00002366924901952
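The nominal estimates for cases 2 and 3 can be worked through with the read counts given for this example (300 major allele reads, 200 minor allele reads). The sketch below is only an illustration of Equations 4 and 5 as described in the text; the function name is hypothetical.

```python
def nominal_fetal_fraction(minor_count, major_count, case):
    """Nominal fetal fraction for zygosity case 2 or case 3.

    Coverage is taken as the sum of minor and major allele reads.
    Case 2 (mother homozygous, fetus heterozygous): f = 2 * A / D
    Case 3 (mother heterozygous, fetus homozygous): f = 1 - 2 * A / D
    """
    coverage = minor_count + major_count
    ratio = minor_count / coverage
    if case == 2:
        return 2.0 * ratio
    if case == 3:
        return 1.0 - 2.0 * ratio
    # Case 1 carries no fetal signal and case 4 always gives 0.5.
    raise ValueError("fetal fraction can only be derived for case 2 or case 3")


for case in (2, 3):
    print(case, round(nominal_fetal_fraction(200, 300, case), 3))
```

With a minor allele ratio of 200/500 = 0.4, the two cases give very different nominal values, which is why assigning each polymorphism to the correct zygosity case matters.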
In certain embodiments, a mixture model can be employed to classify a set of polymorphisms into two or more of the zygosity cases presented and, at the same time, to estimate the fetal DNA fraction from the mean allele frequency for each of these cases. In general, a mixture model assumes that a particular data set is made up of a mixture of different types of data, each of which has its own expected distribution (for example, a normal distribution). The procedure attempts to find the mean and possibly other characteristics of each type of data. In the embodiments disclosed herein, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
In certain embodiments employing mixture models, one or more factorial moments given by Equation 1 are calculated for the positions being considered as polymorphisms. For example, a factorial moment F_i (or a set of factorial moments) is calculated using the multiple SNP positions considered in the DNA sequence. As shown in Equation 10 below, each factorial moment F_i is a sum, over all of the polymorphism positions under consideration, of the ratio of the minor allele frequency a_i to the coverage d_i at a given position. As shown in Equation 11 below, these factorial moments are also related to the parameters α and p_i associated with each of the four zygosity cases described above. Specifically, they relate to the probability p_i for each case and to the relative amount, given by α, of each of the four cases in the set of polymorphisms under consideration. As explained above, the probability p_i is a function of the fraction of fetal DNA in the cell-free DNA in the mother's blood. As explained more fully below, by calculating a sufficient number of these factorial moments, the method provides a sufficient number of expressions to solve for all of the unknowns. The unknowns in this case are the relative amounts of each of the four cases in the population of polymorphisms under consideration and the probabilities (and hence the fetal DNA fraction) associated with each of these four cases. Similar results can be obtained using other versions of mixture models. Some versions use only polymorphisms falling into cases 1 and 2, with the polymorphisms of cases 3 and 4 filtered out by a thresholding technique.
Thus, factorial moments can be used as part of a mixture model to identify the probabilities of any combination of the four zygosity cases. And, as mentioned, these probabilities, or at least those for cases 2 and 3, are directly related to the fraction of fetal DNA in the total cell-free DNA in the mother's blood.
It should also be mentioned that the sequencing error, given by e, may be used to reduce the complexity of the system of factorial moment equations that must be solved. In this regard, it should be recognized that a sequencing error can in fact have any of four outcomes (corresponding to each of the four possible bases at any given polymorphism position).
Let the major allele count at genomic position j be B, the first order statistic of the counts (read counts) at position j. The major allele, b, is the corresponding arg max. Subscripts are used when more than one SNP is under consideration. The major allele count is given by:
Equation 6
Let the minor allele count at position j be A, the second order statistic of the counts at position j (that is, the second-highest allele count):
$$A \equiv A_j \equiv \{a_i\} = w_{j,i}^{(2)}$$
Equation 7
Coverage is defined as the total number of reads (fetal and maternal) mapped to the particular polymorphism site. Let the coverage at position j be D:
$$D \equiv D_j = \{d_i\} = A_j + B_j$$
Equation 8
In this embodiment, the minor allele frequency A is a sum of four terms, as shown in Equation 9. The four heterozygosity cases suggest the following binomial mixture model for the distribution of the minor allele counts a_i at the points (a_i, d_i), where d_i is the coverage:
$$A = \{a_i\} \sim \alpha_1\,\mathrm{Bin}(p_1, d_i) + \alpha_2\,\mathrm{Bin}(p_2, d_i) + \alpha_3\,\mathrm{Bin}(p_3, d_i) + \alpha_4\,\mathrm{Bin}(p_4, d_i)$$
where
$$1 = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4, \qquad m = 4$$
Equation 9
Each term corresponds to one of the four zygosity cases. Each term is the product of a polymorphism fraction α and a binomial distribution of the minor allele frequency. The α values represent the fractions of the polymorphisms that fall into each of the four cases. Each binomial distribution has an associated probability, p, and coverage, d. The minor allele probability for case 2, for example, is given by f/2, where f is the fetal fraction. Different models for relating p_i to the fetal fraction and the sequencing error rate are described below. The parameters α_i relate to population-specific parameters, and the ability to let these values "float" gives these methods additional robustness with respect to factors such as ethnicity and the progeny of the parents.
The disclosed embodiments make use of factorial moments of the allele frequency data under consideration. As is well known, the mean of a distribution is its first moment; here it is the expected value of the minor allele frequency. The variance is the second central moment; it is calculated from the expected value of the square of the allele frequency.
Equation 9 above can be solved for the fetal fraction for the different heterozygosity cases. In certain embodiments, the fetal fraction is solved by the method of factorial moments, in which the mixture parameters are expressed in terms of moments that can be easily estimated from the observed data.
The allele frequency data across all polymorphisms can be used to calculate the i-th factorial moment F_i (the first factorial moment F_1, the second factorial moment F_2, and so on), as shown in Equation 10. (SNPs are used for purposes of example only. Other types of polymorphisms can be used as discussed elsewhere herein.) Given n SNP positions, the factorial moments are defined as follows:
$$F_1 = \frac{1}{n}\sum_{i=1}^{n}\frac{a_i}{d_i}$$
$$F_2 = \frac{1}{n}\sum_{i=1}^{n}\frac{a_i(a_i-1)}{d_i(d_i-1)}$$
$$F_j = \frac{1}{n}\sum_{i=1}^{n}\frac{a_i(a_i-1)\cdots(a_i-j+1)}{d_i(d_i-1)\cdots(d_i-j+1)}$$
Equation 10
As these equations show, each factorial moment is a sum over the terms i (the individual polymorphisms in the data set), where there are n such polymorphisms in the data set. Each term of the sum is a function of the minor allele count a_i and the coverage value d_i.
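The factorial moments of Equation 10 can be computed directly from a list of (minor allele count, coverage) pairs. The following is a minimal sketch; the function name and the example counts are hypothetical.

```python
def factorial_moment(points, order):
    """Sample factorial moment F_order over (a_i, d_i) pairs (Equation 10).

    points: sequence of (minor allele count a_i, coverage d_i)
    order:  j >= 1; every coverage d_i must be at least j so that no
            factor in the denominator is zero.
    """
    total = 0.0
    n = 0
    for a, d in points:
        numerator = 1.0
        denominator = 1.0
        for k in range(order):
            numerator *= (a - k)    # a_i (a_i - 1) ... (a_i - j + 1)
            denominator *= (d - k)  # d_i (d_i - 1) ... (d_i - j + 1)
        total += numerator / denominator
        n += 1
    return total / n


# Hypothetical counts for two SNP positions.
points = [(2, 10), (5, 20)]
print(factorial_moment(points, 1), factorial_moment(points, 2))
```

For order 1 this reduces to the mean minor allele frequency across the positions.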
Usefully, the factorial moments are related to the values of α_i and p_i, as illustrated in Equation 11. The factorial moments can be related to {α_i, p_i} such that:
$$F_1 \approx \sum_{i=1}^{m}\alpha_i p_i^{1}$$
$$F_2 \approx \sum_{i=1}^{m}\alpha_i p_i^{2}$$
$$F_j \approx \sum_{i=1}^{m}\alpha_i p_i^{j}$$
$$F_g \approx \sum_{i=1}^{m}\alpha_i p_i^{g}$$
Equation 11
The fetal fraction f can be determined from the probabilities p_i. For example,
Figure BDA00002366924901988
And
Figure BDA00002366924901989
Thus, the responsible logic can solve a system of equations that relates the unknown α and p variables to the factorial moment expressions for the minor allele fractions across the multiple polymorphisms under consideration. Of course, there are other techniques for solving mixture models within the scope of the disclosed embodiments.
When n > 2*(number of parameters to be estimated), a solution can be identified by solving for {α_i, p_i} in the system of equations derived from the above relation, Equation 8. Obviously, the problem becomes much more difficult mathematically as g increases, because more {α_i, p_i} need to be estimated.
Typically, the data of cases 1 and 2 (or of cases 3 and 4) cannot be accurately distinguished by a simple threshold at lower fetal fractions. By discriminating at the point
Figure BDA00002366924901991
the data of cases 1 and 2 can easily be separated from the data of cases 3 and 4, where A is the minor allele count, D is the coverage, and T is a threshold. Use of T = 0.5 has been found to perform satisfactorily.
Note that the mixture model approach employing Equations 10 and 11 makes use of the data for all polymorphisms but does not separately account for sequencing error. Appropriate methods that separate the data of the first and second cases from the data of the third and fourth cases can account for sequencing error.
In other examples, the data set provided to the mixture model contains only data for case 1 and case 2 polymorphisms. These are the polymorphisms for which the mother is homozygous. A thresholding technique can be employed to eliminate the polymorphisms of cases 3 and 4. For example, polymorphisms with a minor allele frequency greater than a particular threshold are excluded before the mixture model is applied. Using appropriately filtered data and the factorial moments as reduced in Equations 13 and 14 below, one can calculate the fetal fraction f as shown in Equation 15. Note that Equation 13 is a restatement of Equation 9 for this implementation of the mixture model. Note also that in this particular example the sequencing error associated with the machine reading is not known. As a result, the system of equations must separately be solved for the error, e.
Figure 14 shows the results of using this mixture model, comparing the known fetal fraction (x-axis) with the estimated fetal fraction (y-axis). If the mixture model predicted the fetal fraction perfectly, the plotted results would follow the dashed line. Nevertheless, the estimated fractions are remarkably good, particularly considering that much of the data was excluded before the mixture model was applied.
To elaborate further, several other methods are available for estimating the parameters of the model from Equation 7. In some cases, a tractable solution can be found by setting the derivative of the chi-squared statistic to zero. In cases where no easy solution can be found by direct differentiation, a Taylor series expansion of the binomial probability distribution function (PDF) or other approximating polynomials can be effective. Minimum chi-square estimators are well known to be efficient. The method of moments solution from Equation 9 can be used as the starting point of an iterative approach. The following chi-squared estimator can be used:
Equation 12
where P_i is the number of points of count i. The iterative method of Le Cam ["Asymptotic Theory of Estimation and Testing Hypotheses", Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley, CA: University of CA Press, 1956, pp. 129 to 156] uses Newton-Raphson iteration on the likelihood function.
According to another application, a method of resolving mixture models is discussed that involves expectation maximization methods operating on mixtures of approximating beta distributions.
Model 1: cases 1 and 2, sequencing error unknown
Consider a reduced model that accounts only for heterozygosity cases 1 and 2. In this case, the mixture distribution can be written as:
$$A = \{a_i\} \sim \alpha_1\,\mathrm{Bin}(e, d_i) + \alpha_2\,\mathrm{Bin}(f/2, d_i)$$
where
$$1 = \alpha_1 + \alpha_2, \qquad m = 4$$
Equation 13
and the system of equations:
$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2$$
$$F_3 = \alpha_1 e^3 + (1-\alpha_1)(f/2)^3$$
Equation 14
is solved for e (the sequencing error rate), α_1 (the proportion of case 1 points), and f (the fetal fraction), where the F_i are as defined in Equation 10 above. The closed-form solution for the fetal fraction is chosen as the real solution of the following equation:
$$f \approx \frac{(F_1 - 1)F_2 \pm \sqrt{F_2\left(4F_1^3 + F_2 - 3F_1(2+F_1)F_2 + 4F_2^2\right)}}{2\left(F_1^2 - F_2\right)}$$
Equation 15
that lies between 0 and 1.
To gauge the performance of the estimator, simulated data sets (a_i, d_i) of Hardy-Weinberg equilibrium points were constructed with fetal fractions designed to be {1%, 3%, 5%, 10%, 15%, 20%, and 25%} and a constant sequencing error rate of 1%. An error rate of 1% is the currently accepted rate for the sequencing machines and protocols employed, and it is consistent with the Illumina Genome Analyzer II data shown in Figure 15. Equation 15 was applied to these data and, apart from an upward bias of four points, general agreement with the "known" fetal fraction was found. Interestingly, the sequencing error rate, e, was estimated to be just above 1%.
Model 2: cases 1 and 2, sequencing error known
In the next mixture model example, thresholding or another filtering technique is again employed to remove the data for polymorphisms falling into cases 3 and 4. In this case, however, the sequencing error is known. This simplifies the resulting expression for the fetal fraction, f, as shown in Equation 16. Figure 16 shows that this version of the mixture model provides improved results compared with the approach employing Equation 15. In the equations that follow, let e be the sequencing machine error rate.
A similar approach is shown in Equations 17 and 18. This approach recognizes that only some sequencing errors add to the minor allele count. Specifically, only one in every four sequencing errors should increase the minor allele count. Figure 17 shows remarkably good agreement between the actual and estimated fetal fractions using this technique.
Since the sequencing error rate of the machines used is known to a great extent, bias and computational complexity can be reduced by eliminating e as a variable to be solved for. Thus, we obtain the following system of equations for the fetal fraction f:
$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2$$
Equation 16
which is solved to obtain:
$$f \approx \frac{2\,(eF_1 - F_2)}{e - F_1}$$
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias somewhat.
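The closed-form solution of Equation 16 can be checked numerically. The sketch below builds F_1 and F_2 from assumed mixture parameters and then recovers f from them; the function names and the parameter values (α_1 = 0.6, e = 0.01, f = 0.10) are hypothetical and chosen only for illustration.

```python
def model2_moments(alpha1, e, f):
    """F1 and F2 implied by Equation 16 for given mixture parameters."""
    q = f / 2.0  # minor allele probability for case 2
    f1 = alpha1 * e + (1.0 - alpha1) * q
    f2 = alpha1 * e ** 2 + (1.0 - alpha1) * q ** 2
    return f1, f2


def fetal_fraction_known_error(f1, f2, e):
    """Solve Equation 16 for f when the sequencing error rate e is known.

    f = 2 (e F1 - F2) / (e - F1); undefined when F1 equals e, that is,
    when the data contain no case 2 signal.
    """
    return 2.0 * (e * f1 - f2) / (e - f1)


f1, f2 = model2_moments(alpha1=0.6, e=0.01, f=0.10)
print(fetal_fraction_known_error(f1, f2, e=0.01))
```

In practice F_1 and F_2 would be estimated from filtered case 1 and case 2 data using Equation 10, so the recovered value would carry sampling error.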
Model 3: cases 1 and 2, sequencing error known, improved error model
To reduce the bias in this model, we expanded the error model of the above equations to account for the fact that, in heterozygosity case 1, not every sequencing error event adds to the minor allele count A = a_i. Furthermore, we allow for the fact that sequencing error events may contribute to the counts in heterozygosity case 2. Hence, we determine the fetal fraction f by solving the following system of factorial moment relations:
$$F_1 = \alpha_1 \frac{e}{4} + (1-\alpha_1)\left(e + \frac{f}{2}\right)$$
$$F_2 = \alpha_1 \left(\frac{e}{4}\right)^2 + (1-\alpha_1)\left(e + \frac{f}{2}\right)^2$$
Equation 17
The solution of this system is then:
$$f \approx \frac{-2\left(e^2 - 5eF_1 + 4F_2\right)}{e - 4F_1}$$
Equation 18
Figure 17 shows that, for simulated data, using the machine error rate as a known parameter and enhancing the error model for cases 1 and 2 greatly reduces the upward bias, to less than a point for fetal fractions below 0.2.
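Equation 18 can be verified in the same way as the simpler model: construct the moments implied by Equation 17 from assumed parameters, then recover f. The function names and parameter values below are hypothetical.

```python
def model3_moments(alpha1, e, f):
    """F1 and F2 implied by Equation 17 for given mixture parameters."""
    p1 = e / 4.0        # case 1: one in four errors lands on the minor allele
    p2 = e + f / 2.0    # case 2: error contribution plus fetal signal
    f1 = alpha1 * p1 + (1.0 - alpha1) * p2
    f2 = alpha1 * p1 ** 2 + (1.0 - alpha1) * p2 ** 2
    return f1, f2


def fetal_fraction_improved_error(f1, f2, e):
    """Solve Equation 17 for f (Equation 18) with known error rate e."""
    return -2.0 * (e ** 2 - 5.0 * e * f1 + 4.0 * f2) / (e - 4.0 * f1)


f1, f2 = model3_moments(alpha1=0.6, e=0.01, f=0.10)
print(fetal_fraction_improved_error(f1, f2, e=0.01))
```

The expression is undefined when F_1 equals e/4, which corresponds to a data set made up entirely of case 1 points.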
Classifying affected samples using fetal fraction
In certain embodiments, the fetal fraction estimate is employed to further characterize an affected sample. In some cases, the fetal fraction estimate allows the affected sample to be classified as a mosaic, a complete aneuploidy, or a partial aneuploidy. A computer-implemented method for obtaining this information is described with respect to the flow chart of Figure 18. With this and related methods, the estimation of fetal fraction, the determination of CNV, and the classification of CNV can be performed together. In other words, the same tags can be employed to perform any of these three functions.
To use the method, two modes of assessing fetal fraction are employed. One mode produces an NCNFF value and the other produces a CNFF value. As explained above, the CNFF value is obtained using a technique that relies on a chromosome or chromosome segment that has been determined to have a copy number variation. It need not rely on polymorphisms to calculate the fetal fraction. An example of a non-polymorphic technique for calculating fetal fraction is described in Example 17, which assumes that there is a duplication or a deletion of a whole chromosome and employs the following expression:
$$ff_{(i)} = 2 \times NCV_{jA} \times CV_{jU}$$
Equation 28
where j represents the identity of the aneuploid chromosome, and CV represents the coefficient of variation obtained from the qualified samples that are used to determine the mean and the standard deviation in the expression for NCV.
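Equation 28 can be illustrated with a short sketch that derives the mean, standard deviation, and coefficient of variation from a set of qualified-sample chromosome doses and then computes NCV and the CN fetal fraction for a test sample. The function name and the dose values are hypothetical, and the use of the sample standard deviation is an assumption of this sketch.

```python
from statistics import mean, stdev


def cn_fetal_fraction(test_dose, qualified_doses):
    """CN fetal fraction from a chromosome dose, following Equation 28.

    NCV = (test dose - mean) / sd over the qualified samples, and
    CV  = sd / mean, so ff = 2 * NCV * CV.
    Written as in Equation 28, the result is positive when the dose is
    elevated (a gained chromosome) and negative when it is reduced.
    """
    mu = mean(qualified_doses)
    sd = stdev(qualified_doses)  # sample standard deviation (assumption)
    ncv = (test_dose - mu) / sd
    cv = sd / mu
    return 2.0 * ncv * cv


# Hypothetical chromosome doses from qualified (unaffected) samples.
qualified = [0.99, 1.00, 1.01]
print(cn_fetal_fraction(1.05, qualified))
```

Because the standard deviation cancels, the estimate reduces to twice the relative deviation of the test dose from the qualified mean.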
The NCNFF value is obtained using a technique that does not rely on a chromosome or chromosome segment having a copy number variation. In other words, the NCN fetal fraction is determined by a technique that reliably determines the fetal fraction on the assumption that the portions of the genome used to calculate the fetal fraction have normal ploidy. The CN fetal fraction is determined by a technique that assumes that the sample under consideration has one form of aneuploidy. The CNV of the affected chromosome or chromosome segment is used to calculate the CN fetal fraction. Techniques for its calculation are presented below.
By comparing the estimate of the NCN fetal fraction with the estimate of the CN fetal fraction, a method can determine the type of aneuploidy that may be present in the sample. Basically, if the NCN fetal fraction and CN fetal fraction values match, the ploidy assumption built into the technique for estimating the CN fetal fraction can be considered true. For example, if the method of calculating the CN fetal fraction assumes that the sample has a complete chromosomal aneuploidy representing a single additional copy of a chromosome or a single deletion of a chromosome, and the NCN fetal fraction value matches the CN fetal fraction value, the method can conclude that the sample exhibits a complete chromosomal aneuploidy. The basis for this assumption is described in more detail below.
The NCN fetal fraction can be determined by different techniques. In some embodiments, the NCN fetal fraction is estimated using selected polymorphisms in the reference sequence genome. Examples of these techniques are described above. In other embodiments, the NCN fetal fraction is determined using the relative amount of a chromosome that is known not to be aneuploid or that has been determined not to be aneuploid. For instance, a chromosome known not to be aneuploid in a sample may be chromosome X in a male fetus. Thus, in other embodiments, the NCN fetal fraction is determined using the relative amount (for example, the chromosome dose) of the X chromosome or the Y chromosome in a sample comprising DNA from a pregnant woman carrying a son. The son's genome should not contain a second copy of the X chromosome. Knowing this, the relative amount of X chromosome DNA can be used to provide an NCN value of the fetal fraction. In samples comprising DNA from a female child, the chromosome known not to be aneuploid can be a chromosome whose aneuploidy is known to be incompatible with life. Alternatively, for samples comprising DNA from a fetus of either sex, sequence tags can be used to determine chromosome doses (and NCV or NSV) in order to confirm the normal ploidy of a chromosome that can then be used to determine the NCN fetal fraction.
Turning to flow chart 1800 of Figure 18, the NCN fetal fraction estimate 1802 and the CN fetal fraction estimate 1804 are compared. If they match, as indicated at block 1806, the process concludes that the assumption implicit in the technique used to estimate the CN fetal fraction is true. In various embodiments, this assumption is that a trisomy or a monosomy is present in one of the chromosomes of the fetus.
On the other hand, if the comparison indicates that the two fetal fraction values do not match (condition 1808), and in fact the CN fetal fraction estimate is less than the NCN fetal fraction, a second stage of the method is carried out, as indicated at block 1810.
In this second stage, the method determines whether the sample contains a partial aneuploidy or a mosaic. In addition, if the sample contains a partial aneuploidy, the method determines where on the aneuploid chromosome the aneuploidy resides. In certain embodiments, this is accomplished by first binning the affected chromosome into a plurality of bins. In one example, each bin is about 1 million base pairs in length. Of course, other bin lengths can be used, such as about 1 kilobase, about 10 kilobases, about 100 kilobases, and so on. The bins do not overlap and span most or all of the length of the chromosome. The bins are compared to each other, and this comparison provides insight into the condition. In one approach, for each bin, the mapped tags are counted and optionally converted to a bin dose. If any of the bins is aneuploid, the counts or bin doses will indicate it. As part of analyzing the individual bins, it may be appropriate to normalize the information from each bin to account for inter-bin variation such as G-C content. The resulting normalized bin value can be referred to as an NBV; an NBV is an instance in which a chromosome segment is normalized to the tags mapped to normalizing segments having similar GC content (see Example 19 below). In some embodiments, a separate value of the fetal fraction is calculated for each bin and the fetal fraction values are compared. This sequential analysis of each bin is depicted in block 1812 of Figure 18. If any bin is identified as harboring an aneuploidy (by considering tag density, fetal fraction, or other information), the method determines that the sample contains a partial aneuploidy and additionally locates the aneuploidy at the bins whose tag counts depart sufficiently from the expected value. See block 1814.
If, however, on analyzing the individual bins of the chromosome under consideration, the method does not identify any chromosomal region exhibiting an aneuploidy, the method determines that the sample contains a mosaic. See block 1816.
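The decision flow of Figure 18 as described above can be sketched as follows. This is an illustrative outline only: the function name, the tolerance used to decide whether two fetal fraction values "match", and the rule used to flag a deviating bin are all hypothetical choices, not values given in the text.

```python
def classify_affected_sample(ncn_ff, cn_ff, bin_values, expected_bin_value,
                             ff_tolerance=0.01, bin_tolerance=0.05):
    """Classify an affected sample following the flow of Figure 18.

    ncn_ff, cn_ff:       the two fetal fraction estimates (blocks 1802, 1804)
    bin_values:          normalized bin values along the affected chromosome
    expected_bin_value:  value expected for a bin with normal ploidy
    Returns (classification, indices of deviating bins).
    """
    # Block 1806: matching estimates confirm the whole-chromosome assumption.
    if abs(ncn_ff - cn_ff) <= ff_tolerance:
        return "complete aneuploidy", []

    # Blocks 1808/1810: CN estimate below NCN estimate triggers bin analysis.
    if cn_ff < ncn_ff:
        # Block 1812: examine each bin in turn.
        deviating = [i for i, value in enumerate(bin_values)
                     if abs(value - expected_bin_value) > bin_tolerance]
        if deviating:
            return "partial aneuploidy", deviating  # block 1814
        return "mosaic", []                          # block 1816

    # The described flow does not cover a CN estimate above the NCN estimate.
    return "undetermined", []


print(classify_affected_sample(0.10, 0.04, [1.0, 1.0, 1.2, 1.21, 1.0], 1.0))
```

The returned bin indices indicate where on the chromosome the partial aneuploidy is located.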
Calculating and comparing the true fetal fraction using polymorphisms, e.g., SNPs, on the chromosome of interest of an affected sample and on a chromosome known not to be aneuploid (e.g., chromosome X), to determine the presence or absence of a complete or partial aneuploidy in a male fetus
As explained above, the fetal fraction (FF) determined using informative polymorphic sequences, for example informative SNPs, can be used to distinguish a complete chromosomal aneuploidy from a partial aneuploidy.
The presence or absence of an aneuploidy, whether partial or complete, can be determined by comparing the value of the fetal fraction determined using polymorphic target sequences present on the chromosome of interest with the value of the fetal fraction determined using polymorphic target sequences present on a different chromosome in the same sample. In samples in which the fetus is male, the FF on the chromosome of interest can be determined and compared with the FF determined for chromosome X in the same sample. For example, given a maternal sample from a mother carrying a male fetus with trisomy 21, polymorphic sequences, for example sequences comprising at least one informative SNP, are selected so as to be present on chromosome 21 and on chromosome X; the polymorphic target sequences are amplified and sequenced, and the fetal fraction is determined as described elsewhere herein.
Given that the fetal fraction is proportional to the amount of fetal chromosome in the sample, the fetal fraction determined using polymorphic sequences present on a trisomic chromosome in a maternal sample will be 1 + 1/2 times the fetal fraction determined using polymorphic sequences on a chromosome known not to be aneuploid (for example, chromosome X in a male fetus) in the same maternal sample. For example, in a normal sample, when the fetal fraction is determined using a set of polymorphisms on chromosome 21 (FF_21) and using a set of polymorphisms on chromosome X (FF_X), chromosome X being known to be unaffected in a male fetus, FF_21 = FF_X. However, if the fetus is trisomic for chromosome 21, the fetal fraction for the trisomic chromosome 21 (FF_21) will be one and one-half times the fetal fraction for chromosome X in the same sample (FF_21 = 1.5*FF_X). Thus, if FF_21 < FF_X, the analysis logic can conclude that there is a partial deletion of chromosome 21 and/or a mosaic. If FF_21 > FF_X, the analysis logic can conclude that there is a partial increase of chromosome 21, for example a duplication or multiplication of a portion of chromosome 21, or a complete duplication of chromosome 21 that was not accounted for in the technique used to calculate the fetal fraction from chromosome 21. The difference between the two outcomes can be resolved because a partial duplication will yield an FF that is less than 1.5*FF_X. Alternatively, the presence of a partial duplication, a partial deletion, or a mosaic can be determined by, for example, increasing the number of polymorphic sequences on chromosome 21 so as to obtain multiple FF values along the length of the chromosome, such that doubled or multiplied FF values in a portion of the chromosome indicate an increase in that portion. Alternatively, as would be the case for a mosaic sample, the FF determined from the polymorphic sequences remains unchanged along the entire length of the chromosome, indicating an overall increase in the amount of the complete chromosome, but with an increase that is less than 1.5 times FF_X, as described above. In the case of the loss of a whole chromosome, for example monosomy X, FF_monosomy = 1/2 FF_X. The fetal fraction values obtained from informative polymorphic sequences can be used in combination with sequence doses and their normalized dose values, for example NCV or NSV, to confirm the presence of a complete aneuploidy.
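The comparison of FF_21 with FF_X described above can be sketched as a simple ratio test. The function name and the tolerance used to decide that a ratio is "equal" to 1 or 1.5 are hypothetical; the text does not specify how closely the values must agree.

```python
def interpret_ff_ratio(ff_21, ff_x, tolerance=0.1):
    """Interpret FF_21 relative to FF_X for a male fetus.

    FF_21 = FF_X         -> no aneuploidy of chromosome 21 indicated
    FF_21 = 1.5 * FF_X   -> complete trisomy 21
    FF_21 < FF_X         -> partial deletion and/or mosaic
    FF_X < FF_21 < 1.5 * FF_X -> partial increase (e.g., partial duplication)
    """
    ratio = ff_21 / ff_x
    if abs(ratio - 1.0) <= tolerance:
        return "no aneuploidy indicated"
    if abs(ratio - 1.5) <= tolerance:
        return "complete trisomy"
    if ratio < 1.0:
        return "partial deletion and/or mosaic"
    if ratio < 1.5:
        return "partial increase"
    # Ratios well above 1.5 are outside the outcomes described in the text.
    return "undetermined"


print(interpret_ff_ratio(0.15, 0.10))
```

As the text notes, a ratio between 1 and 1.5 would still need per-region FF values along the chromosome to separate a partial duplication from a mosaic.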
Karyomit(e) Rapid Dose Calculation fetus mark by the aneuploid sequence
The NCV for a chromosome of interest is calculated according to the following equation:

$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j} \qquad \text{(Equation 19)}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
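Equation 19 is a standard score of the test dose against the qualified set. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator) is the intended estimator, which the text does not state; the function name and the example doses are made up for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Equation 19: NCV = (x_ij - mu_hat_j) / sigma_hat_j.

    qualified_doses holds the j-th chromosome dose from each qualified sample.
    """
    mu_hat = mean(qualified_doses)
    sigma_hat = stdev(qualified_doses)  # sample SD (n - 1); an assumption, see lead-in
    if sigma_hat == 0:
        raise ValueError("qualified doses have zero spread; NCV is undefined")
    return (test_dose - mu_hat) / sigma_hat


if __name__ == "__main__":
    qualified = [0.98, 1.00, 1.02]  # illustrative values, not real data
    print(round(normalized_chromosome_value(1.04, qualified), 3))
```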
In general, the chromosome dose for a trisomy increases in proportion to the fetal fraction (ff). The chromosome dose in a sample containing a trisomic chromosome is therefore:

$$R_{jA} = \left(1 + \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 20)}$$

The chromosome dose for a monosomy decreases in proportion to the fetal fraction (ff). The chromosome dose in a sample containing a monosomic chromosome is therefore:

$$R_{jA} = \left(1 - \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 21)}$$

In Equations 20 and 21, $R_{jA}$ is the chromosome dose ($x_{ij}$) for chromosome j in an affected sample (for example, the maternal test sample) i; ff is the expected fetal fraction in an unaffected (qualified) sample U; and $R_{jU}$ is the chromosome dose in the unaffected sample. The factor "2" rests on the following assumptions: the operator in Equation 20 is a plus sign, meaning that one extra copy of the chromosome of interest is present; the operator in Equation 21 is a minus sign, meaning that one complete copy of the chromosome of interest is missing. Under a different assumption (for example, a duplication of only part of the chromosome of interest), the factor "2" has no practical meaning.
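Substituting the chromosome dose $R_{jA}$ into Equation 19 gives:

$$NCV_{jA} = \frac{R_{jA} - \bar{R}_{jU}}{\sigma_{jU}} \qquad \text{(Equation 22)}$$

where $\bar{R}_{jU}$ is equivalent to $\hat{\mu}_j$ and $\sigma_{jU}$ is equivalent to $\hat{\sigma}_j$. Solving for ff proceeds as follows, with the trisomy case written first and the monosomy case second:

$$NCV_{jA} = \frac{\left(1 + \frac{ff}{2}\right)\bar{R}_{jU} - \bar{R}_{jU}}{\sigma_{jU}} \quad \text{or} \quad NCV_{jA} = \frac{\left(1 - \frac{ff}{2}\right)\bar{R}_{jU} - \bar{R}_{jU}}{\sigma_{jU}} \qquad \text{(Equation 23)}$$

$$NCV_{jA} = \frac{\left(\frac{ff}{2}\right)\bar{R}_{jU}}{\sigma_{jU}} \quad \text{or} \quad NCV_{jA} = -\frac{\left(\frac{ff}{2}\right)\bar{R}_{jU}}{\sigma_{jU}} \qquad \text{(Equation 24)}$$

$$NCV_{jA} = \frac{ff}{2\,CV_{jU}} \quad \text{or} \quad NCV_{jA} = -\frac{ff}{2\,CV_{jU}} \qquad \text{(Equation 25)}$$

where $CV_{jU} = \sigma_{jU} / \bar{R}_{jU}$ is the coefficient of variation of the dose for chromosome j in the unaffected samples.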
Therefore, under the trisomy hypothesis, the percent fetal fraction "ff(i)" for any chromosome can be defined as:

$$ff_{(i)} = 2 \times NCV_{jA} \times CV_{jU} \qquad \text{(Equation 26)}$$

Under the monosomy hypothesis, the percent fetal fraction "ff(i)" for any chromosome can be defined as:

$$ff_{(i)} = -2 \times NCV_{jA} \times CV_{jU} \qquad \text{(Equation 27)}$$

Equation 27 assumes that one complete copy of the chromosome is missing. The $NCV_{jA}$ for that chromosome is therefore necessarily negative, so although Equation 27 contains a minus sign, the calculated fetal fraction remains positive.

Because the fetal fraction cannot be negative, "ff(i)" for any chromosome can be calculated with a single equation:

$$ff_{(i)} = 2 \times \left| NCV_{jA} \times CV_{jU} \right| \qquad \text{(Equation 28)}$$
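Equations 26 to 28 translate directly into code. The sketch below uses hypothetical function names and made-up numbers, and treats ff and CV as fractions rather than percentages. It also includes the forward relation (NCV from ff and CV) so that the sign convention of Equations 26 and 27 can be checked by a round trip.

```python
def expected_ncv(ff: float, cv: float, trisomy: bool = True) -> float:
    """NCV implied by a fetal fraction: +ff / (2 * CV) for a trisomy, -ff / (2 * CV) for a monosomy.

    This is Equations 26 and 27 solved for NCV.
    """
    if cv <= 0:
        raise ValueError("coefficient of variation must be positive")
    sign = 1.0 if trisomy else -1.0
    return sign * ff / (2.0 * cv)


def fetal_fraction_from_ncv(ncv: float, cv: float) -> float:
    """Equation 28: ff = 2 * |NCV * CV|, valid for both the trisomy and monosomy hypotheses."""
    if cv <= 0:
        raise ValueError("coefficient of variation must be positive")
    return 2.0 * abs(ncv * cv)


if __name__ == "__main__":
    cv = 0.005  # illustrative CV of 0.5% for the dose in unaffected samples
    print(fetal_fraction_from_ncv(10.0, cv))   # trisomy-like NCV
    print(fetal_fraction_from_ncv(-10.0, cv))  # monosomy-like NCV gives the same ff
```

The estimate is only meaningful when the hypothesis behind the factor 2 holds, that is, when exactly one complete copy of the chromosome is gained or lost.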
Resolving no-calls using fetal fraction
The ability to determine significant differences in the representation of one or more sequences present in a mixture of two genomes depends on the sequence contribution of the first genome relative to that of the second genome. For example, noninvasive prenatal diagnosis using cfDNA in a maternal sample is challenging because only a small portion of the DNA in the sample is derived from the fetus. For prenatal diagnostic assays, the background of maternal DNA imposes a practical limit on sensitivity, and the fraction of fetal DNA present in the maternal sample is therefore an important parameter. The sensitivity of fetal aneuploidy detection by counting DNA molecules depends on the fetal DNA fraction and on the number of molecules counted.
Typically, about 1% of the maternal test samples analyzed for fetal aneuploidies by massively parallel sequencing are "no-call" samples, for which insufficient sequencing information (for example, the number of fetal sequence tags) precludes confidently determining the presence or absence of one or more fetal aneuploidies in the maternal sample. A "no-call" determination may result from a level of fetal cfDNA that is too low, relative to the maternal contribution, to provide sequencing information that distinguishes the sample from an aneuploid sample as determined by the sequencing information from qualified samples. To determine whether a "no-call" sample is or is not an aneuploid sample, the fetal fraction is determined empirically and/or derived, for example from the NCV value, and is used to confirm or deny the presence of a chromosomal aneuploidy. As described elsewhere herein, ff can be used to characterize the type of aneuploidy present in a test sample. For example, for thresholds that set a "no-call" zone between NCV values of 2.5 and 4, a test sample having an NCV close to the threshold of 4 and shown to have a low fetal fraction (for example, less than 3%) is likely to be an affected sample. Conversely, a test sample having an NCV close to the threshold of 2.5 and shown to have a high fetal fraction (for example, greater than 40%) is likely to be an unaffected sample. Resolution of "no-call" samples may depend on a single determination of fetal fraction. Preferably, the fetal fraction is determined by two or more different methods, or by using NCVs determined by the same method from two or more different chromosomes of the sample. Similarly, the fetal fraction can be used to assess the likelihood that a sample having an NCV slightly greater than 4 or slightly less than 2.5 is a false positive or a false negative call, respectively.
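The no-call logic in this paragraph can be sketched as a small decision function. The thresholds of 2.5 and 4 and the fetal fraction cut-offs of 3% and 40% are the examples given in the text; the `margin` parameter, which stands in for "close to" a threshold, and all names and labels are hypothetical. The sketch handles only chromosome gains (positive NCV).

```python
from typing import Optional


def call_sample(
    ncv: float,
    fetal_fraction: Optional[float] = None,
    lower: float = 2.5,     # example lower NCV threshold from the text
    upper: float = 4.0,     # example upper NCV threshold from the text
    low_ff: float = 0.03,   # "low" fetal fraction example (less than 3%)
    high_ff: float = 0.40,  # "high" fetal fraction example (greater than 40%)
    margin: float = 0.5,    # hypothetical definition of "close to" a threshold
) -> str:
    """Classify a sample by NCV, using fetal fraction to resolve the no-call zone."""
    if ncv > upper:
        return "affected"
    if ncv < lower:
        return "unaffected"
    # Inside the no-call zone: fetal fraction is needed to say anything more.
    if fetal_fraction is None:
        return "no-call"
    if ncv >= upper - margin and fetal_fraction < low_ff:
        return "likely affected"    # near-threshold NCV despite very little fetal DNA
    if ncv <= lower + margin and fetal_fraction > high_ff:
        return "likely unaffected"  # weak NCV despite abundant fetal DNA
    return "no-call"


if __name__ == "__main__":
    for ncv, ff in [(5.0, None), (3.8, 0.02), (2.7, 0.45), (3.2, 0.10)]:
        print(ncv, ff, call_sample(ncv, ff))
```

In practice the text prefers a fetal fraction confirmed by two or more methods or chromosomes before a no-call is resolved; this sketch takes a single value for brevity.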
Apparatus and systems for determining CNV
Analysis of the sequencing data, and the diagnosis derived from it, are typically performed using various computer-executed algorithms and programs. Certain embodiments therefore employ processes involving data stored in or transferred through one or more computer systems or other processing systems. Embodiments of the invention also relate to apparatus for performing these operations. This apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer. In some embodiments, a group of processors performs some or all of the recited analytical operations collaboratively (for example, via a network or cloud computing) and/or in parallel. A processor or group of processors for performing the methods described herein may be of various types, including microcontrollers and microprocessors such as programmable devices (for example, CPLDs and FPGAs) and non-programmable devices such as gate array ASICs or general-purpose microprocessors.
In addition, certain embodiments relate to tangible and/or non-transitory computer-readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices; magnetic media such as disk drives and magnetic tape; optical media such as CDs; magneto-optical media; and hardware devices that are specially configured to store and execute program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer-readable media may be controlled directly by an end user, or they may be controlled indirectly by the end user. Examples of directly controlled media include media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that are indirectly accessible to the user via an external network and/or via a service providing shared resources, such as the "cloud". Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
In various embodiments, the data or information employed in the disclosed methods and apparatus is provided in an electronic format. Such data or information may include reads and tags derived from a nucleic acid sample; counts or densities of such tags that align with particular regions of a reference sequence (for example, that align to a chromosome or chromosome segment); reference sequences (including reference sequences providing solely or primarily polymorphisms); chromosome and segment doses; calls (such as aneuploidy calls); normalized chromosome and segment values; pairings of chromosomes or segments with their corresponding normalizing chromosomes or segments; counseling recommendations; diagnoses; and the like. As used herein, data or other information provided in electronic format is available for storage on a machine and for transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, databases, and so on. The data may be embodied electronically, optically, and so on.
In one embodiment, the invention provides a computer program product for generating an output indicating the presence or absence of an aneuploidy (for example, a fetal aneuploidy) or cancer in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining a chromosomal anomaly. As explained, the computer product may include a non-transitory and/or tangible computer-readable medium having computer-executable or compilable logic (for example, instructions) recorded thereon for enabling a processor to determine chromosome doses and, in some cases, whether a fetal aneuploidy is present or absent. In one example, the computer product comprises a computer-readable medium having computer-executable or compilable logic (for example, instructions) recorded thereon for enabling a processor to diagnose a fetal aneuploidy, the computer product comprising: a receiving procedure for receiving sequencing data from at least a portion of the nucleic acid molecules from a maternal biological sample, wherein the sequencing data comprises a calculated chromosome and/or segment dose; computer-assisted logic for analyzing a fetal aneuploidy from the received data; and an output procedure for generating an output indicating the presence, absence, or kind of the fetal aneuploidy.
The sequence information from the sample under consideration may be mapped to chromosome reference sequences to identify a number of sequence tags for each of any one or more chromosomes of interest, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest. In various embodiments, the reference sequences are stored in a database, for example a relational or object database.
It should be understood that it is impractical, or in most cases even impossible, for an unaided human being to perform the computational operations of the methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. The problem is compounded because reliable aneuploidy calls generally require mapping thousands (for example, at least about 10,000) or even millions of reads to one or more chromosomes.
The methods disclosed herein can be performed using a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying any CNV, for example chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying complete and partial chromosomal aneuploidies, for example fetal aneuploidies. Such instructions may include, for example, instructions for: (a) obtaining sequence information for the fetal and maternal nucleic acids in a sample and/or storing that information, at least temporarily, in a computer-readable medium; (b) using the stored sequence information to computationally identify, from the mixture of fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence for each of the one or more chromosomes of interest; and (c) computationally calculating a single chromosome dose for each chromosome of interest, using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence. These instructions may be executed using one or more appropriately designed or configured processors. The instructions may additionally include comparing each chromosome dose to an associated threshold value, thereby determining the presence or absence in the sample of any four or more different partial or complete fetal chromosomal aneuploidies. As indicated above, there are numerous variations on this process. All such variations can be implemented using the processing and storage features described here.
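Steps (b) and (c) above amount to counting tags per chromosome and taking a ratio. The following minimal sketch assumes that alignment has already been done by an external aligner, so each tag is represented only by the label of the chromosome it mapped to. The function names, the tag counts, and the pairing of each chromosome of interest with a normalizing chromosome are placeholders, not values from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def count_tags(mapped_chromosomes: Iterable[str]) -> Counter:
    """Step (b): count sequence tags per chromosome.

    Each element is the chromosome to which one read was mapped.
    """
    return Counter(mapped_chromosomes)


def chromosome_dose(tag_counts: Counter, chrom: str, normalizers: Sequence[str]) -> float:
    """Step (c): dose = tags on the chromosome of interest / tags on its normalizing chromosome(s)."""
    denominator = sum(tag_counts[c] for c in normalizers)
    if denominator == 0:
        raise ValueError(f"no tags on the normalizing chromosome(s) for {chrom}")
    return tag_counts[chrom] / denominator


def all_doses(tag_counts: Counter, normalizer_map: Dict[str, Sequence[str]]) -> Dict[str, float]:
    """Compute a single dose for every chromosome of interest in normalizer_map."""
    return {
        chrom: chromosome_dose(tag_counts, chrom, normalizers)
        for chrom, normalizers in normalizer_map.items()
    }


if __name__ == "__main__":
    # Toy data: chromosome labels of already-mapped tags (made-up counts).
    tags = ["chr21"] * 12 + ["chr18"] * 30 + ["chr9"] * 480 + ["chr8"] * 600
    counts = count_tags(tags)
    # Placeholder pairings of chromosomes of interest with normalizing chromosomes.
    normalizer_map = {"chr21": ["chr9"], "chr18": ["chr8"]}
    print(all_doses(counts, normalizer_map))
```

Passing a list of normalizers covers both a single normalizing chromosome and a group of chromosomes, since the denominator is the summed tag count.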
In some embodiments, the instructions may further include automatically recording information pertinent to the method, such as chromosome doses and the presence or absence of a fetal chromosomal aneuploidy, in a patient medical record for the human subject providing the maternal test sample. The patient medical record may be maintained by, for example, a laboratory, a physician's office, a hospital, a health maintenance organization, an insurance company, or a personal medical record website. Further, based on the results of the processor-implemented analysis, the method may involve prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. This may involve performing one or more additional tests or analyses on additional samples taken from the subject.
The disclosed methods can also be performed using a computer processing system that is adapted or configured to perform a method for identifying any CNV, for example chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer processing system that is adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured to sequence at least a portion of the nucleic acid molecules in a sample to obtain the type of sequence information described elsewhere herein. The apparatus may also include components for processing the sample. Such components are described elsewhere herein.
Sequence or other data can be input into a computer or stored on a computer-readable medium, either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided to the computer system via an interface. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository. Once the sequences are available to the processing apparatus, a memory device or mass storage device buffers or stores them, at least temporarily. In addition, the memory device may store tag counts for various chromosomes or genomes, and so on. The memory may also store various routines and/or programs for analyzing and presenting the sequence or mapped data. Such programs and routines may include programs for performing statistical analyses, and so on.
In one example, a user provides a sample to a sequencing apparatus. Data is collected and/or analyzed by the sequencing apparatus, which is connected to a computer. Software on the computer allows for data collection and/or analysis. Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the Internet, which is used to transmit data to a handheld device used by a remote user (for example, a physician, scientist, or analyst). It is understood that the data can be stored and/or analyzed prior to transmission. In some embodiments, raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmission can occur via the Internet, but can also occur via satellite or another connection. Alternatively, data can be stored on a computer-readable medium, and the medium can be shipped to an end user (for example, by mail). The remote user can be in the same or a different geographical location, including, but not limited to, a building, city, state, country, or continent.
In some embodiments, the methods also include collecting data regarding a plurality of polynucleotide sequences (for example, reads, tags, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment such as a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus. The computer can then collect the applicable data gathered by the laboratory device. The data can be stored on a computer at any step, for example while being collected in real time, prior to sending, during or in conjunction with sending, or after sending. The data can be stored on a computer-readable medium that can be removed from the computer. The collected or stored data can be transmitted from the computer to a remote location, for example via a local network or a wide area network such as the Internet. At the remote location, various operations can be performed on the transmitted data as described below.
Among the types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in the systems, apparatus, and methods disclosed herein are the following:
Reads obtained by sequencing nucleic acids in a test sample
Tags obtained by aligning reads to a reference genome or other reference sequence or sequences
The reference genome or sequence
Sequence tag density: the count or number of tags for each of two or more regions (typically chromosomes or chromosome segments) of a reference genome or other reference sequences
Identities of the normalizing chromosomes or chromosome segments for particular chromosomes or chromosome segments of interest
Doses for chromosomes or chromosome segments (or other regions), obtained from the chromosomes or segments of interest and the corresponding normalizing chromosomes or segments
Thresholds for calling chromosome doses as affected, unaffected, or no-call
The actual calls of chromosome doses
Diagnoses (clinical conditions associated with the calls)
Recommendations for further tests derived from the calls and/or diagnoses
Treatment and/or monitoring plans derived from the calls and/or diagnoses
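As an illustration of how several of these data types might be held together and serialized for transmission between machines, the sketch below defines a hypothetical record type. The class name, field names, and the choice of JSON are assumptions made for this example only; the text does not prescribe any particular data structure or wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleRecord:
    """Hypothetical container for the per-sample data types listed above."""
    sample_id: str
    read_count: int = 0                                                    # number of reads obtained
    tag_counts: Dict[str, int] = field(default_factory=dict)              # tags per chromosome or segment
    normalizing: Dict[str, List[str]] = field(default_factory=dict)       # identities of normalizers
    doses: Dict[str, float] = field(default_factory=dict)                 # chromosome or segment doses
    thresholds: Dict[str, Dict[str, float]] = field(default_factory=dict) # calling thresholds
    calls: Dict[str, str] = field(default_factory=dict)                   # affected / unaffected / no-call
    diagnosis: Optional[str] = None
    recommendations: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record to an electronic format suitable for storage or transmission."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SampleRecord":
        """Rebuild a record from its JSON form, e.g. at a remote location."""
        return cls(**json.loads(text))


if __name__ == "__main__":
    record = SampleRecord(
        sample_id="sample-001",  # made-up identifier
        read_count=1122,
        tag_counts={"chr21": 12, "chr9": 480},
        normalizing={"chr21": ["chr9"]},
        doses={"chr21": 12 / 480},
        thresholds={"chr21": {"lower": 2.5, "upper": 4.0}},
        calls={"chr21": "no-call"},
    )
    print(record.to_json())
```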
These various types of data may be obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or most of this information is stored and used at the location where the test sample is processed, for example a doctor's office or other clinical setting. At the other extreme, the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more further locations, and diagnoses, recommendations, and/or plans are prepared at yet another location (which may be the location where the sample was obtained).
In various embodiments, the reads are generated with the sequencing apparatus and then transmitted to a remote site, where they are processed to produce aneuploidy calls. At this remote location, for example, the reads are aligned to a reference sequence to produce tags, which are counted and assigned to chromosomes or segments of interest. Also at the remote location, the counts are converted to doses using the associated normalizing chromosomes or segments. Still further, at the remote location, the doses are used to generate aneuploidy calls.
Among the processing operations that may be employed at distinct locations are the following:
Sample collection
Sample processing preliminary to sequencing
Sequencing
Analyzing sequence data and deriving aneuploidy calls
Diagnosis
Reporting a diagnosis and/or a call to the patient or health care provider
Developing a plan for further treatment, testing, and/or monitoring
Executing the plan
Counseling
Any one or more of these operations may be automated as described elsewhere herein. Typically, the sequencing, the analysis of sequence data, and the derivation of aneuploidy calls will be performed computationally. The other operations may be performed manually or automatically.
Examples of locations where sample collection may be performed include health practitioners' offices, clinics, patients' homes (where a sample collection tool or kit is provided), and mobile health care vehicles. Examples of locations where sample processing prior to sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample processing apparatus or kit is provided), mobile health care vehicles, and the facilities of aneuploidy analysis providers. Examples of locations where sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample sequencing apparatus and/or kit is provided), mobile health care vehicles, and the facilities of aneuploidy analysis providers. The location where the sequencing takes place may be provided with a dedicated network connection for transmitting sequence data (typically reads) in an electronic format. Such a connection may be wired or wireless, and may be configured to send the data to a site where the data can be processed and/or aggregated prior to transmission to a processing site. Data aggregators can be maintained by health organizations such as health maintenance organizations (HMOs).
The analyzing and/or deriving operations may be performed at any of the foregoing locations or, alternatively, at a further remote site dedicated to computation and/or to the service of analyzing nucleic acid sequence data. Such locations include, for example, clusters such as general-purpose server farms, the facilities of an aneuploidy analysis service business, and the like. In some embodiments, the computational apparatus employed to perform the analysis is leased or rented. The computational resources may be part of an Internet-accessible collection of processors, such as the processing resources colloquially known as the cloud. In some cases, the computations are performed by a parallel or massively parallel group of processors that are affiliated or unaffiliated with one another. The processing may be accomplished using distributed processing such as cluster computing, grid computing, and the like. In such embodiments, a cluster or grid of computational resources collectively forms a super virtual computer composed of multiple processors or computers acting together to perform the analysis and/or derivation described herein. These technologies, as well as more conventional supercomputers, may be employed to process sequence data as described herein. Each is a form of parallel computing that relies on processors or computers. In the case of grid computing, these processors (often whole computers) are connected by a network (private, public, or the Internet) using a conventional network protocol such as Ethernet. By contrast, a supercomputer has many processors connected by a local high-speed computer bus.
In certain embodiments, the diagnosis (for example, that the fetus has Down syndrome or that the patient has a particular type of cancer) is generated at the same location as the analyzing operation. In other embodiments, it is generated at a different location. In some examples, the diagnosis is reported at the location where the sample was taken, although this need not be the case. Examples of locations where the diagnosis can be generated or reported, and/or where a plan can be developed, include health practitioners' offices, clinics, Internet sites accessible by computers, and handheld devices having a wired or wireless connection to a network, such as cell phones, tablets, smart phones, and so on. Examples of locations where counseling is performed include health practitioners' offices, clinics, Internet sites accessible by computers, handheld devices, and so on.
In some embodiments, the sample collection, sample processing, and sequencing operations are performed at a first location, and the analyzing and deriving operation is performed at a second location. In some cases, however, the sample is collected at one location (for example, a health practitioner's office or clinic), and the sample processing and sequencing are performed at a different location, which is optionally the same location where the analyzing and deriving take place.
In various embodiments, a sequence of the above-listed operations may be triggered by a user or entity initiating sample collection, sample processing, and/or sequencing. After one or more of these operations have begun, the other operations may follow naturally. For example, the sequencing operation may cause reads to be automatically collected and sent to a processing apparatus, which then conducts the sequence analysis and aneuploidy derivation, often automatically and possibly without further user intervention. In some implementations, the result of this processing operation is then automatically delivered, possibly reformatted as a diagnosis, to a system component or entity that processes the information and reports it to a health professional and/or patient. As explained, such information can also be automatically processed to produce a treatment, testing, and/or monitoring plan, possibly along with counseling information. Thus, initiating an early-stage operation can trigger an end-to-end sequence in which the health professional, patient, or other concerned party is provided with a diagnosis, a plan, counseling, and/or other information useful for acting on a physical condition. This is accomplished even though parts of the overall system are physically separated and possibly remote from the location of, for example, the sample and the sequencing apparatus.
Figure 19 shows one implementation of a dispersed system for producing a call or diagnosis from a test sample. A sample collection location 01 is used for obtaining a test sample from a patient, such as a pregnant woman or a putative cancer patient. The sample is then provided to a processing and sequencing location 03, where the test sample may be processed and sequenced as described above. Location 03 includes apparatus for processing the sample as well as apparatus for sequencing the processed sample. The result of the sequencing, as described elsewhere herein, is a collection of reads, which are typically provided in an electronic format and sent to a network such as the Internet, indicated by reference number 05 in Figure 19.
The sequence data is provided to a remote location 07, where analysis and call generation are performed. This location may include one or more powerful computational devices such as computers or processors. After the computational resources at location 07 have completed their analysis and generated a call from the sequence information received, the call is relayed back to the network 05. In some implementations, not only a call but also an associated diagnosis is generated at location 07. The call and/or diagnosis is then transmitted across the network and back to the sample collection location 01, as illustrated in Figure 19. As explained, this is only one of many variations on how the various operations associated with generating a call or diagnosis may be divided among different locations. One common variant involves providing sample collection, processing, and sequencing at a single location. Another variation involves providing processing and sequencing at the same location as analysis and call generation.
Figure 20 elaborates on the options for performing various operations at distinct locations. In the most granular arrangement depicted in Figure 20, each of the following operations is performed at a separate location: sample collection, sample processing, sequencing, read alignment, calling, diagnosis, and reporting and/or plan development.
In one embodiment that aggregates some of these operations, sample processing and sequencing are performed at one location, and read alignment, calling, and diagnosis are performed at a separate location. See the portion of Figure 20 identified by reference letter A. In another implementation, identified by letter B in Figure 20, sample collection, sample processing, and sequencing are all performed at the same location. In this implementation, read alignment and calling are performed at a second location, and diagnosis and reporting and/or plan development are performed at a third location. In the implementation depicted by letter C in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, calling, and diagnosis are all performed together at a second location; and reporting and/or plan development is performed at a third location. Finally, in the implementation labeled D in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, and calling are all performed at a second location; and diagnosis and reporting and/or plan management are performed at a third location.
In one embodiment, the invention provides a system for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using said sequence information to computationally identify, from the fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of said any one or more chromosomes of interest;
(c) code for using said number of sequence tags identified for each of said any one or more chromosomes of interest and said number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and
(d) code for comparing each of the single chromosome doses for each of the any one or more chromosomes of interest to a corresponding threshold value for each of the any one or more chromosomes of interest, and thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome dose for each of the any one or more chromosomes of interest comprises code for calculating the chromosome dose for a selected chromosome of interest as the ratio of the number of sequence tags identified for the selected chromosome of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome of interest.
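The ratio described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `chromosome_dose`, the dictionary layout of the tag counts, and the numbers are all hypothetical, and the sketch assumes that a group of normalizing chromosomes contributes the sum of its tag counts to the denominator.

```python
from typing import Iterable, Mapping


def chromosome_dose(tag_counts: Mapping[str, int],
                    chromosome: str,
                    normalizing: Iterable[str]) -> float:
    """Tags on the chromosome of interest divided by tags on the normalizing sequence(s)."""
    # Assumption: a group of normalizing chromosomes is pooled by summing tag counts.
    denominator = sum(tag_counts[name] for name in normalizing)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[chromosome] / denominator


# Hypothetical tag counts for one sample (illustrative numbers only).
counts = {"chr21": 100, "chr9": 300, "chr11": 200}
dose_chr21 = chromosome_dose(counts, "chr21", ["chr9", "chr11"])
```

The resulting dose would then be compared with the corresponding threshold value, as in step (d).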
In some embodiments, the system further comprises code for repeating the calculation of a chromosome dose for each of any remaining chromosome segments of the any one or more segments of the any one or more chromosomes of interest.
In some embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the instructions comprise instructions for determining the presence or absence of at least twenty different complete fetal chromosomal aneuploidies.
In some embodiments, the at least one normalizing chromosome sequence is a group of chromosomes selected from chromosomes 1-22, X, and Y. In other embodiments, the at least one normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y.
In another embodiment, the invention provides a system for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for said fetal and maternal nucleic acids in said sample;
(b) code for using said sequence information to computationally identify a number of sequence tags from the fetal and maternal nucleic acids for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing segment sequence for each of said any one or more segments of the any one or more chromosomes of interest;
(c) code for using said number of sequence tags identified for each of said any one or more segments of the any one or more chromosomes of interest and said number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome segment dose for each of said any one or more segments of the any one or more chromosomes of interest; and
(d) code for comparing each of said single chromosome segment doses for each of said any one or more segments of the any one or more chromosomes of interest with a corresponding threshold value for each of said any one or more chromosome segments of the any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample.
In some embodiments, the code for calculating a single chromosome segment dose comprises code for calculating the chromosome segment dose for a selected chromosome segment as the ratio of the number of sequence tags identified for the selected chromosome segment to the number of sequence tags identified for the corresponding normalizing segment sequence for the selected chromosome segment.
In some embodiments, the system further comprises code for repeating the calculation of a chromosome segment dose for each of any remaining chromosome segments of the any one or more segments of the any one or more chromosomes of interest.
In some embodiments, the system further comprises (i) code for repeating (a)-(d) for test samples from different maternal subjects, and (ii) code for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in each of said samples.
In other embodiments of any of the systems provided herein, the code further comprises code for automatically recording the presence or absence of a fetal chromosomal aneuploidy, as determined in (d), in a patient medical record for the human subject who provided the maternal test sample, wherein the recording is performed using the processor.
In some embodiments of any of the systems provided herein, the sequencer is configured to perform next-generation sequencing (NGS). In some embodiments, the sequencer is configured to perform massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencer is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencer is configured to perform single-molecule sequencing.
Apparatus for determining fetal fraction
An apparatus for medical analysis of a sample can be used to analyze the sequence tags obtained from sequencing the sample (for example, a maternal sample) and to provide information about the fraction that one or both genomes contribute to a mixture of nucleic acids. For instance, various apparatuses are provided that analyze the sequence tags obtained from sequencing a maternal sample to determine the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids present in the maternal sample. The medical apparatus provided comprises a series of devices for carrying out the steps of the methods for determining fetal fraction described elsewhere in this application.
Figure 65 shows one embodiment of a medical analysis apparatus for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids. The apparatus comprises:
A device (a) for receiving a plurality of sequence reads of said fetal and maternal nucleic acids from said maternal test sample;
A device (b) for aligning said plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
A device (c) for identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X, and Y and segments thereof, and, for each of said one or more chromosomes of interest or chromosome segments of interest, identifying a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose, wherein said chromosome of interest or chromosome segment of interest harbors a copy number variation; and
A device (d) for determining said fetal fraction using the dose of said chromosome of interest or the dose of said chromosome segment of interest.
Preferably, the signal output of device (a) is connected to device (b), the signal output of device (b) is connected to device (c), and the signal output of device (c) is connected to device (d).
In certain embodiments, said copy number variation is determined by comparing said chromosome dose or chromosome segment dose for each of said one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold value for each of said one or more chromosomes of interest or chromosome segments of interest.
The copy number variations that a fetus may harbor include complete chromosome duplications, complete chromosome deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome dose or segment dose determined by device (c) is calculated as the ratio of the number of sequence tags identified for said selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In certain embodiments, said chromosome dose or segment dose determined by device (c) is calculated as the ratio of the sequence tag density ratio of said selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each said selected chromosome or segment of interest.
In certain embodiments, the apparatus further comprises a device (e) for calculating a normalized chromosome value (NCV) or a normalized segment value (NSV), wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome dose calculated for the test sample, wherein said i-th chromosome is said chromosome of interest; and wherein calculating the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples, as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome segment dose calculated for the test sample, wherein said i-th chromosome segment is said chromosome segment of interest. Preferably, the signal output of device (c) is connected to device (e).
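The normalized value defined above is a standard score of the test dose against the qualified samples. A minimal sketch follows; the function name `normalized_value` and the example doses are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text only says "estimated" standard deviation.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """NCV (or NSV): (test dose - mean of qualified doses) / standard deviation of qualified doses."""
    mu = mean(qualified_doses)
    # Assumption: the sample standard deviation is used as the estimate.
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma


# Hypothetical doses for the same chromosome in three qualified samples.
qualified = [0.9, 1.0, 1.1]
ncv = normalized_value(1.3, qualified)
```

The same function serves for an NSV when the inputs are chromosome segment doses.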
In certain embodiments, the device (d) of the apparatus then determines the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value at the i-th chromosome in an affected sample (for example, the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation for doses of the chromosome of interest determined in the qualified samples; or determines the fetal fraction according to the following formula:
$$ff = 2 \times \left| NSV_{iA} \times CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value at the i-th chromosome segment in an affected sample (for example, the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation for doses of the i-th chromosome determined in the qualified samples, wherein said i-th chromosome is said chromosome of interest. Preferably, the signal output of device (e) is connected to device (d).
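The fetal fraction formula can be sketched as below. The sketch assumes the usual definition of the coefficient of variation (standard deviation divided by mean); under that assumption the product of the normalized value and the CV reduces to the relative deviation of the test dose from the qualified mean, so the result is twice that relative deviation. The function name `fetal_fraction` and the numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def fetal_fraction(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """ff = 2 * |NCV * CV|, with NCV and CV both estimated from the qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation
    if mu == 0 or sigma == 0:
        raise ValueError("qualified doses must have a nonzero mean and spread")
    ncv = (test_dose - mu) / sigma  # normalized chromosome (or segment) value
    cv = sigma / mu                 # coefficient of variation (assumed sigma / mean)
    return 2.0 * abs(ncv * cv)


# Hypothetical example: the test dose is 5% above the qualified mean.
ff = fetal_fraction(1.05, [0.9, 1.0, 1.1])
```

Because the absolute value is taken, a dose 5% below the qualified mean yields the same fetal fraction estimate as a dose 5% above it.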
In certain embodiments, the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or chromosome segment using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. In certain embodiments, the normalizing chromosome sequence is any one or more single chromosomes of chromosomes 1 to 22, X, and Y; alternatively, the normalizing sequence is a group of any chromosomes of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence is any one or more single segments of chromosomes 1 to 22, X, and Y; alternatively, the normalizing segment sequence is a group of any one or more segments of chromosomes 1 to 22, X, and Y.
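Steps (i)-(iii) can be sketched as a search over candidate normalizing chromosomes. This is only an illustration under stated assumptions: "smallest variability" is measured here by the coefficient of variation of the doses across qualified samples, only single-chromosome candidates are tried (not groups), and the function name `select_normalizing_chromosome` and the tag counts are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, Mapping, Sequence, Tuple


def select_normalizing_chromosome(
    qualified_counts: Sequence[Mapping[str, int]],
    chromosome: str,
    candidates: Iterable[str],
) -> Tuple[str, float]:
    """Return the candidate whose doses vary least across the qualified samples."""
    best_name, best_cv = "", float("inf")
    for candidate in candidates:
        # Step (ii): recompute the dose in every qualified sample with this candidate.
        doses = [sample[chromosome] / sample[candidate] for sample in qualified_counts]
        # Assumption: variability is summarized by the coefficient of variation.
        cv = stdev(doses) / mean(doses)
        # Step (iii): keep the candidate giving the smallest variability.
        if cv < best_cv:
            best_name, best_cv = candidate, cv
    return best_name, best_cv


# Hypothetical tag counts from three qualified samples.
samples = [
    {"chr21": 100, "chr9": 200, "chr1": 400},
    {"chr21": 110, "chr9": 220, "chr1": 500},
    {"chr21": 90, "chr9": 180, "chr1": 300},
]
best, best_cv = select_normalizing_chromosome(samples, "chr21", ["chr9", "chr1"])
```

Selecting for greatest differentiability instead would require affected samples as well, which this sketch does not model.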
In certain embodiments, the apparatus for determining fetal fraction further comprises a device for comparing said fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample and that are present on a chromosome other than said chromosome of interest.
In certain embodiments, the apparatus further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids of a maternal test sample and obtain the sequence reads. Preferably, the signal output of the sequencing device (10) is connected to device (a).
In certain embodiments, the sequencing device (10) is configured to perform sequencing-by-synthesis. Sequencing-by-synthesis may be performed using reversible dye terminators. In other embodiments, the sequencing device (10) is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencing device (10) is configured to perform single-molecule sequencing.
In certain embodiments, the sequencing device (10) is located at a site separate from devices (a)-(d), and the signal output of the sequencing device (10) is connected to device (a) through a network.
In certain embodiments, the apparatus comprising the sequencing device as described further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. The device (11) for obtaining the maternal test sample may be located at a site separate from devices (a)-(d) and (10). In addition to devices (a)-(d) and (10), the apparatus may further comprise a device (12) for extracting cell-free DNA from the maternal test sample. In certain embodiments, the device (12) for extracting cell-free DNA is located at the same site as the sequencing device (10), and the device (11) for obtaining the maternal test sample is located at a remote site.
In certain embodiments, the apparatus for determining fetal fraction also comprises a storage device for at least temporarily storing the sequence reads received by device (a). Preferably, the signal output of device (a) is connected to the storage device, and the signal output of the storage device is connected to device (b).
Additional apparatus for determining fetal fraction - classifying copy number variation
An additional medical analysis apparatus is also provided for classifying a copy number variation in the fetal genome in a maternal sample comprising fetal and maternal nucleic acids (for example, cell-free DNA). This additional apparatus comprises devices for determining fetal fraction and a device for comparing fetal fraction values determined by different methods. The additional apparatus uses the two calculated fetal fractions to classify the copy number variation in the fetal genome. The maternal sample analyzed by this apparatus may be selected from blood, plasma, serum, or urine samples. In certain embodiments, the maternal sample is a plasma sample. Figure 66 shows one embodiment of such a medical analysis apparatus.
In one embodiment, a medical analysis apparatus for classifying a copy number variation in a fetal genome is provided, the apparatus comprising:
Device (1) for receiving sequence reads of fetal and maternal nucleic acids from a test sample;
Device (2) for aligning said sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
Device (3) for identifying a number of those sequence tags that are from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation;
Device (4) for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
Device (5) for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and
Device (6) for comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome.
Preferably, the signal output of device (1) is connected to device (2); the signal output of device (2) is connected to device (3); the signal outputs of devices (2) and (3) are connected to device (4); the signal outputs of devices (2) and (3) are connected to device (5); and the signal outputs of devices (4) and (5) are connected to device (6). The first chromosome of interest may be selected from any of chromosomes 1 to 22, X, and Y.
In certain embodiments, the additional apparatus also comprises a storage device for at least temporarily storing the sequence reads received by device (1). Preferably, the signal output of device (1) is connected to the storage device, and the signal output of the storage device is connected to device (2).
In certain embodiments, the device (4) for the first method of calculating the first fetal fraction comprises a component that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest; and the device (5) for the second method of calculating the second fetal fraction value comprises:
(a) a component (5-1) for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component (5-2) for calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, the signal outputs of devices (2) and (3) are connected to component (5-1), the signal output of component (5-1) is connected to component (5-2), and the signal output of component (5-2) is connected to device (6).
In certain embodiments, the information used by the device (4) for the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises said one or more polymorphic sites. The information used by the device (4) for the first method may also be obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the device (4) for the first method comprises a component that calculates the first fetal fraction value using tags from a chromosome or chromosome segment having a copy number variation. For instance, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known not to occur in an aneuploid state, or that has been determined in the test sample not to be aneuploid (for example, by calculating its NCV or NSV), may be used by device (4) to determine the fetal fraction by any of the methods described herein.
In certain embodiments, the device (5) for the second method of calculating the fetal fraction value further comprises a component (5-3) for calculating a normalized chromosome value (NCV), wherein the component (5-3) relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome dose calculated for the test sample, wherein said i-th chromosome is said chromosome of interest.
Preferably, the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2).
In certain embodiments, the component (5-2) for calculating the fetal fraction value from the chromosome dose by the second method uses the normalized chromosome value. The component (5-2) of the device (5) for the second method evaluates the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value at the i-th chromosome in an affected sample (for example, the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation for doses of the i-th chromosome determined in said qualified samples, wherein said i-th chromosome is said chromosome of interest.
In certain embodiments, the device (4) for the first method of calculating the first fetal fraction comprises: (a) a component (4-1) for calculating the number of sequence tags from a chromosome other than said first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for that other chromosome; and (b) a component (4-2) for calculating the first fetal fraction value from that chromosome dose by the first method; and the device (5) for the second method of calculating the second fetal fraction comprises: (a) a component (5-1) for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) a component (5-2) for calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, the device (4) for the first method further comprises a component (4-3), and the device (5) for the second method further comprises a component (5-3); components (4-3) and (5-3) each calculate a normalized chromosome value (NCV) by relating the chromosome dose determined by component (4-1) or component (5-1), respectively, to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose of the i-th chromosome calculated for the test sample,
wherein, for the device (4) for the first method, said i-th chromosome is said chromosome other than the first chromosome of interest; and for the device (5) for the second method, said i-th chromosome is said first chromosome of interest.
Preferably, the signal output of component (4-1) is connected to component (4-3), and the signal output of component (4-3) is connected to component (4-2), wherein component (4-2) calculates the first fetal fraction value from the corresponding chromosome dose by said first method using the corresponding normalized chromosome value; the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2), wherein component (5-2) calculates the second fetal fraction value from the corresponding chromosome dose by said second method using the corresponding normalized chromosome value.
In certain embodiments, the component (4-2) of the device (4) for the first method and the component (5-2) of the device (5) for the second method evaluate the following formula:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value at the i-th chromosome in an affected sample (for example, the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation for doses of the i-th chromosome in said qualified samples;
wherein, for the device (4) for the first method, said i-th chromosome is said chromosome other than the first chromosome of interest; and for the device (5) for the second method, said i-th chromosome is said first chromosome of interest. Preferably, when said fetus is male, said chromosome other than the first chromosome of interest is the X chromosome.
In certain embodiments, the device (6) for comparing said first fetal fraction value with said second fetal fraction value determines whether the two fetal fraction values are approximately equal. In certain embodiments, device (6) further comprises a component that determines, when said two fetal fraction values are approximately equal, that a ploidy assumption implicit in said second method is true. The ploidy assumption implicit in said second method may be that said first chromosome of interest has a complete chromosomal aneuploidy; for example, the complete chromosomal aneuploidy of said first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, said additional apparatus further comprises a device (7) for analyzing the tag information for said first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the device (7) is configured to operate when the device (6) for comparing the first and second fetal fraction values indicates that the two fetal fraction values are not approximately equal. Preferably, the signal outputs of devices (2), (3), and (6) are connected to device (7).
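The comparison and routing performed by devices (6) and (7) might be sketched as follows. The text does not say how "approximately equal" is judged, so the relative tolerance `rel_tol` is purely an assumption for illustration, as are the function name and the returned labels.

```python
import math


def compare_fetal_fractions(ff_first: float, ff_second: float, rel_tol: float = 0.2) -> str:
    """Decide whether the second method's ploidy assumption holds or further analysis is needed."""
    # Assumption: "approximately equal" means within a relative tolerance of the larger value.
    if math.isclose(ff_first, ff_second, rel_tol=rel_tol):
        return "complete chromosomal aneuploidy"
    # Otherwise the tag information is passed on for partial aneuploidy / mosaic analysis.
    return "analyze for partial aneuploidy or mosaic"


# Hypothetical fetal fraction estimates from the two methods.
agreeing = compare_fetal_fractions(0.10, 0.105)
disagreeing = compare_fetal_fractions(0.10, 0.04)
```

In practice the tolerance would need to reflect the measurement error of both fetal fraction estimates.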
In certain embodiments, in said additional apparatus, the device (4) for the first method comprises a component that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest; and the device (5) for the second method comprises a component that calculates the second fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on said first chromosome of interest. The information used by the device (4) for the first method may comprise sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises said one or more polymorphic sites. The information used by the device (4) for the first method may also be obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the device (6) for comparing comprises: a component that determines that said first chromosome of interest is diploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1; a component that determines that said first chromosome of interest is triploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1.5; and a component that determines that said first chromosome of interest is haploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 0.5.
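The ratio test can be illustrated with the short sketch below. The absolute tolerance `tol` around each target ratio is an assumption (the text only says "approximately"), and the function name `classify_by_ratio` and the return of `None` for unmatched ratios are hypothetical choices.

```python
from typing import Optional


def classify_by_ratio(ff_first: float, ff_second: float, tol: float = 0.15) -> Optional[str]:
    """Map the ratio of second to first fetal fraction onto a ploidy call, or None if no match."""
    if ff_first <= 0:
        raise ValueError("first fetal fraction must be positive")
    ratio = ff_second / ff_first
    # Target ratios are taken from the text; the tolerance is an illustrative assumption.
    for target, label in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - target) <= tol:
            return label
    return None  # no match: candidates are partial aneuploidy or a mosaic


# Hypothetical fetal fraction pairs.
calls = [classify_by_ratio(0.10, 0.10), classify_by_ratio(0.10, 0.15),
         classify_by_ratio(0.10, 0.05), classify_by_ratio(0.10, 0.08)]
```

A ratio that matches none of the three targets corresponds to the case in which the tag information for the first chromosome of interest is analyzed further.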
Preferably, the additional apparatus for classifying a copy number variation further comprises a device (7') for analyzing the tag information for said first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the device (7') is configured to operate when the device (6) for comparing the first and second fetal fraction values indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5. Preferably, the signal outputs of devices (2), (3), and (6) are connected to device (7').
In certain embodiments, the device (7) or (7') for analyzing the tag information for the first chromosome of interest comprises: (a) a component (7-1) for binning the sequence of the first chromosome of interest into a plurality of portions; (b) a component (7-2) for determining whether any of said portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) a component (7-3) for determining that the first chromosome of interest harbors a partial aneuploidy if any of said portions contains significantly more or significantly less nucleic acid than one or more other portions, or for determining that the fetus is a mosaic if none of said portions contains significantly more or significantly less nucleic acid than one or more other portions. Preferably, the signal outputs of devices (2), (3), and (6) are connected to component (7-1), the signal output of component (7-1) is connected to component (7-2), and the signal output of component (7-2) is connected to component (7-3). In certain embodiments, component (7-3) further determines that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
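Components (7-1) to (7-3) can be sketched as binning followed by an outlier check. The text does not specify how "significantly more or less" is tested, so the relative deviation from the median bin count used here is a stand-in assumption, not the described method; equal-width bins, 0-based tag positions, and all names and numbers are likewise hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def bin_tag_counts(tag_positions: Sequence[int], chromosome_length: int, n_bins: int) -> List[int]:
    """Component (7-1): split the chromosome into equal-width bins and count tags per bin."""
    bin_size = -(-chromosome_length // n_bins)  # ceiling division
    counts = [0] * n_bins
    for position in tag_positions:  # positions are assumed 0-based
        counts[min(position // bin_size, n_bins - 1)] += 1
    return counts


def partial_or_mosaic(counts: Sequence[int], rel_threshold: float = 0.25) -> Tuple[str, List[int]]:
    """Components (7-2)/(7-3): flag bins far from the median, then make the call."""
    typical = median(counts)
    # Assumption: "significant" is approximated by a relative deviation threshold.
    outliers = [i for i, c in enumerate(counts) if abs(c - typical) > rel_threshold * typical]
    if outliers:
        return "partial aneuploidy", outliers  # outlier bins locate the affected portion
    return "mosaic", []


# Hypothetical per-bin tag counts: bin 3 is over-represented.
call, affected_bins = partial_or_mosaic([100, 102, 98, 150, 101])
```

The indices of the flagged bins indicate which portion of the chromosome carries the partial aneuploidy.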
In certain embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X, and Y.
In certain embodiments, device (6) comprises a component for classifying the copy number variation into a classification selected from the group consisting of: complete chromosomal insertion or duplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic.
In certain embodiments, the additional medical analysis apparatus further comprises:
(i) a device (8) for determining whether the copy number variation results from a partial aneuploidy or from a mosaic; and
(ii) a device (9) for determining, if the copy number variation results from a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest,
wherein devices (8) and (9) are configured to execute when the device (6) for comparing the first fetal fraction value to the second fetal fraction value determines that the first fetal fraction value is not approximately equal to the second fetal fraction value. Preferably, the signal output of device (6) is connected to device (8), and the signal output of device (8) is connected to device (9). In certain embodiments, the device (9) for determining the locus of the partial aneuploidy on the first chromosome of interest comprises a component for categorizing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and a component for counting the mapped tags in each bin.
In certain embodiments, the additional apparatus further comprises a sequencer (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample (e.g., a blood, plasma, serum, or urine sample) and to obtain the sequence reads. Preferably, the fetal and maternal nucleic acids are cell-free DNA (cfDNA). Preferably, the signal output of the sequencer (10) is connected to device (1).
In certain embodiments, the sequencer (10) is configured to perform sequencing by synthesis. Sequencing by synthesis can be performed using reversible dye terminators. Alternatively, the sequencer (10) is configured to perform sequencing by ligation. Alternatively, the sequencer (10) is configured to perform single molecule sequencing. In certain embodiments, the sequencer (10) and devices (1)-(6) of the classification apparatus are located in separate facilities. Preferably, the signal output of the sequencer (10) is connected to device (1) via a network.
In certain embodiments, the apparatus for classification further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) and devices (1)-(6) can be located in separate facilities. In addition, the apparatus can further comprise a device (12) for extracting cell-free DNA from the maternal test sample. The device (12) for extracting cell-free DNA can be located in the same facility as the sequencer (10), with the device (11) for obtaining the maternal test sample located in a remote facility.
In certain embodiments, device (2) aligns at least about 1 million reads.
Kits
In various embodiments, kits are provided for practicing the methods described herein. In certain embodiments, the kits comprise one or more positive internal controls for a full aneuploidy and/or for a partial aneuploidy. Typically, although not necessarily, the controls comprise internal positive controls comprising nucleic acid sequences of the type that is to be screened for. For example, a control for a test to determine the presence or absence of a fetal trisomy (e.g., trisomy 21) in a maternal sample can comprise DNA characterized by trisomy 21 (e.g., DNA obtained from an individual with trisomy 21). In some embodiments, the control comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the control can comprise a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
In certain embodiments, the positive control(s) comprise one or more nucleic acids comprising a trisomy 21 (T21), and/or a trisomy 18 (T18), and/or a trisomy 13 (T13). In certain embodiments, the nucleic acids comprising each of the trisomies present are provided in separate containers. In certain embodiments, the nucleic acids comprising two or more trisomies are provided in a single container. Thus, for example, in certain embodiments, a container can contain T21 and T18, T21 and T13, or T18 and T13. In certain embodiments, a container can contain T18, T21, and T13. In these various embodiments, the trisomies can be provided in equal quantity/concentration. In other embodiments, the trisomies can be provided in particular predetermined ratios. In various embodiments, the controls can be provided as "stock" solutions of known concentration.
In certain embodiments, the control for detecting an aneuploidy comprises a mixture of cellular genomic DNA obtained from two subjects, one being the contributor of the aneuploid genome. For example, as explained above, an internal positive control (IPC) created as a control for a test to determine a fetal trisomy (e.g., trisomy 21) can comprise a combination of genomic DNA from a male or female subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. In certain embodiments, the genomic DNA is sheared to provide fragments of between about 100-400 bp, between about 150-350 bp, or between about 200-300 bp to simulate the circulating cfDNA fragments in maternal samples.
In certain embodiments, the proportion of fragmented DNA from the subject carrying the aneuploidy (e.g., trisomy 21) in the control is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, so as to provide an IPC comprising a mixture of fragmented DNA that comprises about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of DNA from the subject carrying the aneuploidy. In certain embodiments, the control comprises DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% of unaffected female DNA, and the remaining 20% can be DNA from three different subjects each carrying a trisomic chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18, respectively.
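As a worked illustration of the 80%/20% example, the following Python sketch computes the mass of each component for a chosen total amount of IPC DNA. The function name `ipc_mixture`, the total of 100 ng, and the equal split of the aneuploid portion among the three donors are assumptions made for illustration; the specification states only that the remaining 20% can come from three different subjects.

```python
def ipc_mixture(total_ng, unaffected_fraction, aneuploid_donors):
    """Return a dict of component masses (ng) for an internal positive control.

    Assumption: the aneuploid portion is split equally among the donors.
    """
    if not 0.0 <= unaffected_fraction <= 1.0:
        raise ValueError("unaffected_fraction must lie between 0 and 1")
    if not aneuploid_donors:
        raise ValueError("at least one aneuploid donor is required")
    aneuploid_total = total_ng * (1.0 - unaffected_fraction)
    share = aneuploid_total / len(aneuploid_donors)
    mixture = {"unaffected": total_ng * unaffected_fraction}
    for donor in aneuploid_donors:
        mixture[donor] = share
    return mixture


if __name__ == "__main__":
    mix = ipc_mixture(100.0, 0.80, ["T21", "T13", "T18"])
    for name, mass in mix.items():
        print(f"{name}: {mass:.2f} ng")
```

Under these assumptions, 100 ng of IPC contains about 80 ng of unaffected DNA and about 6.67 ng from each of the three trisomy donors.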
In certain embodiments, the control(s) comprise cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, the controls can comprise cfDNA obtained from a pregnant woman carrying a fetus with trisomy 21 and/or trisomy 18 and/or trisomy 13. The cfDNA can be extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR.
While the controls present in the kits are described above with respect to trisomies, they need not be so limited. It will be appreciated that the positive controls present in the kit can be created to reflect other partial aneuploidies including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications or deletions of substantially complete chromosome arms, the positive control(s) can comprise a p arm or a q arm of any one or more of chromosomes 1-22, X, and Y. In certain embodiments, the control comprises an amplification of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and/or 22q (see, e.g., Table 2).
In certain embodiments, the controls comprise aneuploidies for any region known to be associated with a particular amplification or deletion (e.g., breast cancer associated with an amplification at 20Q13). Illustrative regions include, but are not limited to, 17q23 (associated with breast cancer), 19q12 (associated with ovarian cancer), 1q21-1q23 (associated with sarcomas and various solid tumors), 8p11-p12 (associated with breast cancer), the ErbB2 amplicon, and so forth. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region as shown in any one of Tables 3-6. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region comprising a gene as shown in any one of Tables 3-6. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more oncogenes. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more genes selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The foregoing controls are intended to be illustrative and not limiting. Using the teachings provided herein, one of ordinary skill in the art will recognize numerous other controls suitable for incorporation into a kit.
In various embodiments, in addition to the controls or instead of the controls, the kits comprise one or more nucleic acids and/or nucleic acid mimics that provide marker sequence(s) suitable for tracking and determining sample integrity. In certain embodiments, the markers comprise an antigenomic sequence. In certain embodiments, the marker sequences range in length from about 30 bp up to about 600 bp, or from about 100 bp to about 400 bp. In certain embodiments, the marker sequence(s) are at least 30 bp (or nt) in length. In certain embodiments, the marker is ligated to an adaptor, and the length of the adaptor-ligated marker molecule is between about 200 bp (or nt) and about 600 bp (or nt), between about 250 bp (or nt) and 550 bp (or nt), between about 300 bp (or nt) and 500 bp (or nt), or between about 350 and 450. In certain embodiments, the length of the adaptor-ligated marker molecule is about 200 bp (or nt). In certain embodiments, the length of a marker molecule can be about 150 bp (or nt), about 160 bp (or nt), 170 bp (or nt), about 180 bp (or nt), about 190 bp (or nt), or about 200 bp (or nt). In certain embodiments, the length of the marker ranges up to about 600 bp (or nt).
In certain embodiments, the kit provides at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least nine, or at least ten, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 different sequences. The different nucleic acids and/or nucleic acid mimics providing the marker sequence(s) can be stored in separate containers/vials. Alternatively, different marker molecules can be stored in the same container/vial.
In various embodiments, the markers comprise one or more DNAs, or the markers comprise one or more DNA mimics. Suitable mimics include, but are not limited to, morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA. In various embodiments, the markers are incorporated into the controls. In certain embodiments, the markers are incorporated into adaptors and/or are provided ligated to adaptors.
In certain embodiments, the kit further comprises one or more sequencing adaptors. Such adaptors include, but are not limited to, indexed sequencing adaptors. In certain embodiments, the adaptors comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
In certain embodiments, the kit further comprises a sample collection device for collecting a biological sample. In certain embodiments, the sample collection device comprises a device for collecting blood and, optionally, a receptacle for containing blood. In certain embodiments, the kit comprises a receptacle for containing blood, and the receptacle comprises an anticoagulant and/or a cell fixative and/or one or more antigenomic marker sequences.
In certain embodiments, the kit further comprises DNA extraction reagents (e.g., a separation matrix and/or an elution solution). The kit can also comprise reagents for sequencing library preparation. Such reagents include, but are not limited to, a solution for end-repairing DNA, and/or a solution for dA-tailing DNA, and/or a solution for adaptor-ligating DNA. In certain embodiments, the kit further comprises a composition comprising one or more primer sets for amplifying at least one preselected polymorphic nucleic acid in a maternal sample, wherein each preselected polymorphic nucleic acid comprises at least one polymorphic site, and wherein the forward or the reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to said polymorphic site for the site to be included in a sequence read generated by massively parallel sequencing of the amplified preselected polymorphic nucleic acid. Sequencing of the amplified preselected polymorphic sequences can be used to determine the fetal fraction in the maternal sample, as described elsewhere in this application. The preselected polymorphic nucleic acids can comprise a SNP or an STR. In certain embodiments, at least one primer in each of said primer sets is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp. In certain embodiments, hybridization of the primer sets to said DNA sequences generates amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome, or to DNA sequences present on different chromosomes. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X, or Y.
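The requirement that a primer hybridize close enough to the polymorphic site for the site to fall within the sequence read can be sketched as a simple coordinate check. The function name `site_in_read`, the 0-based coordinates, and the assumption that the read begins at the first base of the primer are illustrative choices and are not taken from the specification.

```python
def site_in_read(primer_start, site_pos, read_length):
    """Return True if a forward-strand read of read_length bases that starts
    at primer_start (0-based) covers the polymorphic site at site_pos.

    Assumption: the read begins at the first base of the primer.
    """
    offset = site_pos - primer_start
    return 0 <= offset < read_length


if __name__ == "__main__":
    # Hypothetical primer starting at position 1000 with a SNP 30 bases downstream.
    for read_length in (25, 40, 50, 100):
        print(read_length, site_in_read(1000, 1030, read_length))
```

With the hypothetical coordinates above, the site is missed by a read of about 25 bp and captured by reads of about 40 bp, 50 bp, or 100 bp.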
Embodiments of kits for practicing the methods and for use with the devices described herein are illustrated in Figures 67 and 68. In one embodiment, the kit is provided for determining fetal fraction. As shown in Figure 67, the kit comprises a kit body (1), clamping slots arranged in the kit body for holding vials, a vial (2) comprising an internal positive control, a vial (3) comprising a marker nucleic acid suitable for tracking and determining sample integrity, and a vial (4) comprising a buffer solution.
The kit can comprise a plurality of additional vials, each of which comprises a different internal positive control or a different marker nucleic acid.
In certain embodiments, vial (2) comprises two or more internal positive controls. The internal positive control comprises a trisomy selected from the group consisting of: trisomy 21, trisomy 18, trisomy 21, trisomy 13, trisomy 16, trisomy 13, trisomy 9, trisomy 8, trisomy 22, XXX, XXY and XYY. In certain embodiments, the internal positive control comprises a trisomy selected from the group consisting of: trisomy 21 (T21), trisomy 18 (T18) and trisomy 13 (T13). In other embodiments, the internal positive control contained in vial (2) comprises trisomy 21 (T21), trisomy 18 (T18) and trisomy 13 (T13). Alternatively, the positive control included in the kit can comprise an amplification or a deletion of one or more portions of chromosomes 1 to 22, X, and Y. In certain embodiments, the positive control comprises an amplification or a deletion of a p arm or a q arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, vial (2) comprises an amplification or a deletion of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and 22q. In other embodiments, vial (2) comprises an amplification of a region selected from the group consisting of: 20Q13, 19q12, 1q21-1q23, 8p11-p12 and ErbB2. Alternatively, the positive control contained in vial (2) comprises an amplification of a region or of a gene shown in Table 3, Table 4, Table 5 and Table 6. In certain embodiments, the positive control contained in vial (2) comprises an amplification of a region or of a gene selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The marker nucleic acids (also referred to as marker molecules (MM)) included in various embodiments of the kit are antigenomic marker sequences. The marker sequences can range in length from about 30 bp to about 600 bp. In other embodiments, the marker sequences range in length from about 100 bp to about 400 bp. In certain embodiments, the kit comprises at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 vials for different marker sequences.
In certain embodiments, the markers included in the kit comprise one or more DNAs. In other embodiments, the markers comprise one or more mimics selected from the group consisting of: morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA.
In certain embodiments, the markers are incorporated into said controls. In other embodiments, the markers are incorporated into adaptors. In certain embodiments, vial (3) of the kit can further contain one or more sequencing adaptors. The adaptors comprise indexed sequencing adaptors. The adaptors can further comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
Figure 68 shows a schematic of a kit that can further include a sample collection device for collecting a biological sample. The sample collection device comprises a device (5) for collecting blood and a receptacle (6) for containing blood. In certain embodiments, the device for collecting blood and said receptacle for containing blood comprise an anticoagulant and a cell fixative.
In certain embodiments, the kit can further include a vial (7) containing DNA extraction reagents. The DNA extraction reagent(s) can comprise a separation matrix and/or an elution solution.
In certain embodiments, the kit further comprises a vial (8) containing reagents for preparing a sequencing library. The reagents for preparing a sequencing library can comprise a solution for end-repairing DNA, a solution for dA-tailing DNA, and a solution for adaptor-ligating DNA.
In other embodiments, the kit further comprises a vial (9) comprising a composition of primers for amplifying predetermined target nucleic acids.
In certain embodiments, the kit further comprises instructional materials teaching the use of said reagents to determine the fetal fraction in a biological sample. The instructional materials teach the use of the materials to detect a trisomy or a monosomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
In addition, the kits optionally include labeling and/or instructional materials providing directions (e.g., protocols) for the use of the reagents and/or devices provided in the kit. For example, the instructional materials can teach the use of the reagents to prepare samples and/or to determine copy number variation in a biological sample. In certain embodiments, the instructional materials teach the use of the materials to detect a trisomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
While the instructional materials in the various kits typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated herein. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media can include addresses of internet sites that provide such instructional materials.
The various methods, apparatus, systems, and uses are described in further detail in the following Examples, which are not in any way intended to limit the scope of the invention as claimed. The attached figures are meant to be considered integral parts of the specification and description of the invention. The following examples are offered to illustrate, but not to limit, the claimed invention.
Experimental
Example 1
Sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women who were in their first or second trimester of pregnancy and who were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to confirm the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6-9 mL/tube) was transferred into one 15-mL low speed centrifuge tube. Blood was centrifuged at 2640 rpm, 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge and rotor model GA 3.8.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15-mL high speed centrifuge tube and centrifuged at 16000 × g, 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and JA-14 rotor. The two centrifugation steps were performed within 72 hours after blood collection. Cell-free plasma comprising cfDNA was stored at -80 °C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini kit (Qiagen) essentially according to the manufacturer's instructions. One milliliter of buffer AL and 100 μl of protease solution were added to 1 ml of plasma. The mixture was incubated for 15 minutes at 56 °C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns that were assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of buffer AW1, followed by a second wash with 750 μl of buffer AW24. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 2
Preparation and sequencing of primary and enriched sequencing libraries
a. Preparation of sequencing libraries: abbreviated protocol (ABB)
All sequencing libraries, i.e., primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA that was extracted from maternal plasma. Library preparation was performed as follows using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Mass.). Because cell-free plasma DNA is fragmented in nature, no further fragmentation by nebulization or sonication was performed on the plasma DNA samples. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating, in a 1.5 ml microfuge tube, the cfDNA with 5 μl of 10× phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 for 15 minutes at 20 °C. The enzymes were then heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. Following the inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, Calif.) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, Mass.). Eighteen cycles of PCR were performed to selectively enrich the adaptor-ligated cfDNA (25 μl) using High-Fidelity Master Mix (25 μl; Finnzymes, Woburn, Mass.) and Illumina's PCR primers (0.5 μM each) complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes, and hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Mass.) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, Calif.).
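As a worked check on the thermal cycling program in the abbreviated protocol, the following Python sketch adds up the programmed hold times. It assumes the program reads as an initial denaturation of 30 seconds, 18 cycles of three steps (10, 30, and 30 seconds), and a final extension of 5 minutes; ramp times between temperatures and the final 4 °C hold are not counted, and the function name is hypothetical.

```python
def pcr_hold_time_seconds(initial_s, cycle_steps_s, n_cycles, final_extension_s):
    """Sum the programmed hold times of a PCR program, ignoring ramp times."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_extension_s


if __name__ == "__main__":
    total = pcr_hold_time_seconds(
        initial_s=30,                # 98 C initial denaturation
        cycle_steps_s=(10, 30, 30),  # 98 C, 65 C, 72 C within each cycle
        n_cycles=18,
        final_extension_s=5 * 60,    # 72 C final extension
    )
    print(total, "seconds =", total / 60, "minutes")  # 1590 seconds = 26.5 minutes
```

Under this reading, the enrichment PCR holds for 1590 seconds in total, or 26.5 minutes before ramping is taken into account.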
b. Preparation of sequencing libraries: full-length protocol
The full-length protocol described here is essentially the standard protocol provided by Illumina, and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be purified using gel electrophoresis, whereas the protocol described herein uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Mass.), essentially according to the manufacturer's instructions. All steps except the final purification of the adaptor-ligated products, which was performed using Agencourt magnetic beads and reagents instead of the purification column, were performed according to the protocol accompanying the NEBNext™ Reagents for Sample Preparation for a genomic DNA library that is sequenced using the GAII. The NEBNext™ protocol essentially follows the protocol provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of the approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext® End Repair Module by incubating the 40 μl of cfDNA with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, in a 200 μl microfuge tube in a thermal cycler for 30 minutes at 20 °C. The sample was cooled to 4 °C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (Qiagen, Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The resulting 300 μl were transferred to a QIAquick column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3′ to 5′ exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), and incubating for 30 minutes at 37 °C according to the manufacturer's NEBNext® dA-Tailing Module. The sample was cooled to 4 °C and purified using a column provided in the MinElute PCR Purification Kit (Qiagen, Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The 300 μl were transferred to the MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer, and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25 °C according to the NEBNext® Quick Ligation Module. The sample was cooled to 4 °C and purified using a MinElute column as follows. One hundred fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California).
c. Analysis of sequencing libraries prepared according to the abbreviated (a) and the full-length (b) protocols
The electropherograms generated by the Bioanalyzer are shown in FIGS. 21A and 21B. FIG. 21A shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the full-length protocol described in (a), and FIG. 21B shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the full-length protocol described in (b). In both figures, peaks 1 and 4 represent the 15 bp lower marker and the 1,500 bp upper marker, respectively; the numbers above the peaks indicate the migration times of the library fragments; and the horizontal lines indicate the threshold set for integration. The electropherogram in FIG. 21A shows a minor peak of fragments of 187 bp and a major peak of fragments of 263 bp, while the electropherogram in FIG. 21B shows only one peak, at 265 bp. Integration of the peak areas gave a calculated DNA concentration of 0.40 ng/μl for the 187 bp peak in FIG. 21A, 7.34 ng/μl for the 263 bp peak in FIG. 21A, and 14.72 ng/μl for the 265 bp peak in FIG. 21B. The Illumina adaptors ligated to the cfDNA are known to be 92 bp, which, when subtracted from 265 bp, indicates that the peak size of the cfDNA is 173 bp. The minor peak at 187 bp may represent fragments of two primers ligated end to end. These linear two-primer fragments are eliminated from the final library product when the abbreviated protocol is used. The abbreviated protocol also eliminates other fragments smaller than 187 bp. In this example, the concentration of purified adaptor-ligated cfDNA is double that of the adaptor-ligated cfDNA produced using the full-length protocol. It has been noted that the concentration of the adaptor-ligated cfDNA fragments was always greater than that obtained using the full-length protocol (data not shown).
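The size and yield comparison above is simple arithmetic, and a short sketch may make it easier to repeat for other electropherogram peaks. The helper names are illustrative assumptions and not part of any instrument software; the 92 bp adaptor length and the peak values are the ones quoted in the text.

```python
# Sketch: estimate cfDNA insert size from a Bioanalyzer library peak and
# compare library yields. Helper names are illustrative assumptions.

ADAPTOR_BP = 92  # combined length of the ligated Illumina adaptors (from the text)


def insert_size(library_peak_bp, adaptor_bp=ADAPTOR_BP):
    """Library peak size minus adaptor length gives the cfDNA insert size."""
    return library_peak_bp - adaptor_bp


def fold_change(conc_a_ng_per_ul, conc_b_ng_per_ul):
    """Ratio of two DNA concentrations given in the same units."""
    return conc_a_ng_per_ul / conc_b_ng_per_ul


if __name__ == "__main__":
    # cfDNA size behind the 265 bp peak of FIG. 21B
    print(insert_size(265))
    # 265 bp peak (FIG. 21B) versus 263 bp main peak (FIG. 21A)
    print(round(fold_change(14.72, 7.34), 2))
```

With the quoted values, the 265 bp peak corresponds to a 173 bp insert, and the 14.72 ng/μl library is about twice as concentrated as the 7.34 ng/μl one.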
Therefore, one advantage of preparing the sequencing library using the abbreviated protocol is that the library obtained consistently comprises only one major peak in the 262-267 bp range, whereas the quality of libraries prepared using the full-length protocol varies, as reflected by the number and mobility of peaks other than the peak representing the cfDNA. Non-cfDNA products would occupy space on the flow cell and diminish the quality of the cluster amplification and of the subsequent imaging of the sequencing reactions, which underlies the overall assignment of the aneuploidy status. The abbreviated protocol was shown not to affect the sequencing of the library.
Another advantage of preparing the sequencing library using the abbreviated protocol is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation take less than one hour to complete, which supports the validation and implementation of a rapid aneuploidy diagnostic service.
Another advantage is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation are performed in the same reaction tube, thus avoiding multiple sample transfers that could lead to loss of material and, more importantly, to sample mix-up and sample contamination.
Example 3
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation in solution
To determine whether the abbreviated protocol could be shortened further to expedite sample analysis, sequencing libraries were made from unrepaired cfDNA and sequenced using the Illumina Genome Analyzer II as previously described.
cfDNA was prepared from peripheral blood samples as described herein. Blunt-ending and 5′-phosphate phosphorylation, which are required by the published protocol for the Illumina platform, were not performed, so as to provide unrepaired cfDNA samples.
It was determined that omitting DNA repair, or DNA repair and phosphorylation, did not affect the quality or yield of the sequencing library (data not shown).
2-STEP method in solution for unrepaired, non-indexed DNA
In a first set of experiments, unrepaired cfDNA was subjected to simultaneous dA tailing and adaptor ligation by combining Klenow Exo- and T4 DNA ligase in the same reaction mixture, as follows: thirty microliters of cfDNA at a concentration of 20-150 pg/μl were dA-tailed (5 μl of 10X NEB Buffer 2, 2 μl of 10 nM dNTP, 1 μl of 10 nM ATP, and 1 μl of 5,000 U/ml Klenow Exo-) and ligated to Illumina Y-adaptors (1 μl of a 1:15 dilution of a 3 μM stock) using 1 μl of 400,000 U/ml T4 DNA ligase, in a reaction volume of 50 μl. The non-indexed Y-adaptors were obtained from Illumina. The combined reactions were incubated at 25 °C for 30 minutes. The enzymes were heat-inactivated at 75 °C for 5 minutes, and the reaction products were stored at 10 °C.
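The amounts implied by this reaction set-up can be checked with a few lines of arithmetic. The sketch below assumes that a "1:15 dilution" means a 15-fold dilution of the stock; the function names are illustrative, and the 9 μl balance it reports is simply the part of the 50 μl volume that the text does not itemize.

```python
# Sketch: input amounts for the combined dA-tailing/ligation reaction.
# Assumption: a "1:15 dilution" is a 15-fold dilution of the stock.


def mass_ng(volume_ul, conc_pg_per_ul):
    """DNA mass in ng for a volume (ul) at a concentration (pg/ul)."""
    return volume_ul * conc_pg_per_ul / 1000.0


def adaptor_pmol(stock_um, fold_dilution, volume_ul):
    """Adaptor amount in pmol; 1 uM equals 1 pmol/ul."""
    return stock_um / fold_dilution * volume_ul


# Component volumes (ul) listed in the text for the 50 ul reaction.
LISTED_VOLUMES_UL = {
    "cfDNA": 30,
    "10X NEB Buffer 2": 5,
    "dNTP": 2,
    "ATP": 1,
    "Klenow Exo-": 1,
    "Y-adaptor dilution": 1,
    "T4 DNA ligase": 1,
}

low_ng = mass_ng(30, 20)      # lower bound of cfDNA input
high_ng = mass_ng(30, 150)    # upper bound of cfDNA input
adaptor = adaptor_pmol(3, 15, 1)
unlisted_ul = 50 - sum(LISTED_VOLUMES_UL.values())  # balance not itemized in the text

print(low_ng, high_ng, adaptor, unlisted_ul)
```

Under these assumptions the reaction receives between 0.6 and 4.5 ng of cfDNA and about 0.2 pmol of Y-adaptor.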
The adaptor-ligated product was purified using SPRI beads (Agencourt AMPure XP PCR purification system, Beckman Coulter Genomics) and subjected to 18 cycles of PCR. The PCR-amplified library was purified using SPRI and sequenced using the Illumina Genome Analyzer IIx or HiSeq according to the manufacturer's instructions to obtain single-end reads of 36 bp. A large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of a sample, the Illumina "Sequencer Control Software/Real-time Analysis" transferred base call files in binary format to a network-attached storage device for data analysis. Sequence data were analyzed by software designed to run on a Linux server. This software uses the Illumina "BCLConverter" to convert the binary base calls into human-readable text, then calls the open-source "Bowtie" program to align the sequences to the reference human genome, which is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the World Wide Web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105).
The software reads the sequence data generated by the above procedure that aligned uniquely to the genome from the Bowtie output (bowtieout.txt files). Sequence alignments with at most 2 base mismatches were allowed, and were included in the alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded. Between about 5 and 25 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. Regions extending from base 0 to base 2 × 10⁶, from base 10 × 10⁶ to base 13 × 10⁶, and from base 23 × 10⁶ to the end of chromosome Y were specifically excluded from the analysis, because tags derived from fetuses of either sex map to these regions of the Y chromosome.
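The counting rules in this paragraph can be summarized as a small filter. The sketch below is not the software described in the text: the `Alignment` record is a hypothetical, already-parsed representation (it does not reproduce the Bowtie output format), a tag is treated as excluded when it overlaps one of the listed chromosome Y regions, and only the first of several alignments sharing the same coordinates is kept. All three choices are assumptions.

```python
# Sketch of the tag-counting rules: unique alignments only, at most 2
# mismatches, duplicates removed, listed chrY regions excluded.
from collections import Counter, namedtuple

# Hypothetical pre-parsed alignment record (not the Bowtie file format).
Alignment = namedtuple("Alignment", "chrom start end mismatches unique")

# Excluded chrY intervals from the text; the last one runs to the chromosome end.
CHRY_EXCLUDED = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, float("inf")),
]


def in_excluded_region(aln):
    """True if a chrY alignment overlaps any excluded interval."""
    if aln.chrom != "chrY":
        return False
    return any(aln.start < hi and aln.end > lo for lo, hi in CHRY_EXCLUDED)


def count_tags(alignments, max_mismatches=2):
    """Count accepted tags per chromosome."""
    seen = set()
    counts = Counter()
    for aln in alignments:
        if not aln.unique or aln.mismatches > max_mismatches:
            continue
        if in_excluded_region(aln):
            continue
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:  # identical start and end coordinates: duplicate
            continue
        seen.add(key)
        counts[aln.chrom] += 1
    return counts
```

The resulting per-chromosome counts are the kind of input a chromosome dose calculation would consume.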
FIG. 22A shows the mean (n=16) of the percent of the total number of sequence tags that mapped to each human chromosome (% ChrN) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇) and when it was prepared according to the repair-free 2-STEP method (INSOL). These data show that, compared with the percent of tags mapped to the corresponding chromosomes using the abbreviated method, preparing the sequencing library by the repair-free 2-STEP method resulted in a greater percent of tags mapped to chromosomes with lower GC content and a smaller percent of tags mapped to chromosomes with higher GC content. FIG. 22B relates the percent of sequence tags to chromosome size, and shows that the repair-free method reduces the sequencing bias. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; △) and the in-solution repair-free protocol (2-STEP) were R² = 0.9332 and R² = 0.9806, respectively.
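For readers who want to reproduce this kind of summary from their own tag counts, the two quantities involved (percent of tags per chromosome, and the R² of a straight-line fit against chromosome size) can be sketched as follows. The text does not say how its regression coefficients were computed; the sketch assumes an ordinary least-squares line, for which R² equals the squared Pearson correlation.

```python
# Sketch: percent of mapped tags per chromosome and R^2 against chromosome size.
# Assumption: R^2 is that of a simple least-squares linear fit.


def percent_per_chromosome(tag_counts):
    """Convert {chromosome: tag count} into {chromosome: percent of all tags}."""
    total = sum(tag_counts.values())
    return {chrom: 100.0 * n / total for chrom, n in tag_counts.items()}


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

In use, `xs` would be the chromosome sizes in Mbp and `ys` the corresponding values from `percent_per_chromosome`.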
Table 8. Percent GC content per chromosome
Chromosome  Size (Mbp)  GC (%)    Chromosome  Size (Mbp)  GC (%)
Chr1        247         41.37     Chr13       114         38.24
Chr2        243         39.44     Chr14       106         40.85
Chr3        199         38.74     Chr15       100         41.80
Chr4        191         38.60     Chr16       89          44.64
Chr5        181         39.35     Chr17       79          45.01
Chr6        171         39.94     Chr18       76          39.66
Chr7        159         39.78     Chr19       63          48.21
Chr8        146         40.30     Chr20       62          42.05
Chr9        140         40.17     Chr21       47          40.68
Chr10       135         40.43     Chr22       50          47.64
Chr11       134         41.37     ChrX        155         39.26
Chr12       132         40.59     ChrY        58          37.74
The repair-free 2-STEP method was also compared with the abbreviated method as the ratio of the percent of tags mapped to individual chromosomes using the repair-free method to the percent of tags mapped to individual chromosomes using the abbreviated method, as a function of the percent GC content of each chromosome. The percent GC content relative to chromosome size was calculated from published information on chromosome sequences and GC content partitions (Constantini et al., Genome Res 16:536-541 [2006]) and is given in Table 8. The results are given in FIG. 22C, which shows a notable decrease in the ratio for chromosomes having a high GC content and an increase in the ratio for chromosomes having a low GC content. These data clearly show the normalizing effect of the repair-free method in overcoming GC bias.
These data show that the repair-free method corrects some of the GC bias that is known to be associated with sequencing of amplified DNA.
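The ratio plotted against GC content can be sketched as below. The GC values for the four chromosomes are taken from Table 8; the function name is an assumption, and no measured tag percentages are reproduced here, so the caller supplies the per-chromosome percentages for each method.

```python
# Sketch: per-chromosome ratio (repair-free / abbreviated) ordered by GC content.
# GC values are from Table 8; the tag percentages are supplied by the caller.

GC_PERCENT = {"chr4": 38.60, "chr13": 38.24, "chr19": 48.21, "chr22": 47.64}


def ratio_by_gc(pct_repair_free, pct_abbreviated, gc=GC_PERCENT):
    """Return (GC %, chromosome, ratio) tuples sorted by increasing GC content."""
    rows = []
    for chrom, gc_pct in gc.items():
        ratio = pct_repair_free[chrom] / pct_abbreviated[chrom]
        rows.append((gc_pct, chrom, ratio))
    return sorted(rows)
```

A ratio above 1 at the low-GC end and below 1 at the high-GC end would correspond to the trend described for FIG. 22C.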
To determine whether the repair-free method affected the proportion of fetal versus maternal cfDNA that was sequenced, the percent of the number of tags mapped to chromosomes X and Y was determined. FIGS. 23A and 23B show bar diagrams providing the mean and standard deviation of the percent of tags mapped to chromosome X (FIG. 23A; % ChrX) and chromosome Y (FIG. 23B; % ChrY), obtained by sequencing 10 cfDNA samples purified from the plasma of 10 pregnant women. FIG. 23A shows that a greater number of tags mapped to the X chromosome when using the repair-free method than when using the abbreviated method. FIG. 23B shows that the percent of tags mapped to the Y chromosome when using the repair-free method was not different from that obtained using the abbreviated method.
These data show that the repair-free method does not introduce any bias for or against sequencing fetal versus maternal DNA, i.e., the proportion of fetal sequences that were sequenced was not altered when using the repair-free method.
Taken together, these data show that the repair-free method does not adversely affect the quality of the sequencing library, nor the information obtained from sequencing the library. Excluding the DNA repair step required by published protocols lowers reagent costs and expedites the preparation of the sequencing library.
2-STEP method in solution for unrepaired, indexed DNA
In a second set of experiments, unrepaired cfDNA was subjected to dA tailing, followed by heat inactivation of the Klenow Exo- and adaptor ligation. When non-indexed Illumina adaptors, which carry a single-stranded arm of 21 bases, were used for ligation, excluding the heat inactivation of the Klenow Exo- did not affect the yield or quality of the sequencing library.
To determine whether the repair-free method could be applied to multiplexed sequencing, in-house indexed Y-adaptors comprising a 6-base index sequence were used to generate libraries with or without the Klenow heat inactivation. Unlike the non-indexed adaptors, the indexed adaptors comprise a single-stranded arm of 43 bases, which includes the index sequence and the PCR priming site.
Twelve different indexed adaptors, identical to the Illumina TruSeq adaptors, were made starting from oligonucleotides obtained from Integrated DNA Technologies (Coralville, Iowa). The oligonucleotide sequences were taken from the published Illumina TruSeq indexed adaptor sequences. The oligonucleotides were dissolved in annealing buffer (10 mM Tris, 1 mM EDTA, 50 mM NaCl, pH 7.5) to a final concentration of 300 μM. Equimolar amounts of the oligonucleotides comprising the two arms of any given indexed adaptor, typically 10 μl each (300 μM each), were mixed and allowed to anneal (95 °C for 6 minutes, followed by slow, controlled cooling from 95 °C to 10 °C). The final 150 μM adaptor was diluted to 7.5 μM in 10 mM Tris, 1 mM EDTA (pH 8) and stored at -20 °C until use.
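The concentrations in the annealing step follow from two dilutions: mixing equal volumes of the two 300 μM oligonucleotides halves each concentration, and the annealed adaptor is then diluted to the working stock. The sketch below assumes complete annealing and uses illustrative function names.

```python
# Sketch: adaptor concentrations after equal-volume mixing and working dilution.
# Assumption: both oligonucleotides anneal completely into duplex adaptor.


def equal_volume_mix_conc(conc_um, n_components=2):
    """Concentration of each component after mixing equal volumes."""
    return conc_um / n_components


def dilution_factor(stock_um, target_um):
    """Fold dilution needed to go from stock to target concentration."""
    return stock_um / target_um


def diluent_volume_ul(stock_volume_ul, factor):
    """Volume of diluent to add to a given stock volume for a fold dilution."""
    return stock_volume_ul * (factor - 1)


duplex_um = equal_volume_mix_conc(300)      # annealed adaptor concentration
factor = dilution_factor(duplex_um, 7.5)    # fold dilution to the working stock
print(duplex_um, factor, diluent_volume_ul(20, factor))
```

For the typical 20 μl annealing mix, this corresponds to a 20-fold dilution, i.e. adding 380 μl of diluent.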
The data showed that, when using indexed adaptors, library preparation by the 2-STEP method was not feasible if active Klenow Exo- was present in the same reaction as the ligase and the indexed adaptors. However, the 2-STEP method was quite feasible if the Klenow Exo- was first heat-inactivated at 75 °C for 5 minutes before the ligase and the indexed adaptors were added. It is likely that, when the indexed adaptors are present together with active Klenow Exo-, the strand displacement activity of the Klenow Exo- causes the longer single-stranded DNA arms of the indexed adaptors to be digested, thereby eliminating the PCR primer sites. Electropherograms of sequencing libraries obtained using the same cfDNA and enzymes, with or without the heat inactivation step after the Klenow Exo- reaction, showed that in the 2-STEP method, including the heat inactivation of Klenow Exo- before adding the ligase and the indexed adaptors yields a library having the expected profile, with a major peak at 290 bp (data not shown). Therefore, because the repair-free method is intended to be applicable to multiplexed sequencing, all experiments using indexed Y-adaptors were revised to include the heat inactivation of Klenow Exo-.
Example 4
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation on a solid surface (SS)
1-STEP solid surface method for non-indexed DNA
To determine whether the repair-free library process could be streamlined further, the repair-free sequencing library preparation method described in Example 3 was configured to be performed on a solid surface. The prepared libraries were sequenced as described in Example 3.
cfDNA was prepared from peripheral blood samples as described in Example 1. Polypropylene tubes were coated with streptavidin (SA), washed, and a first set of biotinylated indexed adaptors was bound to the streptavidin-coated tubes, as follows. Strips of 8-well PCR tubes (USA Scientific, Ocala, Florida) were coated with 0.5 nmol of streptavidin (Thermo Scientific, Rockford, Illinois) in 50 μl of PBS by incubating the SA in the tubes overnight at 4 °C. The tubes were washed four times with 200 μl of 1X TE each. Biotinylated Index 1 adaptor, 7.5 pmol, 3.75 pmol, 1.8 pmol and 0.9 pmol, each in 50 μl of TE, was added in duplicate to the SA-coated tubes and incubated at room temperature for 25 minutes. Unbound adaptor was removed, and the tubes were washed four times with 200 μl of TE. The biotinylated Index 1 adaptors were made as described in Example 3 using biotinylated universal adaptor oligonucleotides obtained from IDT.
1-STEP SS method using cfDNA from non-pregnant subjects
In a second strip of PCR tubes, control samples (NTC: no-template control) or 30 μl of purified cfDNA obtained from non-pregnant women at approximately 120 pg/μl, i.e. about 32 fmol, were incubated with 5 units of Klenow Exo- in NEB Buffer 2 containing 20 nmol of dNTPs and 10 nmol of ATP, in a 50 μl reaction volume, at 37 °C for 15 minutes. The Klenow enzyme was then inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The Klenow-DNA mixture was transferred to the corresponding tubes containing the SA-bound biotinylated adaptors, and the cfDNA was ligated to the immobilized adaptors by incubating the mixture with 400 units of T4 DNA ligase in 10 μl of 1X T4 DNA ligase buffer at 25 °C for 15 minutes. Subsequently, 7.5 pmol of non-biotinylated Index 1 adaptor was ligated to the solid-phase-bound cfDNA by incubating with 200 units of T4 DNA ligase in 10 μl of buffer at 25 °C for 15 minutes. The reaction mixture was removed, and the tubes were washed 5 times with 200 μl of TE buffer. The adaptor-ligated cfDNA was amplified by PCR using 50 μl of Phusion PCR mix [New England Biolabs] containing P5 and P7 primers (IDT; 1 μM each) and cycled as follows: [30 seconds at 98 °C; (10 seconds at 98 °C; 10 seconds at 50 °C; 10 seconds at 60 °C; 10 seconds at 72 °C) × 18 cycles; 5 minutes at 72 °C; hold at 10 °C]. The resulting library product was subjected to SPRI cleanup [Beckman Coulter Genomics], and the quality of the library was assessed from profiles obtained using a High Sensitivity Bioanalyzer chip [Agilent Technologies, Santa Clara, California]. The profiles showed that solid-phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries (data not shown).
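The "about 32 fmol" figure can be reproduced from the stated volume and concentration if a fragment length and a per-base-pair molar mass are assumed. The sketch below assumes a mean cfDNA length of 173 bp (the peak size reported in Example 2) and the common approximation of 650 g/mol per base pair; neither value is stated in this paragraph, and the function name is illustrative.

```python
# Sketch: convert a cfDNA mass to femtomoles of double-stranded fragments.
# Assumptions: 173 bp mean fragment length, 650 g/mol per base pair.

AVG_BP_MASS_G_PER_MOL = 650.0   # approximate molar mass of one base pair
MEAN_FRAGMENT_BP = 173          # cfDNA peak size reported in Example 2


def dsdna_fmol(volume_ul, conc_pg_per_ul,
               fragment_bp=MEAN_FRAGMENT_BP,
               bp_mass=AVG_BP_MASS_G_PER_MOL):
    """Amount of dsDNA fragments in fmol."""
    mass_pg = volume_ul * conc_pg_per_ul
    pmol = mass_pg / (fragment_bp * bp_mass)   # pg divided by g/mol gives pmol
    return pmol * 1000.0


print(round(dsdna_fmol(30, 120), 1))
```

Under these assumptions, 30 μl at 120 pg/μl (3.6 ng) comes to roughly 32 fmol, consistent with the amount quoted above.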
1-STEP SS method using cfDNA from pregnant subjects
The solid surface (SS) method was tested using cfDNA samples obtained from pregnant women.
cfDNA was prepared from 8 peripheral blood samples obtained from pregnant women as described in Example 1, and sequencing libraries were prepared from the purified cfDNA as described above. The libraries were sequenced, and the sequence information was analyzed.
FIG. 24 shows, for each of 5 samples, the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to those non-excluded sites. cfDNA was prepared from these samples and used to construct sequencing libraries according to the abbreviated protocol (ABB) described in Example 2 (filled bars), the in-solution repair-free protocol described in Example 18 (2-STEP; open bars), and the solid surface repair-free protocol described in this example (1-STEP; gray bars).
The data shown in FIG. 24 indicate that the representation of PCR-amplified sequences prepared according to the three protocols is comparable, which shows that the solid surface method does not skew the sequence variation represented in the library.
FIG. 25A shows that the number of sequence tags mapped uniquely to each chromosome when sequencing libraries prepared according to the repair-free solid surface method is comparable to the number obtained when using the repair-free in-solution 2-STEP method described above. The data show that both repair-free methods reduce the GC bias of the sequencing data.
FIG. 25B shows the relationship between the number of mapped tags and the size of the chromosome to which the tags were mapped. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), the in-solution repair-free protocol (2-STEP) and the solid surface repair-free protocol (1-STEP) were R² = 0.9332, R² = 0.9802 and R² = 0.9807, respectively.
FIG. 25C shows the ratio of the percent of mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the repair-free 2-STEP protocol to the tags per chromosome obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), as a function of the percent GC content of each chromosome (◇), and the corresponding ratio for sequencing libraries prepared according to the repair-free 1-STEP protocol relative to the abbreviated protocol (ABB), as a function of the percent GC content of each chromosome. Taken together, the data in FIGS. 25B and 25C show that the 1-STEP and 2-STEP methods exhibit a similar GC normalization effect, because both omit the DNA repair step of the library process.
To determine whether the repair-free method affected the proportion of fetal versus maternal cfDNA that was sequenced, the percent of the number of tags mapped to chromosomes X and Y was determined. FIGS. 26A and 26B compare the mean and standard deviation of the percent of tags mapped to chromosome X (FIG. 26A) and chromosome Y (FIG. 26B), obtained by sequencing 5 cfDNA samples purified from the plasma of 5 pregnant women using the ABB, 2-STEP and 1-STEP methods. FIG. 26A shows that a greater number of tags mapped to the X chromosome when using the repair-free methods (2-STEP and 1-STEP) than when using the abbreviated method (filled bars). FIG. 26B shows that the percent of tags mapped to the Y chromosome when using the repair-free 2-STEP and 1-STEP methods was not different from that obtained using the abbreviated method.
These data show that the repair-free solid surface 1-STEP method does not introduce any bias for or against sequencing fetal versus maternal DNA, i.e., the proportion of fetal sequences that were sequenced was not altered when using the repair-free solid surface method.
Taken together, the data show that generating sequencing libraries on a solid surface is a simple and feasible option for preparing samples for sequencing.
Example 5
High-throughput compatibility of the repair-free 1-STEP solid surface library preparation method
To determine whether the repair-free 1-STEP library preparation method is applicable to high-throughput preparation of samples for sequencing by NGS technologies, 96 cfDNA libraries were prepared from 96 peripheral blood samples in a 96-well PCR plate coated with SA-bound indexed adaptors. The prepared libraries were sequenced as described in Example 5.
Coating of a first PCR plate with SA and attachment of the biotinylated indexed adaptors were performed as described in Example 4. The wells in each column of the 96-well plate were coated with a biotinylated adaptor comprising a unique index. Using a second 96-well PCR plate, 37 different cfDNAs in 30 μl were dA-tailed, each in the presence of 10 μl of Klenow master mix, at 37 °C for 15 minutes, followed by Klenow enzyme inactivation at 75 °C for 5 minutes. Several cfDNAs were used in multiple wells, so that a total of 94 wells contained cfDNA; 2 wells served as no-template controls. The dA-tailed cfDNA mixtures were transferred to the PCR plate and ligated to the bound biotinylated adaptors in the presence of 10 μl of Quick Ligase master mix 1 at 25 °C using a PCT-225 Tetrad gradient thermal cycler (BioRad, Hercules, California). Ten microliters of ligation master mix 2, customized for each indexed adaptor, were added, and ligation was carried out at 5 °C for 15 minutes. Unbound DNA was removed, and the bound DNA-biotinylated adaptor complexes were washed five times with TE buffer. Fifty microliters of PCR master mix were added to each well, and the adaptor-ligated DNA was amplified and subjected to SPRI cleanup as described in Example 4. The libraries were diluted and analyzed using HiSens BA chips.
The correlation between the amount of purified cfDNA used to prepare the sequencing library and the resulting amount of library product was obtained for 61 clinical samples prepared using the ABB method (FIG. 27A) and for 35 research samples prepared using the repair-free SS 1-STEP method (FIG. 27B). These data show that the correlation was significantly greater for libraries prepared using the repair-free SS 1-STEP method (R² = 0.5826; FIG. 27A) than for libraries prepared using the abbreviated method described in Example 2 (R² = 0.1534; FIG. 27B). Note that the cfDNA samples in this comparison were not identical, because the clinical samples were not available for research and development. Nevertheless, these results show that the repair-free SS 1-STEP method consistently gives a greater correlation of cfDNA input to library output than the ABB method. Subsequently, the three methods, i.e. ABB, repair-free 2-STEP and repair-free SS 1-STEP, were compared for this correlation using serial dilutions of the same purified cfDNA for all three methods. As shown in FIG. 28, the best correlation was obtained when the libraries were prepared according to the SS 1-STEP method (R² = 0.9457; △), followed by the 2-STEP method (R² = 0.7666) and the ABB method, which had a significantly lower correlation (R² = 0.0386; ◇). These data show that, compared with methods that end-modify the cfDNA [DNA repair and phosphorylation], the repair-free methods, whether in solution or on a solid surface, provide consistent and predictable yields, whether or not purification of the repaired DNA and of the dA-tailed products is included.
The time required to prepare libraries according to the solid surface method described in this example is several-fold less than the time required to prepare sequencing libraries according to the abbreviated method. For example, 10 to 14 samples can be prepared manually in about 4 hours using the ABB method, whereas 96 or 192 libraries can be prepared manually in 4 and 5 hours, respectively, using the SS 1-STEP method. In addition, the SS 1-STEP method can easily be automated to prepare libraries in multiples of 96 for multiplexed sequencing using NGS technologies. The SS method would therefore be suited to automated, commercial high-throughput sample analysis.
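The "several-fold" difference can be made concrete by converting the quoted figures into libraries per hour. The sketch below uses only the numbers given in the text; the function name is illustrative.

```python
# Sketch: manual library-preparation throughput from the figures quoted above.


def libraries_per_hour(n_libraries, hours):
    """Average number of libraries completed per hour."""
    return n_libraries / hours


abb_low = libraries_per_hour(10, 4)    # ABB, lower bound
abb_high = libraries_per_hour(14, 4)   # ABB, upper bound
ss_96 = libraries_per_hour(96, 4)      # SS 1-STEP, one plate
ss_192 = libraries_per_hour(192, 5)    # SS 1-STEP, two plates

# Most conservative and most favorable fold differences.
fold_min = ss_96 / abb_high
fold_max = ss_192 / abb_low
print(round(fold_min, 1), round(fold_max, 1))
```

On these figures the SS 1-STEP method is roughly 7 to 15 times faster per library than the ABB method.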
Analysis of the DNA libraries showed that solid-phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries, and that the method can be configured for automated processes to further expedite sample analyses that require massively parallel sequencing using NGS technologies. The solid surface method is also applicable to repaired DNA.
Example 6
Multiplexed sequencing of libraries prepared according to the 1-STEP SS method
Library samples prepared by the SS 1-STEP method in a 96-well plate (Example 20) were sequenced in multiplex fashion, with six different indexed samples per Illumina HiSeq sequencer flow cell lane. The prepared libraries were sequenced as described in Example 2. The data shown in FIG. 29 compare the indexing efficiency, as assessed by multiplexed sequencing, between the 2-STEP (filled bars) and SS 1-STEP (open bars) methods. These data show that preparing libraries on a solid surface does not compromise indexing efficiency. FIG. 30A shows the percent of the total number of sequence tags that mapped to each human chromosome (% ChrN) when the sequencing libraries were prepared according to the 1-STEP solid surface method, and FIG. 30B (R² = 0.9807) shows the percent of sequence tags as a function of chromosome size. FIGS. 30A and 30B show that the GC bias of the SS 1-STEP method is the same as that of the 2-STEP method, because both processes use repair-free sample preparation enzymology.
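Multiplexing relies on assigning each read to a sample by its 6-base index. The text does not define how indexing efficiency was computed; the sketch below assumes it is the fraction of index reads that exactly match one of the expected indexes, and the index sequences and sample names used when calling it are hypothetical placeholders rather than actual TruSeq indexes.

```python
# Sketch: assign reads to samples by exact index match and compute the
# fraction assigned. The definition of "indexing efficiency" is an assumption.
from collections import Counter


def demultiplex(index_reads, sample_by_index):
    """Count reads per sample; unmatched index reads go to 'unassigned'."""
    counts = Counter()
    for index_read in index_reads:
        counts[sample_by_index.get(index_read, "unassigned")] += 1
    return counts


def indexing_efficiency(counts):
    """Fraction of reads whose index matched an expected sample index."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["unassigned"]) / total
```

Exact matching is the simplest choice; production pipelines often tolerate one mismatch in the index read, which this sketch does not attempt.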
FIG. 31 shows the percent of sequence tags mapped to the Y chromosome relative to the percent of tags mapped to the X chromosome, obtained by sequencing 42 libraries that were prepared using the SS 1-STEP method with indexed adaptors and sequenced in multiplex fashion using Illumina sequencing-by-synthesis with reversible terminator technology. The data clearly distinguish samples obtained from pregnant women carrying male fetuses from samples obtained from pregnant women carrying female fetuses.
Example 7
Sample preparation and DNA extraction
Peripheral blood samples were collected from pregnant women who were in the first or second trimester of pregnancy and were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6 to 9 mL per tube) was transferred into a 15-mL low-speed centrifuge tube. Blood was centrifuged at 2640 rpm, 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge and a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15-mL high-speed centrifuge tube and centrifuged at 16000 x g, 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. The two centrifugation steps were performed within 72 hours after blood collection. Cell-free plasma was stored at -80 °C and thawed only once before DNA extraction.
Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Five milliliters of Buffer AL and 500 μl of Qiagen Protease were added to 4.5 ml to 5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate buffered saline (PBS), and the mixture was incubated at 56 °C for 12 minutes. Multiple columns were used to separate the precipitated cfDNA from the solution by centrifugation at 8,000 RPM in a Beckman microcentrifuge. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 μl of nuclease-free water. Approximately 3.5 to 7 ng of cfDNA was extracted from the plasma samples.
All sequencing libraries were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20 °C. The enzymes were then heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. After inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA, using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 and incubating the reaction mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA, using Phusion High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions (available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf). The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The amplified DNA was sequenced using the Illumina Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information are needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software" transferred image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The Illumina "Gerald" program was run to align sequences to the reference human genome, which is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105). The sequence data generated by the above procedure that aligned uniquely to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer running the Linux operating system. Sequence alignments with base mismatches were allowed and were included in alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
Between about 5 million and 15 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. The regions of chromosome Y extending from base 0 to base 2 × 10^6, from base 10 × 10^6 to base 13 × 10^6, and from base 23 × 10^6 to the end of the chromosome were specifically excluded from the analysis, because tags derived from both male and female fetuses map to these regions of the Y chromosome.
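The counting rules in the two preceding paragraphs (unique alignments only, at most two mismatches, duplicates with identical start and end coordinates dropped, three chromosome Y intervals excluded) can be sketched as follows. This is a minimal illustration, not the c2c.pl program itself: the `Tag` record, the `chrY` naming, and the choice to test exclusion by the alignment start coordinate are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Tag(NamedTuple):
    """Hypothetical record for one aligned 36 bp read."""
    chrom: str        # e.g. "chr21" or "chrY" (naming is an assumption)
    start: int        # alignment start coordinate
    end: int          # alignment end coordinate
    mismatches: int   # number of mismatched bases in the alignment
    unique: bool      # True if the read aligned to exactly one locus


# Chromosome Y intervals excluded in the text; None means "to the end".
CHRY_EXCLUDED: list[tuple[int, Optional[int]]] = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]


def in_excluded_chry(tag: Tag) -> bool:
    """Assumption: a tag is excluded when its start lies in an interval."""
    if tag.chrom != "chrY":
        return False
    return any(
        tag.start >= lo and (hi is None or tag.start < hi)
        for lo, hi in CHRY_EXCLUDED
    )


def count_tags(tags: Iterable[Tag], max_mismatches: int = 2) -> Counter:
    """Count usable tags per chromosome under the rules stated above."""
    seen = set()
    counts: Counter = Counter()
    for tag in tags:
        if not tag.unique or tag.mismatches > max_mismatches:
            continue
        if in_excluded_chry(tag):
            continue
        key = (tag.chrom, tag.start, tag.end)
        if key in seen:  # identical start and end coordinates: duplicate
            continue
        seen.add(key)
        counts[tag.chrom] += 1
    return counts
```

The per-chromosome counts returned by `count_tags` are the sequence tag densities that the chromosome dose calculations operate on.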
Some variation was noted in the total number of sequence tags mapped to individual chromosomes across samples sequenced in the same run (inter-chromosomal variation), but substantially greater variation was noted among different sequencing runs (inter-sequencing run variation).
Example 8
Dose and variance for chromosomes 13, 18, 21, X and Y
To examine the extent of inter-chromosomal and inter-sequencing run variation in the number of mapped sequence tags for all chromosomes, plasma cfDNA obtained from the peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 7, and analyzed as follows.
The total number of sequence tags mapped to each chromosome (sequence tag density) was determined. Alternatively, the number of mapped sequence tags can be normalized to the length of the chromosome to generate a sequence tag density ratio. Normalization to chromosome length is not a required step; it can be performed solely to reduce the number of digits in a number and so simplify it for human interpretation. The chromosome lengths used to normalize the sequence tag counts can be the lengths provided on the world wide web at genome.ucsc.edu/goldenPath/stats.html#hg18.
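As a small illustration of the optional length normalization, the sketch below divides each chromosome's tag count by its length and rescales to tags per megabase. The function name, the per-megabase scaling, and the example counts and round-number lengths are all assumptions for illustration; real lengths would come from the hg18 statistics page cited above.

```python
def tag_density_ratio(tag_counts, chrom_lengths, per_bases=1_000_000):
    """Return mapped tags per `per_bases` bases for each chromosome."""
    return {
        chrom: count * per_bases / chrom_lengths[chrom]
        for chrom, count in tag_counts.items()
    }


# Made-up counts and round-number lengths (not actual hg18 values).
example_counts = {"chrA": 500, "chrB": 100}
example_lengths = {"chrA": 250_000_000, "chrB": 50_000_000}
ratios = tag_density_ratio(example_counts, example_lengths)
```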
The sequence tag density obtained for each chromosome was related to the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose, calculated as the ratio of the sequence tag density for the chromosome of interest (e.g. chromosome 21) to the sequence tag density of each of the remaining chromosomes (i.e. chromosomes 1-20, 22 and X). Table 9 provides an example of the qualified chromosome doses calculated for the chromosomes of interest 13, 18, 21, X and Y, as determined in one of the qualified samples. Chromosome doses were determined for all chromosomes in all samples; the average doses for the chromosomes of interest 13, 18, 21, X and Y in the qualified samples are provided in Tables 10 and 11 and depicted in Figures 32-36. Figures 32-36 also depict the chromosome doses for the test samples. The chromosome dose for each chromosome of interest in the qualified samples provides a measure of the variation in the total number of mapped sequence tags for that chromosome relative to each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes (i.e. the normalizing chromosome) whose variation among samples is closest to the variation of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation. Figures 37 and 38 depict the average chromosome doses calculated in a population of qualified samples for chromosomes 13, 18 and 21, and for chromosomes X and Y.
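The dose calculation and the search for a low-variability normalizing chromosome can be sketched as below. This is an illustrative outline under stated assumptions, not the procedure used to produce Tables 9-11: each qualified sample is assumed to be a dictionary of per-chromosome tag counts, variability is measured as the coefficient of variation (in percent) using the sample standard deviation, and the function names are hypothetical.

```python
from statistics import mean, stdev


def chromosome_dose(counts, interest, normalizer):
    """Ratio of tag density for the chromosome of interest to the normalizer."""
    return counts[interest] / counts[normalizer]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def rank_normalizers(qualified_samples, interest):
    """Return (candidate, CV) pairs sorted from lowest to highest CV."""
    candidates = [c for c in qualified_samples[0] if c != interest]
    cvs = {}
    for cand in candidates:
        doses = [chromosome_dose(s, interest, cand) for s in qualified_samples]
        cvs[cand] = coefficient_of_variation(doses)
    return sorted(cvs.items(), key=lambda kv: kv[1])


# Made-up tag counts for three qualified samples.
qualified = [
    {"chr21": 100, "chr14": 200, "chr1": 500},
    {"chr21": 110, "chr14": 220, "chr1": 480},
    {"chr21": 90, "chr14": 180, "chr1": 520},
]
ranking = rank_normalizers(qualified, "chr21")
```

In this toy data set `chr14` tracks `chr21` exactly, so it ranks first. Ranking by CV alone reflects only the "lowest variability" criterion; differentiability would need test samples as well.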
In some instances, the best normalizing chromosome may not have the least variation, but may have a distribution of qualified doses that best distinguishes one or more test samples from the qualified samples; that is, the best normalizing chromosome may not have the lowest variability but may have the greatest differentiability. Differentiability thus accounts for both the variation in chromosome dose and the distribution of the doses in the qualified samples.
Tables 10 and 11 provide the coefficient of variation as the measure of variability, and t-test values as the measure of differentiability for chromosomes 18, 21, X and Y, wherein the smaller the t-test value, the greater the differentiability. The differentiability for chromosome 13 was determined as the ratio of the difference between the mean chromosome dose in the qualified samples and the dose for chromosome 13 in the only T13 test sample, to the standard deviation of the mean of the qualified dose.
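For the single-sample case described for chromosome 13, the differentiability measure can be written as a short function. This sketch assumes that "standard deviation" means the sample standard deviation of the qualified doses and that the difference is taken as an absolute value; both choices, and the function name, are assumptions rather than details given in the text.

```python
from statistics import mean, stdev


def differentiability_single(qualified_doses, test_dose):
    """|mean(qualified) - test dose| divided by the SD of the qualified doses."""
    return abs(mean(qualified_doses) - test_dose) / stdev(qualified_doses)
```

A larger value means the test dose sits further from the qualified distribution when that normalizing chromosome is used.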
The qualified chromosome doses also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described below.
Table 9. Qualified chromosome doses for chromosomes 13, 18, 21, X and Y (n = 1; sample no. 11342, 46,XY)
[Table 9 appears as an image in the original publication: BDA00002366924902571]
Table 10. Qualified chromosome doses, variance and differentiability for chromosomes 21, 18 and 13
[Table 10 appears as images in the original publication: BDA00002366924902581, BDA00002366924902591, BDA00002366924902601]
Table 11. Qualified chromosome doses, variance and differentiability for chromosomes 13, X and Y
[Table 11 appears as images in the original publication: BDA00002366924902602, BDA00002366924902611]
Examples of diagnoses of T21, T13, T18 and a case of Turner syndrome, obtained using the normalizing chromosomes, chromosome doses and differentiability for each chromosome of interest, are described in Example 9.
Example 9
Diagnosis of fetal aneuploidy using normalizing chromosomes
To apply chromosome doses to the assessment of aneuploidy in biological test samples, maternal blood test samples were obtained from pregnant volunteers, and cfDNA was prepared, sequenced and analyzed as described in Examples 1 and 2.
Trisomy 21
Table 12 provides the dose calculated for chromosome 21 in an exemplary test sample (#11403). The calculated threshold for a positive diagnosis of T21 was set at >2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T21 was given when the chromosome dose in the test sample was greater than the set threshold. Chromosomes 14 and 15 were used as normalizing chromosomes in separate calculations to show that either a chromosome having the lowest variability (e.g. chromosome 14) or a chromosome having the greatest differentiability (e.g. chromosome 15) can be used to identify the aneuploidy. Thirteen T21 samples were identified using the calculated chromosome doses, and these aneuploid samples were confirmed to be T21 by karyotype.
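The thresholding rule for an over-represented chromosome can be sketched as follows. The example assumes the threshold is the mean of the qualified doses plus two sample standard deviations and that a call is made when the test dose exceeds it; the function names and the numeric doses are made up for illustration and are not values from Table 12.

```python
from statistics import mean, stdev


def upper_threshold(qualified_doses, n_sd=2.0):
    """Mean of the qualified doses plus n_sd sample standard deviations."""
    return mean(qualified_doses) + n_sd * stdev(qualified_doses)


def is_trisomy_call(test_dose, qualified_doses, n_sd=2.0):
    """True when the test dose exceeds the upper threshold."""
    return test_dose > upper_threshold(qualified_doses, n_sd)


# Made-up qualified doses for a chromosome of interest.
qualified_doses = [0.10, 0.11, 0.09, 0.10]
```

With these made-up doses the threshold is roughly 0.116, so a test dose of 0.13 would be called and a dose of 0.105 would not.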
Table 12. Chromosome dose for a T21 aneuploidy (sample #11403, 47,XY,+21)
[Table 12 appears as an image in the original publication: BDA00002366924902621]
Trisomy 18
Table 13 provides the dose calculated for chromosome 18 in a test sample (#11390). The calculated threshold for a positive diagnosis of T18 was set at >2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T18 was given when the chromosome dose in the test sample was greater than the set threshold. Chromosome 8 was used as the normalizing chromosome. In this instance, chromosome 8 had both the lowest variability and the greatest differentiability. Eighteen T18 samples were identified using the chromosome doses and were confirmed to be T18 by karyotype.
These data show that a normalizing chromosome can have both the lowest variability and the greatest differentiability.
Table 13. Chromosome dose for a T18 aneuploidy (sample #11390, 47,XY,+18)
[Table 13 appears as an image in the original publication: BDA00002366924902622]
Trisomy 13
Table 14 provides the dose calculated for chromosome 13 in a test sample (#51236). The calculated threshold for a positive diagnosis of T13 was set at >2 standard deviations from the mean of the qualified samples. A diagnosis of T13 was given when the chromosome dose in the test sample was greater than the set threshold. The chromosome dose for chromosome 13 was calculated using either chromosome 5 or the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome. One T13 sample was identified.
Table 14. Chromosome dose for a T13 aneuploidy (sample #51236, 47,XY,+13)
[Table 14 appears as an image in the original publication: BDA00002366924902631]
The sequence tag density for chromosomes 3-6 is the average tag count for chromosomes 3-6.
These data show that the combination of chromosomes 3, 4, 5 and 6 provides a variability lower than that of chromosome 5 and a differentiability greater than that of any of the other chromosomes.
Thus, a group of chromosomes can be used as the normalizing chromosome to determine chromosome doses and identify aneuploidies.
Turner syndrome (monosomy X)
Table 15 provides the doses calculated for chromosomes X and Y in a test sample (#51238). The calculated threshold for a positive diagnosis of Turner syndrome (monosomy X) was set, for the X chromosome, at <-2 standard deviations from the mean of the qualified (normal) samples and, for the absence of the Y chromosome, at <-2 standard deviations from the mean of the qualified (normal) samples.
Table 15. Chromosome doses for a Turner (XO) aneuploidy (sample #51238, 45,X)
A sample having an X chromosome dose less than the set threshold was identified as having less than one X chromosome. The same sample was determined to have a Y chromosome dose less than the set threshold, indicating that the sample had no Y chromosome. Thus, the combination of the doses for X and Y was used to identify the Turner syndrome (monosomy X) sample.
Thus, the method provided enables the determination of CNV of chromosomes. In particular, the method enables the determination of over-represented and under-represented chromosomal aneuploidies by massively parallel sequencing of maternal plasma cfDNA and identification of normalizing chromosomes for the statistical analysis of the sequencing data. The sensitivity and reliability of the method allow accurate aneuploidy testing in the first and second trimesters.
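The combined X and Y rule can be sketched as a two-condition test. The sketch assumes each threshold is the mean of the relevant qualified doses minus two sample standard deviations; which qualified samples feed each threshold (for example, unaffected female samples for X and unaffected male samples for Y) is left to the caller and is an assumption, as are the function names.

```python
from statistics import mean, stdev


def lower_threshold(qualified_doses, n_sd=2.0):
    """Mean of the qualified doses minus n_sd sample standard deviations."""
    return mean(qualified_doses) - n_sd * stdev(qualified_doses)


def is_monosomy_x_call(x_dose, y_dose, qualified_x_doses, qualified_y_doses,
                       n_sd=2.0):
    """True only when both the X dose and the Y dose fall below threshold."""
    low_x = x_dose < lower_threshold(qualified_x_doses, n_sd)
    low_y = y_dose < lower_threshold(qualified_y_doses, n_sd)
    return low_x and low_y
```

Requiring both conditions separates a monosomy X sample from a sample whose low X dose is accompanied by a Y dose in the qualified range.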
Example 10
Determination of partial aneuploidy
Sequence doses were applied to assess partial aneuploidy in a biological test sample of cfDNA that was prepared from blood plasma and sequenced as described in Example 7. The sample was confirmed by karyotyping to have been obtained from a subject with a partial deletion of chromosome 11.
Analysis of the sequencing data for the partial aneuploidy (partial deletion of chromosome 11, i.e. q21-q23) was performed as described for the chromosomal aneuploidies in the previous examples. Mapping of the sequence tags to chromosome 11 in the test sample revealed a noticeable loss of tag counts between base pairs 81000082-103000103 in the long arm of the chromosome, relative to the tag counts obtained for the corresponding sequence on chromosome 11 in the qualified samples (data not shown). The sequence tags mapped to the sequence of interest on chromosome 11 (81000082-103000103 bp) in each qualified sample, and the sequence tags mapped to all 20-megabase segments in the entire genome of the qualified samples (i.e. the qualified sequence tag densities), were used to determine qualified sequence doses as ratios of tag densities in all qualified samples. The average sequence dose, standard deviation and coefficient of variation were calculated for all 20-megabase segments in the entire genome, and the 20-megabase sequence having the least variability was identified as the normalizing sequence, located on chromosome 5 (13000014-33000033 bp) (see Table 16). This normalizing sequence was used to calculate the dose for the sequence of interest in the test sample (see Table 17). Table 17 provides the sequence dose for the sequence of interest on chromosome 11 (81000082-103000103 bp) in the test sample, calculated as the ratio of the sequence tags mapped to the sequence of interest to the sequence tags mapped to the identified normalizing sequence. Figure 40 shows the sequence doses for the sequence of interest in the 7 qualified samples (○) and the sequence dose for the corresponding sequence in the test sample (◇). The mean is shown by the solid line, and the calculated threshold for a positive diagnosis of partial aneuploidy, set at 5 standard deviations from the mean, is shown by the dashed line. A diagnosis of partial aneuploidy was given when the sequence dose in the test sample was less than the set threshold. The test sample was confirmed by karyotyping to have the deletion q21-q23 on chromosome 11.
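The segment-based procedure described above can be outlined as follows: compute the dose of the sequence of interest against every candidate segment in the qualified samples, keep the candidate with the lowest coefficient of variation as the normalizing sequence, then compare the test-sample dose with a threshold 5 standard deviations below the qualified mean. This is a sketch under assumptions: samples are dictionaries of per-segment tag counts, the segment keys and counts are made up, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def pick_normalizing_segment(qualified_samples, interest):
    """Return the candidate segment whose dose has the lowest CV."""
    best, best_cv = None, float("inf")
    for seg in qualified_samples[0]:
        if seg == interest:
            continue
        doses = [s[interest] / s[seg] for s in qualified_samples]
        cv = stdev(doses) / mean(doses)
        if cv < best_cv:
            best, best_cv = seg, cv
    return best


def is_partial_deletion(test_sample, qualified_samples, interest, normalizer,
                        n_sd=5.0):
    """True when the test dose is below mean - n_sd * SD of qualified doses."""
    doses = [s[interest] / s[normalizer] for s in qualified_samples]
    threshold = mean(doses) - n_sd * stdev(doses)
    return test_sample[interest] / test_sample[normalizer] < threshold


# Made-up tag counts; "roi" stands for the sequence of interest.
qualified = [
    {"roi": 1000, "segA": 2000, "segB": 3000},
    {"roi": 1010, "segA": 2010, "segB": 2800},
    {"roi": 990, "segA": 1990, "segB": 3200},
]
normalizer = pick_normalizing_segment(qualified, "roi")
```

In this toy data `segA` varies with `roi` across samples, so it is selected; a test sample with a reduced `roi` count then falls below the threshold.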
Thus, in addition to identifying chromosomal aneuploidies, the method of the invention can be used to identify partial aneuploidies.
Table 16. Qualified normalizing sequence, dose and variance for the sequence Chr11: 81000082-103000103 (qualified samples, n = 7)
[Table 16 appears as an image in the original publication: BDA00002366924902651]
Table 17. Sequence dose for the sequence of interest (81000082-103000103) on chromosome 11 (test sample 11206)
[Table 17 appears as an image in the original publication: BDA00002366924902652]
Example 11
Demonstration of aneuploidy detection
The sequencing data obtained for the samples described in Examples 2 and 3 and shown in Figures 32 to 36 were analyzed further to demonstrate the sensitivity of the method in successfully identifying aneuploidies in maternal samples. Normalized chromosome doses for chromosomes 21, 18, 13, X and Y were analyzed as a distribution relative to the standard deviation of the mean (Y-axis) and are shown in Figures 41A-41E. The normalizing chromosome used is shown as the denominator (X-axis).
Figure 41(A) shows the distribution of chromosome doses, relative to the standard deviation from the mean, for the chromosome 21 dose in unaffected samples (o) and trisomy 21 samples (T21; △) when chromosome 14 is used as the normalizing chromosome for chromosome 21. Figure 41(B) shows the distribution of chromosome doses, relative to the standard deviation from the mean, for the chromosome 18 dose in unaffected samples (o) and trisomy 18 samples (T18; △) when chromosome 8 is used as the normalizing chromosome for chromosome 18. Figure 41(C) shows the distribution of chromosome doses, relative to the standard deviation from the mean, for the chromosome 13 dose in unaffected samples (o) and trisomy 13 samples (T13; △), using the average sequence tag density of the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome to determine the chromosome dose for chromosome 13. Figure 41(D) shows the distribution of chromosome doses, relative to the standard deviation from the mean, for the chromosome X dose in unaffected female samples (o), unaffected male samples (△) and monosomy X samples (XO; +) when chromosome 4 is used as the normalizing chromosome for chromosome X. Figure 41(E) shows the distribution of chromosome doses, relative to the standard deviation from the mean, for the chromosome Y dose in unaffected male samples (o), unaffected female samples (△) and monosomy X samples (+) when the average sequence tag density of the group of chromosomes 1 to 22 and X is used as the normalizing chromosome to determine the chromosome dose for chromosome Y.
These data show that trisomy 21, trisomy 18 and trisomy 13 samples are clearly distinguishable from unaffected (normal) samples. Monosomy X samples were readily identified as having a chromosome X dose clearly lower than that of unaffected female samples (Figure 41(D)) and a chromosome Y dose clearly lower than that of unaffected male samples (Figure 41(E)).
Thus, the method provided is sensitive and specific for determining the presence or absence of chromosomal aneuploidies in a maternal blood sample.
Example 12
Determination of fetal chromosomal aneuploidy by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood: test set 1 independent of training set 1
The study was conducted by qualified site clinical research personnel at 13 U.S. clinic locations between April 2009 and October 2010 under a human subject protocol approved by the Institutional Review Board (IRB) of each institution. Written consent was obtained from each subject before participation in the study. The protocol was designed to provide blood samples and clinical data to support the development of noninvasive prenatal genetic diagnostic methods. Pregnant women aged 18 years or older were eligible to participate. For patients undergoing clinically indicated chorionic villus sampling (CVS) or amniocentesis, blood was collected before the procedure was performed, and the results of the fetal karyotype were also collected. Peripheral blood samples (two tubes, or approximately 20 mL in total) were drawn from all subjects into acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped overnight to the laboratory in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.
Site research coordinators entered clinical data relevant to the patient's current pregnancy and history into study case report forms (CRFs) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype was performed at the local laboratories on samples from the invasive prenatal procedures, and the results were also recorded in the study CRFs. All data obtained on CRFs were entered into a clinical database at the laboratory. Cell-free plasma was obtained from individual blood tubes by a two-step centrifugation process within 24 to 48 hours of venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Because cell-free DNA fragments are known to be approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required before sequencing.
For the training set samples, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, CA) for sequencing library preparation (cfDNA blunt-ended and ligated to universal adaptors) and for sequencing with the Illumina Genome Analyzer IIx instrument (http://www.illumina.com/) using standard manufacturer protocols. Single-end reads of 36 base pairs were obtained. Upon completion of sequencing, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared and sequenced on the Illumina Genome Analyzer IIx instrument. Sequencing library preparation was performed as follows. The full-length protocol described is essentially the standard protocol provided by Illumina and differs from the Illumina protocol only in the purification of the amplified library: the Illumina protocol instructs that the amplified library be purified by gel electrophoresis, whereas the protocol described here uses magnetic beads for the same purification step. A primary sequencing library was prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma, essentially according to the manufacturer's instructions, using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA). Except for the final purification of the adaptor-ligated products, which was performed with Agencourt magnetic beads and reagents instead of a purification column, all steps were performed according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library that is sequenced using the Illumina GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20 °C. The sample was cooled to 4 °C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The resulting 300 μl were transferred to a QIAQuick column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation. dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 30 minutes at 37 °C according to the manufacturer's NEBNext dA-Tailing Module. The sample was cooled to 4 °C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The 300 μl were transferred to the MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation. Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25 °C according to the NEBNext Quick Ligation Module. The sample was cooled to 4 °C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation. Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions (available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf). The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). For both the training and test sample sets, single-end reads of 36 base pairs were sequenced.
Data analysis and sample classification
Be sequence reading and the human genome assembly hg18 that obtains from the UCSC database of 36 bases compare (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/) with length.Use allows the short tract comparative device (version 0.12.5) of the Bowtie of maximum two base mispairings (Langmead et al., Genome Biol 10:R25[2009]) to compare in comparison process.Only have and know that being mapped to a locational reading of term single gene group just is included.The genomic locus that reading is shone upon has carried out counting and has been included in the calculating of karyomit(e) dosage (referring to following content).Sequence label from the masculinity and femininity fetus is excluded beyond analyzing (exactly, from base 0 to base 2x 10 without the zone on the Y chromosome of any differentiation ground mapping part 6, base 10x 10 6To base 13x 10 6And base 23x10 6End to Y chromosome.)
Intra-run and inter-run sequencing variation in the chromosomal distribution of sequence reads can obscure the effect of fetal aneuploidy on the distribution of mapped sequence sites. To correct for this variation, a chromosome dose was calculated, in which the count of mapped sites for a given chromosome of interest is normalized to the count observed for a predetermined normalizing chromosome sequence. As described previously, a normalizing chromosome sequence can consist of a single chromosome or of a group of chromosomes. Normalizing chromosome sequences were first identified in a subset of the training set consisting of unaffected (i.e. qualified) samples with diploid karyotypes for the chromosomes of interest 21, 18, 13 and X, by considering each autosome as a potential denominator in a ratio of counts with the chromosome of interest. Denominator chromosomes (i.e. normalizing chromosome sequences) were selected so as to minimize the variation of the chromosome dose between sequencing runs. Each chromosome of interest was determined to have a distinct normalizing chromosome sequence (denominator) (Table 18). No single chromosome could be identified as a normalizing chromosome sequence for chromosome 13, because no single chromosome was found to reduce the variation in the chromosome 13 dose across samples; that is, the spread of the NCV values for chromosome 13 was not reduced sufficiently to allow T13 aneuploidy to be identified correctly. Chromosomes 2 to 6 were chosen at random and tested as a group for their ability to mimic the behavior of chromosome 13. The group of chromosomes 2 to 6 was found to reduce substantially the variation in the chromosome 13 dose among the training set samples, and was therefore selected as the normalizing chromosome sequence for chromosome 13. As noted above, the variation in the chromosome dose for chromosome Y is greater than 30 regardless of which single chromosome is used as the normalizing chromosome sequence when determining the chromosome Y dose. The group of chromosomes 2 to 6 was found to reduce substantially the variation in the chromosome Y dose among the training set samples, and was therefore selected as the normalizing chromosome sequence for chromosome Y.
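As a rough illustration of how a denominator chromosome might be chosen, the sketch below scores each candidate by the coefficient of variation of the resulting dose across qualified samples and keeps the candidate with the smallest value. The use of the coefficient of variation as the variability measure, the dictionary layout of the tag counts, and the function names are assumptions made for the example.

```python
from statistics import mean, stdev


def dose_cv(samples, interest, denominators):
    """Coefficient of variation of the chromosome dose across samples.

    `samples` is a list of dicts mapping chromosome name -> tag count
    (hypothetical layout); `denominators` is a list of chromosome names.
    """
    doses = [s[interest] / sum(s[d] for d in denominators) for s in samples]
    return stdev(doses) / mean(doses)


def best_single_denominator(samples, interest, candidates):
    """Return the single candidate chromosome giving the least variable dose."""
    return min((c for c in candidates if c != interest),
               key=lambda c: dose_cv(samples, interest, [c]))
```

Because `dose_cv` accepts a list of denominators, the same helper can score a group such as chromosomes 2 to 6 against the best single chromosome.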
The chromosome dose for each chromosome of interest in the qualified samples provides a measure of the variation in the total number of sequence tags mapped to that chromosome relative to the total number mapped to each of the remaining chromosomes. Qualified chromosome doses can therefore identify the chromosome or group of chromosomes (i.e. the normalizing chromosome sequence) whose variation among samples is closest to that of the chromosome of interest, and which serves as the ideal sequence for normalizing values for further statistical evaluation.
The chromosome doses for all samples in the training set (i.e. qualified and affected) also serve as the basis for determining thresholds when identifying aneuploidies in test samples, as described below.
Table 18. Normalizing chromosome sequences used to determine chromosome doses
For each chromosome of interest in each sample of the test set, a normalizing value was determined and used to determine the presence or absence of aneuploidy. The normalizing value was calculated as a chromosome dose, which can be further computed to provide a normalized chromosome value (NCV).
Chromosome dose
For the test set, a chromosome dose was calculated for each chromosome of interest 21, 18, 13, X and Y in each sample. As provided in Table 18 above, the chromosome dose for chromosome 21 was calculated as the ratio of the number of tags in the test sample that mapped to chromosome 21 to the number of tags in the test sample that mapped to chromosome 9; the chromosome dose for chromosome 18 as the ratio of the number of tags that mapped to chromosome 18 to the number that mapped to chromosome 8; the chromosome dose for chromosome 13 as the ratio of the number of tags that mapped to chromosome 13 to the number that mapped to chromosomes 2 to 6; the chromosome dose for chromosome X as the ratio of the number of tags that mapped to chromosome X to the number that mapped to chromosome 6; and the chromosome dose for chromosome Y as the ratio of the number of tags that mapped to chromosome Y to the number that mapped to chromosomes 2 to 6.
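The ratios listed above can be written as a small lookup plus one division. The sketch below is illustrative only; the `chrN` naming, the dictionary of tag counts, and the names `NORMALIZERS` and `chromosome_dose` are assumptions of the example.

```python
# Normalizing chromosomes for each chromosome of interest, as listed in the text.
NORMALIZERS = {
    "chr21": ["chr9"],
    "chr18": ["chr8"],
    "chr13": ["chr2", "chr3", "chr4", "chr5", "chr6"],
    "chrX": ["chr6"],
    "chrY": ["chr2", "chr3", "chr4", "chr5", "chr6"],
}


def chromosome_dose(tag_counts, interest):
    """Tags mapped to the chromosome of interest divided by the tags
    mapped to its normalizing chromosome(s) in the same sample."""
    denominator = sum(tag_counts[c] for c in NORMALIZERS[interest])
    return tag_counts[interest] / denominator
```

For a group of normalizing chromosomes the denominator is the sum of the tag counts over the group, which is how the text describes the chromosome 2 to 6 group.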
Normalized chromosome value
Using the chromosome dose for each chromosome of interest in each test sample and the corresponding chromosome doses determined in the qualified samples of the training set, a normalized chromosome value (NCV) was calculated using the following equation:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for test sample i. When chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity was observed in a quantile-quantile plot of the NCVs from unaffected samples. In addition, standard tests of normality for the NCVs failed to reject the null hypothesis of normality.
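The NCV is a z-score of the test dose against the qualified training doses, which the following sketch computes directly. The function name `ncv` and the use of the sample (n - 1) standard deviation are assumptions; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def ncv(test_dose, qualified_doses):
    """Normalized chromosome value of one test-sample dose.

    `qualified_doses` are the doses of the same chromosome in the
    qualified (unaffected) training samples.
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample SD; an assumption of this sketch
    return (test_dose - mu) / sigma
```

With hypothetical qualified doses of 0.312, 0.313 and 0.314, a test dose of 0.318 lies five standard deviations above the training mean.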
For the test set, an NCV was calculated for each chromosome of interest 21, 18, 13, X and Y in each sample. To ensure a safe and effective classification scheme, conservative boundaries were chosen for aneuploidy classification. For classification of the aneuploidy status of autosomes, an NCV > 4.0 was required to classify the chromosome as affected (i.e. aneuploid for that chromosome), and an NCV < 2.5 to classify the chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were classified as "no call".
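The autosome boundaries can be expressed as a three-way decision. The sketch below takes 4.0 as the lower limit for an affected call, which follows from the 2.5 to 4.0 "no call" interval and the NCV threshold of 4.0 cited later in this example; the function name and the handling of values exactly on a boundary (treated as "no call") are assumptions.

```python
def classify_autosome(ncv, affected_above=4.0, unaffected_below=2.5):
    """Classify one autosome from its NCV using the conservative boundaries."""
    if ncv > affected_above:
        return "affected"
    if ncv < unaffected_below:
        return "unaffected"
    return "no call"
```

Note that a strongly negative NCV is classified as unaffected under this rule, since the boundaries are one-sided.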
In the test set, classification of the sex chromosomes was performed by sequential application of the NCVs for both X and Y, as follows:
If NCV Y > -2.0 standard deviations from the mean of male samples, the sample is classified as male (XY).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X > -2.0 standard deviations from the mean of female samples, the sample is classified as female (XX).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X < -3.0 standard deviations from the mean of female samples, the sample is classified as monosomy X, i.e. Turner syndrome.
If the NCVs do not meet any of the above criteria, the sample is classified as "no call" for sex.
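The sequential sex chromosome rules can be sketched as a short decision function. The first argument is the chromosome Y NCV measured against male samples. The second argument is assumed here to be the chromosome X NCV measured against female samples, on the grounds that the classification applies NCVs for both X and Y; that reading, the function name and the argument names are assumptions of the example.

```python
def classify_sex(ncv_y_male, ncv_x_female):
    """Apply the sex chromosome rules in order.

    ncv_y_male:   chrY NCV in standard deviations from the mean of male samples.
    ncv_x_female: chrX NCV in standard deviations from the mean of female
                  samples (assumed interpretation, see lead-in).
    """
    if ncv_y_male > -2.0:
        return "male (XY)"
    if ncv_y_male < -2.0 and ncv_x_female > -2.0:
        return "female (XX)"
    if ncv_y_male < -2.0 and ncv_x_female < -3.0:
        return "monosomy X"
    return "no call"
```

Values that fall in the gap between the female and monosomy X rules, or exactly on a boundary, fall through to "no call".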
Results
Study demographics
A total of 1,014 patients were enrolled between April 2009 and July 2010. Patient demographics, invasive procedure type and karyotype results are summarized in Table 19. The mean age of the study population was 35.6 years (range 17 to 47 years), and gestational age ranged from 6 weeks, 1 day to 38 weeks, 1 day (mean 15 weeks, 4 days). The overall incidence of abnormal fetal karyotypes was 6.8%, with a T21 incidence of 2.5%. Of the 946 subjects with singleton pregnancies and karyotypes, 906 (96%) presented with at least one clinically recognized risk factor for fetal aneuploidy prior to the prenatal procedure. Even after removing those subjects with advanced maternal age as their sole indication, the data demonstrate a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma, or other structural congenital abnormalities were the strongest predictors of abnormal karyotype in this cohort.
Table 19. Patient demographics
[Table 19 appears as an image in the original publication.]
* Includes results from fetuses of multiple gestations; ** as assessed and reported by clinicians
Abbreviations: AMA = advanced maternal age; NT = nuchal translucency
The distribution of the ethnic backgrounds represented in this study population is also shown in Table 19. Overall, 63% of the patients in this study were Caucasian, 17% Hispanic, 6% Asian, 5% multi-ethnic, and 4% African American. Ethnic diversity varied significantly from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, whereas three clinics located in the same state enrolled no Hispanic subjects. As expected, no discernible differences were observed in our results across ethnicities.
Training data set 1
The training set study selected 71 samples from the initial sequential accumulation of 435 samples collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotypes) in this first series of subjects were included for sequencing, together with a random selection of unaffected subjects with adequate sample and data. The clinical characteristics of the training set patients were consistent with the overall study demographics shown in Table 19. The gestational age of the samples in the training set ranged from 10 weeks, 0 days to 23 weeks, 1 day. Thirty-eight subjects underwent CVS, 32 underwent amniocentesis, and for 1 patient the type of invasive procedure was not specified (an unaffected karyotype, 46,XY). 70% of the patients were Caucasian, 8.5% Hispanic, 8.5% Asian, and 8.5% multi-ethnic. For the purposes of training, six sequenced samples were removed from this set: 4 samples from subjects with twin gestations (discussed in detail below), 1 sample with T18 that was contaminated during preparation, and 1 sample with fetal karyotype 69,XXX, leaving 65 samples for the training set.
The number of unique sequence sites (i.e. tags identified with unique sites in the genome) varied from 2.2M in the early phase of the training set study to 13.7M in the later phase, owing to improvements in sequencing technology over time. To monitor for any potential shift in the chromosome doses over this more than 6-fold range in unique sites, different unaffected samples were run at the beginning and at the end of the study. For the first 15 unaffected samples run, the average number of unique sites was 3.8M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the last 15 unaffected samples run, the average number of unique sites was 10.7M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistical difference in the chromosome doses for chromosome 21 and chromosome 18 over the course of the training set study.
The training set NCVs for chromosomes 21, 18 and 13 are shown in Figure 42. The results shown in Figure 42 are consistent with an assumption of normality, under which about 99% of the diploid NCVs fall within ±2.5 standard deviations of the mean. Among the 65 samples in this set, the 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. The four samples with clinical karyotypes indicating fetal T18 had NCVs ranging from 3.3 to 12, and the two samples with clinical karyotypes indicating fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples is due to their dependence on the percentage of fetal cfDNA in the individual samples.
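The "about 99% within ±2.5 standard deviations" figure can be checked against the standard normal distribution with the error function from the Python standard library. The function name `fraction_within` is a label chosen for this sketch.

```python
import math


def fraction_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


# Fraction of unaffected NCVs expected inside +/- 2.5 SD under normality.
print(round(fraction_within(2.5), 4))  # 0.9876
```

The exact value is closer to 98.8% than to 99%, which is consistent with the approximate wording of the text.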
As for the autosomes, the means and standard deviations for the sex chromosomes were established in the training set. The sex chromosome thresholds allowed 100% identification of the male and female fetuses in the training set.
Test data set 1
After the chromosome dose means and standard deviations had been established from the training set, a test set of 48 samples was selected from a total of 575 samples collected between January 2010 and June 2010. One sample from a twin gestation was removed from the final analysis, leaving 47 samples in the test set. Personnel preparing samples for sequencing and operating the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 19). 58% of the invasive procedures were CVS, which is higher than in the overall procedural demographics but similar to the training set. 50% of the subjects were Caucasian, 27% Hispanic, 10.4% Asian and 6.3% African American.
In the test set, the number of unique sequence tags varied from approximately 13M to 26M. For unaffected samples, the chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosome 21, chromosome 18 and chromosome 13 are shown in Figure 43, and the classifications are given in Table 20.
Table 20. Test set classification data
[Table 20 appears as an image in the original publication.]
* MX is monosomy of the X chromosome with no evidence of a Y chromosome
In the test set, 13/13 subjects with karyotypes indicating fetal T21 were correctly identified, with NCVs ranging from 5 to 14. Eight/eight subjects with karyotypes indicating fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample in this test set with a karyotype classified as T13 was classified as a no call, with an NCV of approximately 3.
For the test data set, all male samples were correctly identified, including a sample with the complex karyotype 46,XY + marker chromosome (unidentifiable by cytogenetics) (Table 11). Nineteen of the twenty female samples were correctly identified, and one female sample was classified as a no call. Of the three samples in the test set with a karyotype of 45,X, two were correctly identified as monosomy X and one was classified as a no call (Table 20).
Twins
Four of the samples initially selected for the training set and one in the test set were from twin gestations. The thresholds used here may be confounded by the differing amounts of cfDNA expected in the setting of a twin gestation. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was dichorionic, and amniocentesis was performed on each fetus individually. In this twin gestation, one fetus had a karyotype of 47,XY+21 and the other had a normal karyotype, 46,XX. In both of these cases, the cell-free classification based on the methods discussed above classified the sample as T21. The other two twin gestations in the training set were correctly classified as unaffected for T21 (all twins showed a diploid karyotype for chromosome 21). For the twin gestation in the test set, a karyotype was established only for Twin B (46,XX), and the algorithm correctly classified it as unaffected for T21.
Conclusion
The data show that massively parallel sequencing can be used to determine multiple abnormal fetal karyotypes from the blood of pregnant women. They demonstrate that 100% correct classification of samples with trisomy 21 and trisomy 18 can be achieved using independent test set data. Even in the case of fetuses with abnormal sex chromosome karyotypes, none of the samples was incorrectly classified by the algorithm of the method. Importantly, the algorithm also performed well in determining the presence or absence of T21 in two sets of twin pregnancies. Furthermore, this study examined many sequential samples from multiple centers, which not only represent the range of abnormal karyotypes likely to be seen in a commercial clinical setting, but also show the importance of accurately classifying pregnancies unaffected by the common trisomies, in order to address the unacceptably high false positive rates of current prenatal screening. The data provide valuable insight into the great potential of this method for future use. An analysis of subsets of the unique genomic sites showed increases in variance consistent with Poisson counting statistics.
These data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive prenatal determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by counting statistics (Fan and Quake, PLoS One 5, e10439 [2010]). Because sequencing information is collected across the entire genome, this method can determine any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23; when the sequencing data were analyzed in 500 kbase bins, this was observed as a decrease of about 10% in the relative number of tags in a 25 Mb region starting at q21. In addition, three of the samples in the training set had complex sex karyotypes owing to mosaicism in the cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), which both showed a mixture of XXX and X cells by cytogenetic analysis (consistent with mosaic Turner syndrome), were classified as a no call and as monosomy X, respectively.
In testing the algorithm, another interesting data point was observed: a sample from the test set with an NCV between -5 and -6 for chromosome 21 (Figure 43). Although this sample was diploid for chromosome 21 by cytogenetics, the karyotype showed mosaicism with partial triploidy for chromosome 9: 47,XX+9[9]/46,XX[6]. Because chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 18), this lowers the overall NCV value. The results provided in Example 13 below confirm the ability to use normalizing chromosomes to determine fetal trisomy 9 in this sample.
The conclusion of Fan et al. regarding the sensitivity of these methods is correct only if the algorithms used can account for any random or systematic biases introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will be inferior to what counting statistics allow. Chiu et al. noted in their recent paper that their measurements of chromosomes 18 and 13 using the massively parallel sequencing method were imprecise, and concluded that more research was needed to apply the method to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method used in the Chiu et al. paper simply takes the number of sequence tags on the chromosome of interest, in their case chromosome 21, normalized by the total number of tags in the sequencing run. The challenge for this approach is that the distribution of tags on each chromosome can vary from sequencing run to sequencing run, which increases the overall variation of the aneuploidy determination metric. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were reanalyzed using the method recommended by Chiu et al., as shown in Figure 44. Overall, a compression in the range of NCVs was observed for each of chromosomes 21 and 18, together with a decrease in the determination rate: 10/13 T21 and 5/8 T18 samples from our test set were correctly identified using an NCV threshold of 4.0 for aneuploidy classification.
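The difference between the two normalizations can be illustrated with toy numbers. In the sketch below, `fraction_of_total` follows the description of the Chiu et al. approach given in the text (tags on the chromosome of interest divided by all tags in the run), and `chromosome_dose` divides by a normalizing chromosome instead. The tag counts, the three-chromosome "genome" and the function names are hypothetical and chosen only to show the effect of a run-to-run shift in tag distribution.

```python
def fraction_of_total(tag_counts, interest):
    """Tags on the chromosome of interest divided by all tags in the run."""
    return tag_counts[interest] / sum(tag_counts.values())


def chromosome_dose(tag_counts, interest, denominators):
    """Tags on the chromosome of interest divided by the tags on the
    normalizing chromosome(s)."""
    return tag_counts[interest] / sum(tag_counts[c] for c in denominators)


# Two hypothetical runs of the same unaffected sample: chr21 and chr9
# shift together, while chr19 shifts in the opposite direction.
run_a = {"chr21": 100, "chr9": 300, "chr19": 600}
run_b = {"chr21": 110, "chr9": 330, "chr19": 500}

for run in (run_a, run_b):
    print(fraction_of_total(run, "chr21"), chromosome_dose(run, "chr21", ["chr9"]))
```

In this constructed case the fraction of total tags changes between runs while the dose against chromosome 9 does not, because the denominator was chosen to co-vary with chromosome 21.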
Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205e1-e11 [2011]). In addition, after observing a shift in their test set z-score metric relative to the external reference data (i.e. the training set), they retrained on the test set to establish the classification boundaries. Although this approach is feasible in principle, in practice it would be challenging to decide how many samples are required for training and how often retraining is needed to ensure that the classification data are correct. One way to mitigate this issue is to include controls in every sequencing run that measure the baseline and calibrate for quantitative behavior.
The data obtained using the present method show that massively parallel sequencing can determine multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm for normalizing the chromosome counting data is optimized. The present method for quantification not only minimizes random and systematic variation between sequencing runs, but also allows aneuploidies to be classified across the entire genome, most notably T21 and T18. Larger sample collections are required to test the algorithm for T13 determination. To this end, a prospective, blinded, multi-site clinical study is being performed to further demonstrate the diagnostic accuracy of the present method.
Example 13
Determination of the presence or absence of at least 5 different chromosomal aneuploidies in all chromosomes of individual test samples
To demonstrate the capability of the present method for determining the presence or absence of any chromosomal aneuploidy in each of a set of maternal test samples (test set 1; Example 12), systematically determined normalizing chromosome sequences were identified in the unaffected training set samples (training set 1; Example 12), and these normalizing chromosome sequences were used to calculate chromosome doses for all chromosomes of each test sample. The presence or absence of any one or more different complete fetal chromosomal aneuploidies in each of the test and training set samples was determined from sequencing information obtained in a single sequencing run performed on each individual sample.
Using the chromosome densities, i.e. the number of sequence tags identified for each chromosome in the samples of each set described in Example 12, a systematically determined normalizing chromosome sequence consisting of a single chromosome or a group of chromosomes was determined for each of chromosomes 1-22, X and Y. The systematically determined normalizing chromosome sequence was determined for each of chromosomes 1-22, X and Y by systematically calculating chromosome doses for each chromosome using every possible combination of chromosomes as the denominator. For example, for chromosome 21 as the chromosome of interest, chromosome doses were calculated as the ratio of (i) the number of sequence tags obtained for chromosome 21 (the chromosome of interest) to (ii) the number of sequence tags obtained for each of the remaining chromosomes, and the sum of the number of tags obtained for every possible combination of the remaining chromosomes (excluding chromosome 21), i.e.: 1, 2, 3, 4, 5, etc. up to 20, 22, X and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+X and 1+Y; 1+2+3, 1+2+4, 1+2+5, etc. up to 1+2+20, 1+2+22, 1+2+X and 1+2+Y; 1+3+4, 1+3+5, 1+3+6, etc. up to 1+3+20, 1+3+22, 1+3+X and 1+3+Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, etc. up to 1+2+3+20, 1+2+3+22, 1+2+3+X and 1+2+3+Y; and so forth, such that all possible combinations of chromosomes 1-20, 22, X and Y were used as normalizing chromosome sequences (denominators) to determine all possible chromosome doses for the chromosome of interest in each of the qualified (unaffected) samples in the training set. Chromosome doses for chromosome 21 were determined in the same way in all training set samples, and the systematically determined normalizing chromosome sequence for chromosome 21 was identified as the single chromosome or group of chromosomes that resulted in the chromosome 21 dose with the smallest variability across all training samples. The same analysis was repeated to determine the single chromosome or group of chromosomes that would serve as the systematically determined normalizing chromosome sequence for each of the remaining chromosomes, including chromosomes 13, 18, X and Y; that is, all possible combinations of chromosomes were used to determine the normalizing sequence (a single chromosome or a group of chromosomes) for all other chromosomes of interest 1-12, 14-17, 19-20, 22, X and Y in all training samples. Thus, every chromosome was treated as a chromosome of interest, and a systematically determined normalizing sequence was determined for each chromosome in each of the unaffected samples in the training set. Table 21 lists the single chromosome or group of chromosomes identified as the systematically determined normalizing sequence for each chromosome of interest 1-22, X and Y. As Table 21 shows, for some chromosomes of interest the systematically determined normalizing chromosome sequence is a single chromosome (e.g. when chromosome 4 is the chromosome of interest), whereas for others it is a group of chromosomes (e.g. when chromosome 21 is the chromosome of interest).
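The exhaustive search over denominator combinations can be sketched with `itertools.combinations`. The version below is a simplified illustration: it caps the group size with a hypothetical `max_group_size` parameter to keep the example fast, whereas the text describes using every possible combination (with 23 remaining chromosomes, 2^23 - 1 = 8,388,607 non-empty groups). Using the coefficient of variation as the variability measure, the dictionary layout and the function name are also assumptions.

```python
from itertools import combinations
from statistics import mean, stdev


def systematic_normalizer(samples, interest, max_group_size=3):
    """Return (group, cv) for the denominator group that gives the least
    variable dose of `interest` across the qualified samples.

    `samples` is a list of dicts mapping chromosome name -> tag count.
    """
    chromosomes = sorted(c for c in samples[0] if c != interest)
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(chromosomes, size):
            doses = [s[interest] / sum(s[c] for c in group) for s in samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv
```

Raising `max_group_size` to the number of remaining chromosomes reproduces the full enumeration, at a cost that grows exponentially with the number of candidates.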
Table 21. Systematically determined normalizing chromosome sequences for all chromosomes
[Table 21 appears as an image in the original publication.]
The mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequence determined for each of the chromosomes are given in Table 22.
Table 22. Mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequences
Chromosome of interest    Mean       SD         CV
1                         0.36637    0.00266    0.72%
2                         0.31580    0.00068    0.22%
3                         0.21983    0.00055    0.18%
4                         0.98191    0.02509    2.56%
5                         0.30109    0.00076    0.25%
6                         0.21621    0.00059    0.27%
7                         0.21214    0.00044    0.21%
8                         0.25562    0.00068    0.27%
9                         0.12726    0.00034    0.27%
10                        0.24471    0.00098    0.40%
11                        0.26907    0.00098    0.36%
12                        0.12358    0.00029    0.23%
13 (a)                    0.26023    0.00122    0.47%
14                        0.09286    0.00028    0.30%
15                        0.21568    0.00147    0.68%
16                        0.25181    0.00134    0.53%
17                        0.46000    0.00248    0.54%
18 (a)                    0.10100    0.00038    0.38%
19                        1.43709    0.02899    2.02%
20                        0.19967    0.00123    0.62%
21 (a)                    0.07851    0.00053    0.67%
22                        0.69613    0.01391    2.00%
X (b)                     0.46865    0.00279    0.68%
Y (b)                     0.00028    0.00004    14.97%
(a) Excluding trisomies
(b) Female fetuses
The variation in chromosome doses across all training samples (as reflected by the CV values) substantiates the use of systematically determined normalizing chromosome sequences to provide a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown below.
To demonstrate the sensitivity and specificity of the method, chromosome doses for all chromosomes of interest 1-22, X and Y were determined in each sample of the training set and in each sample of the test set described in Example 11, using the corresponding systematically determined normalizing chromosome sequences provided in Table 21 above.
Using the systematically determined normalizing chromosome sequence for each chromosome of interest, the presence or absence of any fetal aneuploidy was determined in each training set sample and in each test sample; that is, it was determined whether each sample contained a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X or Y. Sequence information, i.e. the number of sequence tags, was obtained for all chromosomes in each training set sample and in each test sample, and a single chromosome dose was calculated for each chromosome in each training and test sample as described above, using the number of sequence tags obtained for the corresponding systematically determined normalizing chromosome sequences (Table 21). The number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each training sample was used to determine the chromosome dose for each chromosome in that training sample, and the number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each test sample was used to determine the chromosome dose for each chromosome in that test sample. To ensure safe and effective classification of aneuploidies, the same conservative boundaries were chosen as described in Example 12.
Training set results
A plot of the chromosome doses for chromosomes 21, 18 and 13 in the training set samples, obtained using the systematically determined normalizing chromosome sequences, is given in Figure 45. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+14+16+20+22) was used, the 8 samples whose clinical karyotype indicated T21 had NCVs between 5.4 and 21.5. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 2+3+5+7) was used, the 4 samples whose clinical karyotype indicated T18 had NCVs between 3.3 and 15.3. The T21 samples of the training set are shown as the last 8 samples of the chromosome 21 data (○); the T18 samples of the training set are shown as the last 4 samples of the chromosome 18 data (△); and the T13 samples of the training set are shown as the last 2 samples of the chromosome 13 data.
These data show that normalizing chromosome sequences can be used to determine and correctly classify different, complete fetal chromosomal aneuploidies with a high degree of confidence. Because all samples with affected karyotypes had NCVs greater than 3, there is approximately a 0.1% probability that these samples belong to the unaffected distribution.
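The "approximately 0.1%" figure corresponds to the upper tail of a standard normal distribution beyond 3 standard deviations, which can be checked with the complementary error function. The sketch assumes, as the text does elsewhere, that unaffected NCVs are normally distributed; the function name `upper_tail` is a label chosen for the example.

```python
import math


def upper_tail(k):
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


# Probability that an unaffected sample shows an NCV above 3.
print(round(upper_tail(3.0), 5))  # 0.00135
```

The same function can be evaluated at larger NCVs to see how quickly the tail probability shrinks.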
As for the autosomes, all female and male fetuses in the training set were correctly identified when the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+8) was used for chromosome X and the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+6) was used for chromosome Y. In addition, all 5 monosomy X samples were identified. Figure 46A shows a plot of the NCV determined for the X chromosome (X axis) against the NCV determined for the Y chromosome (Y axis) for each sample in the training set. All samples that were monosomy X by karyotype had X NCV values of less than -4.83. Those monosomy X samples with karyotypes consistent with a 45,X karyotype (complete or mosaic) had, as expected, a Y NCV value near zero. Female samples cluster around NCV = 0 for both X and Y.
Test group result
In Figure 47, provide and use the drawing for karyomit(e) 21,18 and 13 karyomit(e) dosage in specimen of the relevant normalization method chromosome sequence of systematically determining.When using the normalization method chromosome sequence (being the group of karyomit(e) 4+14+16+20+22) of systematically determining, wherein there are 13 to be correctly validated out the NCV that has between 7.2 and 16.3 in 13 samples of clinical caryogram indication T21.When using the normalization method chromosome sequence (when being the group of karyomit(e) 2+3+5+7) of systematically determining, all 8 samples of clinical caryogram indication T18 identified NCV that has between 12.7 and 30.7 all wherein.When using the normalization method chromosome sequence (being the group of karyomit(e) 2+3+5+7) of systematically determining, all 8 samples of clinical caryogram indication T18 identified NCV that has between 12.7 and 30.7 all wherein.The T21 sample of test group illustrates (zero) as last 13 samples of karyomit(e) 21 data; The T18 sample of test group illustrates (△) as last 8 samples of karyomit(e) eighteen data; And the T13 sample of test group illustrates () as the last sample of karyomit(e) 13 data.
These data show, can with high degree of confidence with systematically determine, the normalization method chromosome sequence determines different complete fetal chromosomal dysploidy and with its correct classification.With the training category seemingly, all samples with affected caryogram all has the NCV greater than 7, this shows a minimum possibility, that is: these samples are parts of unaffected distribution.(Figure 47).
Similar with euchromosome, when the normalization method chromosome sequence (being the group of karyomit(e) 4+8) of systematically determining when being used to chromosome x, and when the normalization method chromosome sequence (being the group of karyomit(e) 4+6) of systematically determining when being used to karyomit(e) Y, all women and male fetus in the test group all are correctly identified out.In addition, all 3 monosomy X samples are all identified.Figure 46 B shows the X chromosome of NCV (X-axis) determine for to(for) each sample in the test group and the drawing of the NCV (Y-axis) that determines for Y chromosome.
As described above, the present method allows the presence or absence of a complete or partial chromosomal aneuploidy to be determined for each of chromosomes 1-22, X and Y in each sample. In addition to determining the complete chromosomal aneuploidies T13, T18, T21 and monosomy X, the method also determined the presence of trisomy 9 in one of the test samples. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 3+4+8+10+17+19+20+22) was used for chromosome 9 as the chromosome of interest, one sample with an NCV of 14.4 was identified (Figure 48). This sample corresponds to the test sample in Example 12 that was suspected of being aneuploid for chromosome 9 on the basis of an aberrantly low dose for chromosome 21 (chromosome 9 was used as the normalizing chromosome sequence in Example 12).
These data show that 100% of the samples with a clinical karyotype indicating T21, T13, T18, T9 and monosomy X were correctly identified. Figure 49 shows a plot of the NCVs for each of chromosomes 1-22 in each of the 47 test samples. The median NCV was normalized to zero. These data show that the method of the invention (including the use of systematically determined normalizing chromosome sequences) determined the presence of all 5 types of chromosomal aneuploidy present in this test set with 100% sensitivity and 100% specificity, and clearly indicate that the method can identify any chromosomal aneuploidy for any of chromosomes 1-22, X and Y in any sample.
Example 14
Determination of the presence or absence of a partial fetal chromosomal aneuploidy: determination of cat eye syndrome
DiGeorge syndrome (22q11.2 deletion syndrome), a disorder caused by a defect in chromosome 22, results in the poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include heart defects, poor immune system function, cleft palate, parathyroid disorders and behavioral disorders. The number and severity of the problems associated with DiGeorge syndrome vary greatly. Almost everyone with DiGeorge syndrome needs treatment from specialists in a variety of fields.
To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample was obtained from the mother by venipuncture, and cfDNA was prepared as described in the Examples above. The purified cfDNA was ligated to adaptors and subjected to cluster amplification using the Illumina cBot cluster station. Massively parallel sequencing was performed using reversible dye terminators to generate millions of 36 bp reads. The sequence reads were aligned to the human hg19 reference genome, and the reads that mapped uniquely to the reference genome were counted as tags.
A set of qualified samples, all known to be diploid for chromosome 22 (i.e., chromosome 22 or any portion thereof is known to be present only in the diploid state), is first sequenced and analyzed to obtain a number of sequence tags for each of 1000 segments of 3 megabases (Mb), not including the 22q11.2 region. Given that the human genome comprises about 3 billion bases (3 Gb), the 1000 segments of 3 Mb each approximately make up the remainder of the genome. Each of the 1000 segments can serve, individually or as part of a group of segment sequences, to determine the normalizing segment sequence for the segment of interest, i.e., the 3 Mb region of 22q11.2. The number of sequence tags mapped to each single one of the 1000 segments is used individually to calculate a segment dose for the 3 Mb region of 22q11.2. In addition, all possible combinations of two or more segments are used to determine segment doses for the segment of interest in all qualified samples. The single 3 Mb segment, or the combination of two or more 3 Mb segments, that yields the segment dose with the lowest variability across samples is selected as the normalizing segment sequence.
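The selection step described above can be sketched as a search over candidate segments and groups of segments. This is a minimal illustration, not the patented implementation: it assumes the coefficient of variation as the measure of variability, caps the group size with a hypothetical `max_group_size` parameter (an exhaustive search over all combinations of 1000 segments is not tractable), and uses invented tag counts.

```python
from itertools import combinations
from statistics import mean, stdev


def segment_dose(tags_interest, tags_by_segment, normalizers):
    """Tags on the segment of interest divided by tags on the normalizing segment(s)."""
    return tags_interest / sum(tags_by_segment[name] for name in normalizers)


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (assumed variability measure)."""
    return stdev(values) / mean(values)


def select_normalizing_segments(qualified, candidates, max_group_size=2):
    """Return (group, cv) for the candidate group with the lowest dose variability."""
    best_group, best_cv = None, None
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            doses = [
                segment_dose(s["interest"], s["segments"], group) for s in qualified
            ]
            cv = coefficient_of_variation(doses)
            if best_cv is None or cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv


# Invented tag counts for four qualified samples and three candidate segments.
qualified_samples = [
    {"interest": 100, "segments": {"A": 200, "B": 150, "C": 300}},
    {"interest": 110, "segments": {"A": 220, "B": 140, "C": 310}},
    {"interest": 90, "segments": {"A": 180, "B": 160, "C": 290}},
    {"interest": 105, "segments": {"A": 210, "B": 155, "C": 320}},
]

best_group, best_cv = select_normalizing_segments(qualified_samples, ["A", "B", "C"])
print(best_group, best_cv)
```

In this toy data set, segment A tracks the segment of interest exactly, so it gives a constant dose across samples and is selected on its own.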
The number of sequence tags mapped to the segment of interest in each qualified sample is used to determine the segment dose in each qualified sample. The mean and standard deviation of the segment doses in all qualified samples are calculated and used to determine thresholds against which segment doses determined in test samples can be compared. Preferably, normalized segment values (NSVs) are calculated for all segments of interest in all qualified samples and are used to set the thresholds.
Subsequently, the number of tags mapped to the normalizing segment sequence in the corresponding test sample is used to determine the dose of the segment of interest in the test sample. A normalized segment value (NSV) is calculated for the segment in the test sample as described previously, and the NSV of the segment of interest in the test sample is compared with the threshold determined using the qualified samples to determine the presence or absence of a 22q11.2 deletion in the test sample.
A test NSV<-3 indicates a loss in the segment of interest, i.e., a partial deletion of chromosome 22 (22q11.2) in the test sample.
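The normalization and threshold comparison described above can be sketched as follows. The sketch assumes that the normalized value is the test dose minus the mean of the qualified doses, divided by their standard deviation; the doses shown are invented, and the function names are hypothetical.

```python
from statistics import mean, stdev


def normalized_segment_value(test_dose, qualified_doses):
    """Test dose expressed in standard deviations from the qualified-sample mean."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def call_segment(nsv, loss_threshold=-3.0):
    """Report a loss when the normalized value falls below the threshold."""
    return "loss" if nsv < loss_threshold else "no loss detected"


# Invented segment doses for five qualified (diploid) samples and one test sample.
qualified_doses = [0.498, 0.502, 0.500, 0.501, 0.499]
test_dose = 0.490

nsv = normalized_segment_value(test_dose, qualified_doses)
print(f"normalized value = {nsv:.2f}: {call_segment(nsv)}")
```

With these invented values the test sample lies more than six standard deviations below the qualified mean, so it falls under the -3 threshold and is reported as a loss.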
Example 15
Stool DNA testing to predict outcome in stage II colorectal cancer patients
About 30% of all stage II colorectal cancer patients will relapse and die of their disease. Stage II colorectal cancer patients who had a recurrence of disease showed significantly more losses on chromosomes 4, 5, 15q, 17q and 18q. In particular, losses on 4q22.1-4q35.2 in stage II colorectal cancer patients have been shown to be associated with a worse outcome. Determining the presence or absence of these genomic alterations may aid in selecting patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33:95-104 [2010]).
To determine the presence or absence of one or more chromosomal deletions in the 4q22.1 to 4q35.2 region in patients with stage II colorectal cancer, stool and/or plasma samples are obtained from the patient(s). Stool DNA is prepared according to the method described by Chen et al., J Natl Cancer Inst 97:1124-1132 [2005]; and plasma DNA is prepared according to the method described in the Examples above. The DNA is sequenced according to the NGS methods described herein, and the sequence information for the patient sample(s) is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. The segment doses are determined using normalizing segment sequences previously determined in a set of qualified stool and/or plasma samples, respectively. The segment doses in the test samples (patient samples) are calculated, and the presence or absence of one or more partial chromosomal deletions in the 4q22.1 to 4q35.2 region is determined by comparing the NSV for each segment of interest with the thresholds set from the NSVs in the set of qualified samples.
Example 16
Genome-wide fetal aneuploidy detection by sequencing of maternal plasma DNA: diagnostic accuracy in a prospective, blinded, multicenter study
The method for determining the presence or absence of aneuploidy in a maternal test sample was used in a prospective study, and its diagnostic accuracy is shown as described below. The prospective study further demonstrates the efficacy of the method of the invention for detecting fetal aneuploidy for multiple chromosomes across the genome. The blinded study emulates a real population of pregnant women in which the fetal karyotype is unknown, and all samples with any abnormal karyotype were selected for sequencing. The classifications made according to the method of the invention were compared with the fetal karyotypes obtained from invasive procedures to determine the diagnostic performance of the method for multiple chromosomal aneuploidies.
Summary of this Example
In a prospective, blinded study, blood samples were collected at 60 U.S. sites from 2,882 women undergoing prenatal diagnostic procedures (clinicaltrials.gov NCT01122524).
An independent biostatistician selected all singleton pregnancies with any abnormal karyotype and a balanced number of randomly selected pregnancies with a euploid karyotype. Chromosome classifications were made for each sample according to the method of the invention and compared with the fetal karyotype.
In an analysis cohort of 532 samples, 89/89 trisomy 21 cases (sensitivity 100% (95% CI 95.9-100)), 35/36 trisomy 18 cases (sensitivity 97.2% (95% CI 85.5-99.9)), 11/14 trisomy 13 cases (sensitivity 78.6% (95% CI 49.2-99.9)), 232/233 females (sensitivity 99.6% (95% CI 97.6->99.9)), 184/184 males (sensitivity 100% (95% CI 98.0-100)) and 15/16 monosomy X cases (sensitivity 93.8% (95% CI 69.8-99.8)) were classified. There were no false positives for autosomal aneuploidies in unaffected subjects (100% specificity (95% CI >98.5-100)). In addition, fetuses with mosaicism for trisomy 21 (3/3), trisomy 18 (1/1) and monosomy X (2/7), three cases of translocation trisomy, two cases of other autosomal trisomies (20 and 16) and other sex chromosome aneuploidies (XXX, XXY and XYY) were classified correctly.
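The sensitivity percentages quoted above follow directly from the reported counts. The short sketch below recomputes them; the dictionary layout and function name are illustrative only.

```python
def sensitivity_percent(detected: int, affected: int) -> float:
    """Sensitivity as a percentage, rounded to one decimal place."""
    return round(100.0 * detected / affected, 1)


# (correctly classified, total with that karyotype), as reported above.
reported_counts = {
    "trisomy 21": (89, 89),
    "trisomy 18": (35, 36),
    "trisomy 13": (11, 14),
    "female": (232, 233),
    "male": (184, 184),
    "monosomy X": (15, 16),
}

for category, (detected, affected) in reported_counts.items():
    print(f"{category}: {detected}/{affected} = "
          f"{sensitivity_percent(detected, affected)}%")
```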
These results further demonstrate the efficacy of the present method for detecting fetal aneuploidy for multiple chromosomes across the genome using maternal plasma DNA. The high sensitivity and specificity for the detection of trisomies 21, 18 and 13 and monosomy X suggest that the present method can be incorporated into existing aneuploidy screening algorithms to reduce unnecessary invasive procedures.
Materials and methods
The MELISSA (MatErnal BLood IS Source to Accurately diagnose fetal aneuploidy) study was conducted as a prospective, multicenter observational study with a blinded, nested case:control analysis. Pregnant women 18 years of age and older who were undergoing an invasive prenatal procedure to determine the fetal karyotype were enrolled (Clinicaltrials.gov NCT01122524). Eligibility criteria included pregnant women between 8 weeks, 0 days and 22 weeks, 0 days of gestation who met at least one of the following additional criteria: age ≥38 years; a positive screening test result (serum analytes and/or nuchal translucency (NT) measurement); the presence of ultrasound markers associated with an increased risk of fetal aneuploidy; or a prior aneuploid fetus. Written informed consent was obtained from all women who agreed to participate.
Enrollment took place at 60 geographically dispersed medical centers in 25 states, according to a protocol approved by the Institutional Review Board (IRB) of each institution. Two clinical research organizations (CROs) (Quintiles, Durham, North Carolina; and Emphusion, San Francisco, California) were engaged to keep the study blinded and to provide clinical data management, data monitoring, biostatistics and data analysis services.
Before any invasive procedure, a peripheral venous blood sample (17 mL) was collected in two acid citrate dextrose (ACD) tubes (Becton Dickinson), which were de-identified and labeled with a unique study number. Site research personnel entered the study number and the date and time of the blood draw into a secure electronic case report form (eCRF). Whole blood samples were shipped overnight in temperature-controlled containers from the sites to the laboratory (Verinata Health, Inc., California). Upon receipt and sample inspection, cell-free plasma was prepared according to previously described methods (see Example 13) and stored frozen at -80°C in 2 to 4 aliquots until the time of sequencing. The date and time of sample receipt at the laboratory were recorded. A sample was deemed eligible for analysis if it was received overnight, was cool to the touch and contained at least 7 mL of blood. Samples that were eligible at receipt were reported to the CRO weekly and used for selection on the random sampling list (see below and Figure 50). Clinical data from the woman's current pregnancy and the fetal karyotype were entered into the eCRF by site research personnel and verified by the CRO.
The sample size was determined based on the precision of the estimates for the target range of the performance characteristics (sensitivity and specificity) of the index test. Specifically, the numbers of affected cases (T21, T18, T13, male, female or monosomy X) and unaffected controls (non-T21, non-T18, non-T13, non-male, non-female or non-monosomy X) were determined so that sensitivity and specificity, respectively, could be estimated within a pre-specified small margin of error based on the normal approximation (N = (1.96 × √(p(1-p)) / margin of error)², where p is the estimate of sensitivity or specificity). Assuming a true sensitivity of 95% or greater, a sample size of between 73 and 114 cases ensures that the precision of the sensitivity estimate is such that the lower bound of the 95% confidence interval (CI) will be 90% or greater (margin of error ≤5%). For smaller sample sizes, the estimated margin of error of the 95% CI for sensitivity is larger (from 6% to 13.5%). To estimate specificity with greater precision, a larger number of unaffected controls was planned at the sampling stage (a ratio of about 4:1 to cases). This ensures that the precision of the specificity estimate is at least 3%. Accordingly, as sensitivity and/or specificity increase, the precision of the confidence interval also increases.
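The normal-approximation formula above can be evaluated directly. The sketch below is illustrative; the function names are hypothetical, and rounding the required sample size up to a whole number is an assumption.

```python
import math


def sample_size(p: float, margin: float, z: float = 1.96) -> int:
    """Cases needed to estimate a proportion p within +/- margin (normal approximation)."""
    n = (z * math.sqrt(p * (1.0 - p)) / margin) ** 2
    return math.ceil(n)


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for n cases."""
    return z * math.sqrt(p * (1.0 - p) / n)


n_for_5_percent = sample_size(0.95, 0.05)
margin_at_114 = margin_of_error(0.95, 114)
print(n_for_5_percent, round(margin_at_114, 4))
```

For an assumed sensitivity of 95%, a 5% margin requires about 73 cases, and 114 cases correspond to a margin of about 4%, which matches the range of 73 to 114 cases given above.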
Based on the sample size determination, the CRO devised a random sampling plan to generate lists of samples selected for sequencing (a minimum of 110 cases affected by T21, T18 or T13 and 400 cases unaffected for trisomy, allowing up to half of these to have karyotypes other than 46,XX or 46,XY). Subjects with a singleton pregnancy and an eligible blood sample were eligible for selection. Subjects with ineligible samples, no karyotype recorded or a multiple gestation were excluded (Figure 50). Lists were generated on a regular basis throughout the study and sent to the Verinata Health laboratory.
Each eligible blood sample was analyzed for six independent categories. The categories were aneuploidy status for chromosomes 21, 18 and 13, and sex status for male, female and monosomy X. While still blinded, one of three classifications (affected, unaffected or unclassified) was generated prospectively for each of the six independent categories for each plasma DNA sample. Under this scheme, the same sample could be classified as affected in one analysis (e.g., aneuploid for chromosome 21) and as unaffected in another analysis (e.g., euploid for chromosome 18).
Conventional metaphase cytogenetic analysis of cells obtained by chorionic villus sampling (CVS) or amniocentesis was used as the reference standard in this study. Fetal karyotyping was performed in the diagnostic laboratories routinely used by the participating sites. If a patient underwent both CVS and amniocentesis after enrollment, the karyotype from the amniocentesis was used for the study analysis. If a metaphase karyotype was not available, fluorescence in situ hybridization (FISH) results targeting chromosomes 21, 18, 13, X and Y were allowed (Table 24). All abnormal karyotype reports (i.e., other than 46,XX and 46,XY) were reviewed by a board-certified cytogeneticist and classified as affected or unaffected with respect to chromosomes 21, 18 and 13 and to sex status XX, XY and monosomy X.
Pre-specified protocol conventions stipulated that the following abnormal karyotypes would be assigned a 'censored' karyotype status by the cytogeneticist: triploidy, tetraploidy, complex karyotypes other than trisomy (e.g., mosaicism) involving chromosomes 21, 18 or 13, mosaics with mixed sex chromosomes, sex chromosome aneuploidy, or karyotypes that could not be fully interpreted from the source document (e.g., marker chromosomes of unknown origin). Because the cytogenetic diagnosis was not known to the sequencing laboratory, all cytogenetically censored samples were independently analyzed and assigned a classification determined using sequencing information according to the method of the invention (sequencing classification), but were not included in the statistical analysis. Censored status pertained only to the relevant one or more of the six analyses (e.g., a mosaic T18 would be censored from the chromosome 18 analysis but considered 'unaffected' in the other analyses, such as chromosomes 21, 13, X and Y) (Table 25). Other abnormal and rare complex karyotypes that could not be fully anticipated when the protocol was designed were not censored from the analysis (Table 26).
The data contained in the eCRF and the clinical database were restricted to authorized users (at the study sites, the CROs and contracted clinical personnel). No Verinata Health employee had access to these data until unblinding.
After the random sample list was received from the CRO, total cell-free DNA (a mixture of maternal and fetal DNA) was extracted from the thawed selected plasma samples as described in Example 13. Sequencing libraries were prepared using the Illumina TruSeq kit v2.5. Sequencing (6-plex, i.e., 6 samples/lane) was performed on Illumina HiSeq 2000 instruments in the Verinata Health laboratory. Single-end reads of 36 base pairs were obtained. The reads were mapped across the genome, and the sequence tags on each chromosome of interest were counted and used to classify the samples for the independent categories as described above.
The clinical protocol required evidence of the presence of fetal DNA in order to report a classification result. A classification of male or aneuploid was considered sufficient evidence of fetal DNA. In addition, each sample was also tested for the presence of fetal DNA using two allele-specific methods. In the first method, the AmpflSTR Minifiler kit (Life Technologies, San Diego, California) was used to interrogate the presence of a fetal component in the cell-free DNA. Electrophoresis of the short tandem repeat (STR) amplicons was carried out on an ABI 3130 Genetic Analyzer following the manufacturer's protocol. All nine STR loci in this kit were analyzed by comparing the intensity of each peak, reported as a percentage of the sum of the intensities of all peaks, and the presence of minor peaks was used to provide evidence of fetal DNA. In cases in which no minor STR could be identified, an aliquot of the sample was examined with a single nucleotide polymorphism (SNP) panel of 15 SNPs with an average heterozygosity ≥0.4, selected from the Kidd et al. panel (Kidd et al., Forensic Sci Int 164(1):20-32 [2006]). Allele-specific methods that can be used to detect and/or quantify fetal DNA in maternal samples are described in U.S. Patent Publications 20120010085, 20110224087 and 20110201507, which are incorporated herein by reference.
Normalized chromosome values (NCVs) were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes as described in Example 13. However, because the sequencing in this study was carried out on a different instrument than in our previous work, and with multiple samples per lane, new normalizing chromosome denominators had to be determined. The normalizing chromosome denominators in the current study were determined by sequencing, before analysis of the study samples, a training set of 110 independent (i.e., not from MELISSA eligible samples) unaffected samples (i.e., qualified samples). The new normalizing chromosome denominators were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes so as to minimize the variation in the unaffected training set for all chromosomes across the genome (Table 23).
The NCV rules applied to provide the autosome classification of each test sample were those described in Example 12. Namely, for classification of autosomal aneuploidies, an NCV>4.0 was required to classify the chromosome as affected (i.e., aneuploid for that chromosome), and an NCV<2.5 to classify the chromosome as unaffected. Samples with autosomes that had an NCV between 2.5 and 4.0 were called 'unclassified'.
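The autosome rule above amounts to a three-way threshold test. A minimal sketch follows; the function name and the returned labels are illustrative, and values exactly equal to 2.5 or 4.0 are treated here as unclassified, which is an assumption because the text does not address them.

```python
def classify_autosome(ncv: float) -> str:
    """Classify one autosome from its NCV using the 2.5 / 4.0 thresholds."""
    if ncv > 4.0:
        return "affected"
    if ncv < 2.5:
        return "unaffected"
    return "unclassified"


for value in (0.3, 3.1, 12.7):
    print(value, classify_autosome(value))
```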
Sex chromosome classification in this test was performed by sequential application of the NCVs for both X and Y, as follows:
1. If NCV X<-4.0 and NCV Y<2.5, the sample is classified as monosomy X.
2. If NCV X>-2.5 and NCV X<2.5 and NCV Y<2.5, the sample is classified as female (XX).
3. If NCV X>4.0 and NCV Y<2.5, the sample is classified as XXX.
4. If NCV X>-2.5 and NCV X<2.5 and NCV Y>33, the sample is classified as XXY.
5. If NCV X<-4.0 and NCV Y>4.0, the sample is classified as male (XY).
6. If condition 5 is met, but NCV Y is about 2 times the value expected for the measured NCV X, the sample is classified as XYY.
7. If the NCVs for chromosomes X and Y do not meet any of the above criteria, the sample is classified as unclassified for sex.
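The sequential rules above can be sketched as a single function. Rules 1 to 5 and 7 are transcribed as stated. Rule 6 does not say how the expected Y value is obtained or how close to "about 2 times" a value must be, so `expected_ncv_y` and `ratio_tolerance` below are hypothetical parameters introduced only for illustration.

```python
from typing import Optional


def classify_sex(
    ncv_x: float,
    ncv_y: float,
    expected_ncv_y: Optional[float] = None,  # hypothetical: expected Y NCV for this X NCV
    ratio_tolerance: float = 0.25,           # hypothetical: how close to 2x counts as "about 2"
) -> str:
    """Apply the X/Y NCV rules in order and return the sex classification."""
    x_low = ncv_x < -4.0
    x_mid = -2.5 < ncv_x < 2.5
    x_high = ncv_x > 4.0
    y_low = ncv_y < 2.5

    if x_low and y_low:
        return "monosomy X"
    if x_mid and y_low:
        return "female (XX)"
    if x_high and y_low:
        return "XXX"
    if x_mid and ncv_y > 33:
        return "XXY"
    if x_low and ncv_y > 4.0:
        # Rule 6 refines rule 5 when an expected Y value is supplied.
        if expected_ncv_y is not None and expected_ncv_y > 0:
            ratio = ncv_y / expected_ncv_y
            if abs(ratio - 2.0) <= 2.0 * ratio_tolerance:
                return "XYY"
        return "male (XY)"
    return "unclassified"


print(classify_sex(-5.0, 0.5))
print(classify_sex(-6.0, 100.0, expected_ncv_y=50.0))
```

Because the rules are applied in order, a sample that satisfies none of the X ranges (for example an NCV X between 2.5 and 4.0) falls through to the unclassified result.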
Because the laboratory was blinded to the clinical information, the sequencing results were not adjusted for any of the following demographic variables: maternal body mass index, smoking status, presence of diabetes, type of conception (spontaneous or assisted), prior pregnancies, prior aneuploidy or gestational age. Neither maternal nor paternal samples were used for classification, and classification according to the present method does not depend on measurements of specific loci or alleles.
The sequencing results were returned to an independent contracted biostatistician before unblinding and analysis. Personnel at the study sites, the CROs (including the biostatistician who generated the random sampling lists) and the contracted cytogeneticist were blinded to the sequencing results.
Table 23. Systematically determined normalizing chromosome sequences for all chromosomes
[Table 23 is provided as an image in the original publication.]
Statistical methods were documented in a detailed statistical analysis plan for the study. For each of the six analysis categories, point estimates of sensitivity and specificity and exact 95% confidence intervals were calculated using the Clopper-Pearson method. For all statistical estimates performed, samples with no fetal DNA detected, samples with 'censored' complex karyotypes (as defined by the protocol conventions) and samples 'unclassified' by the sequencing test were removed.
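The exact (Clopper-Pearson) interval can be computed without external libraries by inverting the binomial tail probabilities numerically. The sketch below is an illustration using only the standard library and bisection; it is not the statistical software used in the study, and the function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(func, target: float, iterations: int = 100) -> float:
    """Find p in [0, 1] with func(p) == target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0
    )
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0
    )
    return lower, upper


t21_lower, t21_upper = clopper_pearson(89, 89)
t18_lower, t18_upper = clopper_pearson(35, 36)
print(f"T21 89/89: {t21_lower:.3f} to {t21_upper:.3f}")
print(f"T18 35/36: {t18_lower:.3f} to {t18_upper:.3f}")
```

For 89 of 89 cases the lower bound reduces to 0.025^(1/89), about 0.959, which agrees with the 95.9% lower limit reported for trisomy 21.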
Results
Between June 2010 and August 2011, 2,882 pregnant women were enrolled in the study. The characteristics of the eligible subjects and of the selected cohort are provided in Table 24. Subjects who were enrolled and provided blood, but who were later found during data monitoring to have exceeded the inclusion criteria, with an actual gestational age beyond 22 weeks, 0 days at enrollment, were allowed to remain in the study (n=22). Three of these samples were in the selected set. Figure 50 shows the flow of samples between enrollment and analysis. There were 2,625 samples eligible for selection.
Table 24. Patient demographics
[Table 24 is provided as images in the original publication.]
* GA at the time of the invasive procedure.
** The penetrance of ultrasound abnormalities was higher in fetuses with abnormal karyotypes.
Abbreviations: BMI, body mass index; IUGR, intrauterine growth restriction
According to the random sampling plan, all eligible subjects with an abnormal karyotype and a set of subjects carrying euploid fetuses were selected for analysis (Figure 50B), so that the total sequenced study population gave a ratio of approximately 4:1 unaffected to affected subjects for trisomy 21. By this process, 534 subjects were selected. Two samples were subsequently removed from the analysis because of sample tracking problems, in which the full chain of custody between the sample tube and data acquisition did not pass quality audit (Figure 50). This yielded 532 subjects for analysis, contributed by 53 of the 60 study sites. The demographics of the selected cohort were similar to those of the overall cohort.
Test performance
Figures 51A-51C show flow charts for the aneuploidy analyses of chromosomes 21, 18 and 13, and Figures 51D-51F show the flow of the sex analyses. Table 27 shows the sensitivity, specificity and confidence intervals for each of the six analyses, and Figures 52, 53 and 54 show the graphical distribution of the samples according to NCV after sequencing. In all 6 analysis categories, 16 samples (3.0%) were removed because no fetal DNA was detected. After unblinding, there were no distinguishing clinical features among these samples. The number of censored karyotypes in each category depended on the analysis (fully specified in Figure 52).
The sensitivity and specificity of the method for detecting T21 in the analysis population (n=493) were 100% (95% CI=95.9, 100.0) and 100% (95% CI=99.1, 100.0), respectively (Table 27 and Figure 51A). This included correct classification of one complex T21 karyotype, 47,XX,inv(7)(p22q32),+21, and of two translocation T21 cases arising from Robertsonian translocations, one of which was also mosaic for monosomy X (45,X,+21,der(14;21)(q10;q10)[4]/46,XY,+21,der(14;21)(q10;q10)[17] and 46,XY,+21,der(21;21)(q10;q10)).
The sensitivity and specificity for detecting T18 in the analysis population (n=496) were 97.2% (85.5, 99.9) and 100% (99.2, 100.0), respectively (Table 27 and Figure 51B). Although censored from the primary analysis (per protocol), four samples with mosaic karyotypes for T21 and T18 were all correctly classified by the method of the invention as 'affected' for aneuploidy (Table 25). Because they were correctly detected, they are noted on the left side of Figures 51A and 51B. All remaining censored samples were correctly classified as unaffected for trisomy 21, 18 and 13 (Table 25). The sensitivity and specificity for detecting T13 in the analysis population were 78.6% (49.2, 99.9) and 100% (99.2, 100.0), respectively (Figure 51C). One T13 case that was detected arose from a Robertsonian translocation (46,XY,+13,der(13;13)(q10;q10)). There were seven unclassified samples (1.4%) in the chromosome 21 analysis, five (1.0%) in the chromosome 18 analysis and two (0.4%) in the chromosome 13 analysis (Figures 51A-51C). Across all categories there was an overlap of three samples that had both a censored karyotype (69,XXX) and no fetal DNA detected. One unclassified sample in the chromosome 21 analysis was correctly identified as T13 in the chromosome 13 analysis, and one unclassified sample in the chromosome 18 analysis was correctly identified as T21 in the chromosome 21 analysis.
Table 25. Censored karyotypes
[Table 25 is provided as an image in the original publication.]
* Subject excluded from all analysis categories because of a marker chromosome in one cell line.
** Subject with karyotype 48,XXY,+18 was unclassified in the chromosome 18 analysis, and the sex chromosome aneuploidy was not detected.
Table 26. Abnormal and complex karyotypes that were not censored
[Table 26 is provided as images in the original publication.]
* After unblinding, an increased normalized chromosome value (NCV) of 3.6 was noted from the sequencing tags on chromosome 6.
Be used for determining that the sex chromosome analysis colony (women, the male sex or monosomy X) of the method performance is 433.The extracted arithmetic that we are used for that the sex state is classified allows the sex chromosome dysploidy is determined accurately, thereby obtains the higher number of results that is not classified.Sensitivity and specificity for detection of diploid women state (XX) are 99.6% (95%CI=97.6,>99.9) and 99.5% (95%CI=97.2,>99.9) accordingly; Sensitivity and specificity for detection of the male sex (XY) all are 100% (95%CI=98.0,100.0); And (45, sensitivity X) and specificity are 93.8% (95%CI=69.8,99.8) and 99.8% (95%CI=98.7,>99.9) (Figure 33 D-f) for detection of monosomy X.Although by analytical review (according to stipulations), but the order-checking of mosaic monosomy X caryogram is classified as follows (table 25): 2/7 is classified as monosomy X, 3/7 is classified as and has the Y chromosome component that is classified as XY, and has 2/7 of XX chromosome complement and be classified as the women.Two samples that the method according to this invention is categorized as monosomy X have caryogram 47, XXX and 46, XX.For caryogram 47, XXX, 47, XXY and 47, XYY, 8/10ths sex chromosome dysploidy is by correctly classification (table 25).If the sex chromosome classification is confined to monosomy X, XY and XX, can correctly be categorized as the male sex to the sample that major part is not classified so, but can not identifies XXY and XYY dysploidy.
In addition to accurately classifying trisomies of chromosomes 21, 18 and 13 and fetal sex, the sequencing results also correctly classified aneuploidies of chromosomes 16 and 20 in two samples (47,XX,+16 and 47,XX,+20) (Table 26). Interestingly, a sample with a clinically complex alteration of the long arm of chromosome 6 (6q) and two duplications (one of which was 37.5 Mb in size) showed an increased NCV from the sequencing tags in chromosome 6 (NCV = 3.6). In another sample, the method of the invention detected an aneuploidy of chromosome 2 that was not observed in the fetal karyotype at amniocentesis (46,XX). Other complex karyotype variants shown in Tables 25 and 26 include samples from fetuses with chromosome inversions, deletions, translocations, triploidy and other abnormalities that were not detected here, but that could potentially be classified using the method of the invention at higher sequencing density and/or with further algorithm optimization. In these cases, the method of the invention correctly classified the samples as unaffected for trisomy 21, 18 or 13 and correctly classified fetal sex.
In this study, 38/532 analyzed samples were from women who had undergone assisted reproduction. Of these, 17/38 samples had chromosomal abnormalities; no false positives or false negatives were detected in this subgroup.
Table 27. Sensitivity and specificity of the method
Figure BDA00002366924903021
Discussion
This prospective study to determine whole-chromosome fetal aneuploidy from maternal plasma was designed to emulate the real-world scenario of sample collection, processing and analysis. Whole blood samples were obtained at the enrollment sites, did not require immediate processing, and were shipped overnight to the sequencing laboratory. In contrast to a prior prospective study that involved only chromosome 21 (Palomaki et al., Genetics in Medicine 2011:1), in this study all eligible samples with any abnormal karyotype were sequenced and analyzed. The sequencing laboratory had no prior knowledge of which fetal chromosomes might be affected or of the ratio of aneuploid to euploid samples. The study design recruited a high-risk population of pregnant women to assure a statistically significant prevalence of aneuploidy, and Tables 25 and 26 indicate the complexity of the karyotypes that were analyzed. The results demonstrate that: i) fetal aneuploidies (including those resulting from translocation trisomy, mosaicism and complex variations) can be detected with high sensitivity and specificity; and ii) aneuploidy in one chromosome does not affect the ability of the method of the invention to correctly identify the euploid status of other chromosomes. The algorithms used in previous studies appear unable to effectively determine other aneuploidies that are inevitably present in a general clinical population (Ehrich et al., Am J Obstet Gynecol 2011 Mar; 204(3):205e1-11; Chiu et al., BMJ 2011; 342:c7401).
With regard to mosaicism, the analysis of sequencing information in this study correctly classified samples with mosaic karyotypes for chromosomes 21 and 18 in 4/4 affected samples. These results demonstrate the sensitivity of the analysis for detecting specific characteristics of cell-free DNA in a complex mixture. In one case, the sequencing data for chromosome 2 indicated a complete or partial chromosomal aneuploidy, whereas the amniocentesis karyotype result for chromosome 2 was diploid. In two other cases, one sample had a 47,XXX karyotype and another had a 46,XX karyotype, and the method of the invention classified both samples as monosomy X. It is possible that these are mosaic cases, or that the pregnant woman herself is mosaic. (It is important to remember that sequencing is performed on total DNA, which is a combination of maternal and fetal DNA.) Although cytogenetic analysis of amniocytes or villi obtained by invasive procedures is currently the reference standard for aneuploidy classification, a karyotype performed on a limited number of cells cannot exclude low-level mosaicism. The current clinical study design did not include long-term infant follow-up or access to placental tissue at delivery, so we cannot determine whether these were true or false positive results. We speculate that the specificity of the sequencing process, combined with algorithms optimized according to the method of the invention for detection across the whole genome, may ultimately provide more sensitive identification of fetal DNA abnormalities than standard karyotyping, particularly in cases of mosaicism.
The International Society for Prenatal Diagnosis has issued a rapid response statement commenting on the commercial availability of massively parallel sequencing (MPS) for prenatal detection of Down syndrome (Benn et al., Prenat Diagn 2012 doi:10.1002/pd.2919). The statement says that, before routine MPS-based population screening for fetal Down syndrome is introduced, evidence is needed that the test performs in certain subpopulations, such as women who conceive by in vitro fertilization. The results reported here indicate that the present method is accurate in this group of pregnant women, many of whom are at high risk of aneuploidy.
Although these results demonstrate the excellent performance of the present method, using an optimized algorithm, for aneuploidy detection across the whole genome in singleton pregnancies of women at increased risk of aneuploidy, more experience is needed, particularly in low-risk populations, to build confidence in the diagnostic performance of the method when prevalence is low and in multiple gestations. In the early stages of clinical implementation, classification of chromosomes 21, 18 and 13 using sequencing information according to the present method should be used after a positive first- or second-trimester screening result. This would reduce unnecessary invasive procedures caused by false-positive screening results, with a concomitant reduction in procedure-related adverse events. Invasive procedures could be limited to confirmation of a positive result obtained by sequencing. However, there are clinical scenarios (e.g., advanced maternal age and infertility) in which pregnant women will want to avoid an invasive procedure; they may request this test as an alternative to primary screening and/or an invasive procedure. All patients should receive thorough pretest counseling to ensure that they understand the limitations of the test and the implications of the results. As experience accumulates with more samples, it is possible that this test will replace current screening protocols and become a primary screen and, ultimately, a noninvasive diagnostic test for fetal aneuploidy.
Example 17
Determination of Fetal Fraction from NCV to Identify the Presence of Complete or Partial Fetal Chromosomal Aneuploidy in Analyzed Samples
Given that the chromosome dose of an affected fetal chromosome in a maternal sample increases in proportion to increasing fetal fraction, it is expected that, for a complete chromosome of interest, the ff value based on the NCV value will determine the presence or absence of a complete fetal chromosomal aneuploidy. To demonstrate that the ff determined from NCV can be used to distinguish a complete chromosomal aneuploidy from the presence of a partial chromosomal aneuploidy or the contribution of a mosaic sample, artificial samples were created from the genomic DNA of mothers and their children to mimic the mixture of fetal and maternal cfDNA found in the circulation of pregnant women. The NCV-based value of fetal fraction is a form of the putative fetal fraction described above.
Mother and child DNA was obtained from the Coriell Institute for Medical Research (Camden, New Jersey). The DNA identifiers and sample karyotypes are provided in Table 27.
Table 27. Example 17
Figure BDA00002366924903051
Figure BDA00002366924903061
The samples comprising complete or partial chromosomal aneuploidies were analyzed as follows.
In all cases, genomic DNA from the mother and genomic DNA from the child were sheared by sonication, with a peak at 200 bp. Artificial samples comprising maternal DNA spiked with 0%, 5% or 10% w/w child DNA were processed to prepare sequencing libraries, which were sequenced in a massively parallel fashion using sequencing by synthesis as described in Example 12. Each artificial DNA sample was sequenced four times on the sequencer using separate flow cells, to provide 4 sets of sequence information for each sample comprising 0%, 5% and 10% child DNA. The 36 bp reads were aligned to the human reference genome hg19, and uniquely mapped tags were counted. Approximately 125 × 10^6 sequence tags were obtained from each of the 4 flow cell lanes used for each sample.
Normalizing chromosomes (single chromosomes or groups of chromosomes) were identified in a set of qualified samples comprising 20 male and 20 female gDNA libraries, as described elsewhere herein. The normalizing chromosome for chromosome 21 was identified as chromosome 4 + chromosome 16 + chromosome 22; the normalizing chromosome for chromosome 7 was identified as chromosome 4 + chromosome 6 + chromosome 8 + chromosome 12 + chromosome 19 + chromosome 20; the normalizing chromosome for chromosome 15 was identified as chromosome 9 + chromosome 12 + chromosome 14 + chromosome 19 + chromosome 20; the normalizing chromosome for chromosome 22 was identified as chromosome 19; and the normalizing chromosome for chromosome X was identified as chromosome 4 + chromosome 6 + chromosome 7 + chromosome 8. Sequence tags for the chromosomes of interest obtained by sequencing the artificial samples, and for the corresponding normalizing chromosomes (single chromosome or group of chromosomes), were counted and used to calculate chromosome doses and NCVs.
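The chromosome dose described above is a ratio of tag counts: tags mapped to the chromosome of interest divided by the summed tags mapped to its normalizing chromosome or chromosome group. The sketch below illustrates that ratio using the normalizing groups listed in this example. The function name, the dictionary layout and the `chrN` key style are assumptions made for illustration, and no tag counts from the study are reproduced.

```python
# Normalizing chromosome groups as identified in this example.
# The "chrN" key naming is an illustrative assumption.
NORMALIZING = {
    "chr21": ["chr4", "chr16", "chr22"],
    "chr7": ["chr4", "chr6", "chr8", "chr12", "chr19", "chr20"],
    "chr15": ["chr9", "chr12", "chr14", "chr19", "chr20"],
    "chr22": ["chr19"],
    "chrX": ["chr4", "chr6", "chr7", "chr8"],
}


def chromosome_dose(tag_counts, chrom, normalizing=NORMALIZING):
    """Ratio of tags on `chrom` to the summed tags on its normalizing chromosomes.

    tag_counts: dict mapping chromosome name to its count of uniquely
    mapped sequence tags in one sample.
    """
    denominator = sum(tag_counts[c] for c in normalizing[chrom])
    if denominator == 0:
        raise ValueError(f"no tags on normalizing chromosomes for {chrom}")
    return tag_counts[chrom] / denominator
```

A dose computed this way for a test sample is then compared with the distribution of doses in the qualified samples to obtain the NCV.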
In this example, ff was determined using the NCV for chromosome 21 in sample mixture (1), where NCV_21A is the NCV value determined for chromosome 21 in test sample (1), which comprises trisomy 21, and CV_21U is the coefficient of variation of the dose of chromosome 21 determined in the qualified samples (comprising diploid chromosome 21); and where NCV_XA is the NCV value determined for chromosome X in test sample (1), which comprises trisomy 21, and CV_XU is the coefficient of variation of the dose of chromosome X determined in the qualified samples (comprising unaffected female chromosomes).
Figure 56 shows a plot of the percent "ff" determined using the dose of chromosome 21 (ff_21) as a function of the percent "ff" determined using the dose of chromosome X (ff_X) in synthetic maternal sample (1), which comprises DNA from a child with trisomy 21.
The data show that the chromosome dose and the NCV derived from it increase in proportion to increasing ff, and that there is a 1:1 relationship between the percent ff determined using the dose of the trisomic chromosome (i.e., chromosome 21) and the percent ff determined using the dose of a chromosome known to be present as a single copy (i.e., chromosome X).
Figure 57 shows a plot of the percent "ff" determined using the dose of chromosome 7 (ff_7) as a function of the percent "ff" determined using the dose of chromosome X (ff_X) in synthetic maternal sample (2), which comprises DNA from a euploid mother and her child, who carries a partial deletion in chromosome 7.
As shown for samples (1) and (2), the data show that the chromosome dose and the NCV derived from it increase in proportion to increasing ff. However, where the aneuploidy is a partial chromosomal aneuploidy, the percent ff determined using the dose of the partially aneuploid chromosome (ff_7) does not correspond to the percent ff determined using the dose of chromosome X (ff_X). Thus, a deviation from the 1:1 relationship shown by the complete trisomy sample indicates the presence of a partial aneuploidy.
Figure 58 shows a plot of the percent "ff" determined using the dose of chromosome 15 (ff_15) as a function of the percent "ff" determined using the dose of chromosome X (ff_X) in synthetic maternal sample (3), which comprises DNA from a euploid mother and her child, who is 25% mosaic for a partial duplication of chromosome 15.
As shown for samples (1) and (2), the ff determined using the dose and the NCV derived from it increase in proportion to increasing ff. Like sample (2), sample (3) comprises a partial chromosomal aneuploidy, and the percent ff determined using the dose of the partially aneuploid chromosome (ff_15) does not correspond to the percent ff determined using the dose of chromosome X (ff_X). The lack of correspondence between the two ff values indicates the presence of a partial aneuploidy rather than a complete chromosomal aneuploidy.
Figure 59 shows a plot of the percent "ff" determined using the dose of chromosome 22 (ff_22), and the NCV derived from it, in artificial sample (4), which comprises (i) 0% child DNA; (ii) 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22; and (iii) 10% DNA from an affected twin son known to have a partial chromosomal aneuploidy of chromosome 22. The data show that, for the sample comprising DNA from the unaffected twin, the "ff" determined from four NCVs calculated from the dose of chromosome 22 is close to zero, which indicates that no aneuploidy of chromosome 22 is present in the unaffected child; when calculated from the dose of chromosome X, the "ff" of the unaffected twin sample is confirmed to be about 10%. The data also show that, for the sample comprising DNA from the affected twin, the "ff" determined from four NCVs calculated from the dose of chromosome 22 (ff_22) is about 3%, which indicates that an aneuploidy is present in chromosome 22; when calculated from the dose of chromosome X (ff_X), the "ff" of this sample is confirmed to be about 10%. The lack of correspondence between ff_22 and ff_X indicates that the aneuploidy of chromosome 22 in the affected twin is a partial chromosomal aneuploidy.
Thus, the data show that, in maternal samples comprising cfDNA of a male fetus, chromosome doses and the NCV values derived from them can be used to distinguish a complete trisomy from a partial aneuploidy and/or from a complete or partial aneuploidy present in a mosaic sample. A partial aneuploidy can be a gain or a loss of a portion of a chromosome. Optionally, resolution of the partial aneuploidy and/or mosaicism can be obtained using chromosome doses and the estimated fetal fraction, as described in Example 12.
The fetal fraction method described above can also be used to determine the likelihood that one or more fetuses in a multiple gestation have an aneuploidy. For example, in one case of fraternal twins, the fetal fraction determined from the NCV_X value was 8.3%, whereas the fraction determined from the NCV_21 value was 5.0%. This indicated that only one of the pair of male fetuses had a T21 aneuploidy, and this result was confirmed by karyotyping. In another twin example, the fetal fraction determined from the X chromosome was 7.3%, and the fetal fraction determined from chromosome 18 was 8.9%. In this example, both twins were determined by karyotype to be T18 males.
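The comparisons made throughout this example reduce to one check: does the ff derived from the chromosome of interest correspond to the ff derived from chromosome X? A minimal sketch of that check follows. The relative tolerance of 25% is a hypothetical value chosen only for illustration, since the text states no numerical criterion for "correspondence", and the function name is likewise an assumption.

```python
def ff_corresponds(ff_interest, ff_x, rel_tol=0.25):
    """True if the two fetal fraction estimates agree within rel_tol of ff_x.

    rel_tol is a hypothetical tolerance; the source gives no threshold.
    """
    if ff_x <= 0:
        raise ValueError("ff_x must be positive")
    return abs(ff_interest - ff_x) <= rel_tol * ff_x


# Values quoted in the text (percent fetal fraction).
fraternal_twins = ff_corresponds(5.0, 8.3)   # ff_21 vs ff_X: one twin affected
t18_twins = ff_corresponds(8.9, 7.3)         # ff_18 vs ff_X: both twins affected
affected_twin = ff_corresponds(3.0, 10.0)    # ff_22 vs ff_X: partial aneuploidy
```

Under this assumed tolerance, the first and third comparisons fail to correspond and the second corresponds, which matches the interpretations given in the text.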
Example 18
Determination of Fetal Fraction from NCV to Identify the Presence of Complete Fetal Chromosomal Aneuploidies in Clinical Samples
To demonstrate that the ff determined from NCV (CNff) can be used to distinguish the presence of complete chromosomal aneuploidies from partial chromosomal aneuploidies in clinical samples, chromosomes of interest 21, 13 and 18 were quantified in clinical samples using cfDNA obtained from the blood of pregnant women. The presence of trisomy was verified by karyotype.
cfDNA was obtained from the following samples: 46 maternal samples from women each carrying a male fetus with trisomy 21 (T21); 13 maternal samples from women each carrying a fetus with trisomy 18 (T18); and 3 maternal samples from women each carrying a male fetus with trisomy 13 (T13). These clinical samples were from the clinical study described in Example 16. cfDNA was isolated and sequencing libraries were prepared as described in Example 16, except that the new Illumina v3 chemistry was used.
Sequencing libraries made from the cfDNA of qualified samples known to be unaffected for chromosomes 21, 18 and 13 were also sequenced using the new Illumina v3 chemistry. Sequence reads obtained for the qualified samples were mapped to the human reference genome hg19, and reads that mapped uniquely to all chromosome sequences of hg19 (with repeat sequences not masked) were counted and used to determine systematically which chromosome or group of chromosomes would serve as the normalizing chromosome for each of the chromosomes of interest 21, 18 and 13 in the test samples.
Table 28 below shows the normalizing chromosomes (denominator chromosomes) identified for determining the chromosome doses (ratios) of chromosomes 1-22, X and Y in each test sample.
Table 28. Example 18: systematically identified normalizing chromosomes for use with T21, T18 and T13 test samples
Figure BDA00002366924903101
Figure BDA00002366924903111
Once the normalizing chromosomes had been identified in the qualified samples, the test samples were sequenced, and the sequence tags mapped to each of chromosomes 21, 18 and 13 and to the corresponding normalizing chromosomes in the test samples were counted and used to calculate chromosome doses (ratios). NCV values were then calculated, as described previously, according to the following equation:
$NCV_{iA} = \dfrac{R_{iA} - \overline{R}_{iU}}{\sigma_{iU}}$ (Equation 21).
For each test sample, the fetal fraction for chromosome X and for the chromosome of interest was determined according to the following equation, which is described elsewhere in this specification:
$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$ (Equation 28).
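Equations 21 and 28 can be sketched in a few lines. This is an illustrative sketch, not the implementation used in the study: it assumes that the mean and standard deviation of the qualified-sample doses are simple sample statistics (the use of the sample rather than population standard deviation is an assumption), and the function names are hypothetical.

```python
from statistics import mean, stdev


def ncv(test_dose, qualified_doses):
    """Equation 21: (R_iA - mean(R_iU)) / sigma_iU."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def coefficient_of_variation(qualified_doses):
    """CV_iU: standard deviation of the qualified doses divided by their mean."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_ncv(test_dose, qualified_doses):
    """Equation 28: ff = 2 * |NCV_iA * CV_iU|, returned as a fraction (0.1 = 10%)."""
    return 2 * abs(ncv(test_dose, qualified_doses)
                   * coefficient_of_variation(qualified_doses))
```

Because the standard deviation cancels in the product NCV × CV, the result equals twice the relative deviation of the test dose from the mean qualified dose; the absolute value lets the same expression handle a gain (trisomy) or a loss (for example, chromosome X in a male fetus).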
Figure 60 shows a plot of CNffx versus CNff21 determined in samples comprising fetal trisomy 21 (T21). As expected for a complete chromosomal aneuploidy, CNffx matches the fetal fraction determined using the NCV of chromosome 21 (CNff21).
Similarly, in the T18 test samples, CNffx matches the fetal fraction determined using the NCV of chromosome 18 (CNff18) (Figure 61), and in the T13 test samples, CNffx matches the fetal fraction determined using the NCV of chromosome 13 (CNff13) (Figure 62).
Figure 60 also shows the fetal fractions obtained for samples from female fetuses affected by T21. As expected, the CNff21 in these "female" samples cannot be verified by comparison with chromosome X. To verify the CNff21 of female samples, the CNff of a chromosome known not to be aneuploid in the fetus (e.g., chromosome 1) can be determined. Alternatively, the CNff21 of "female" samples can be verified by comparing it with the NCNff, determined, for example, by counting tags of polymorphic sequences as described elsewhere herein.
Thus, the number of sequence tags and the resulting NCV values that identify a complete chromosomal copy number variation can be used to determine the corresponding fetal fraction in aneuploid/affected samples. Correspondence between the CNff of the chromosome of interest and the CNff of a chromosome known not to be aneuploid can be used to confirm the presence of a complete chromosomal trisomy.
Example 19
Determination of Fetal Fraction from NCV to Identify the Presence of Partial Fetal Chromosomal Aneuploidies in Clinical Samples
To demonstrate that the ff determined from NCV (CNff) can be used to identify and localize partial chromosomal aneuploidies in clinical samples, cfDNA from a clinical sample identified as having a chromosome 17 aneuploidy was sequenced and analyzed as described in Example 18.
The NCV value for each chromosome in the test sample was calculated using the sequence tags mapped to chromosome 17 in the test sample and to the normalizing chromosomes (chromosome 16 + chromosome 20 + chromosome 22) identified in the set of qualified samples (Table 28 above).
Figure 63 shows a plot of the NCV values for chromosomes 1-22 and X in the test sample. As shown, the NCV value for chromosome 17 was determined to be >4, the threshold chosen for identifying aneuploid chromosomes. Also shown is the NCV value for chromosome X, which, as expected, is negative.
The CNff of chromosome 17 and of chromosome X was calculated according to the following equation:
$ff_{(i)} = 2 \times NCV_{jA} \, CV_{jU}$ (Equation 25),
and was determined to be CNff17 = 3.9% and CNffX = 13.5%.
The difference between the two CNff values indicates the presence of a partial aneuploidy or, possibly, of mosaicism.
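The factor of 2 in the fetal fraction equation can be motivated by a short derivation. The sketch below is offered as a plausible reading rather than as the derivation given in the specification: it assumes that the chromosome dose scales linearly with copy number, that the normalizing chromosomes are unaffected, and that the fetus carries exactly one extra or one missing copy of the whole chromosome $j$ relative to the qualified samples.

```latex
% Assumption: one extra (+) or one missing (-) copy of chromosome j in the fetus
% changes the dose in the mixed sample by a factor (1 +/- ff/2).
\begin{align*}
R_{jA} &= \overline{R}_{jU}\left(1 \pm \tfrac{ff}{2}\right) \\
NCV_{jA} &= \frac{R_{jA} - \overline{R}_{jU}}{\sigma_{jU}}
          = \pm \frac{ff}{2} \cdot \frac{\overline{R}_{jU}}{\sigma_{jU}}
          = \pm \frac{ff}{2\, CV_{jU}},
          \qquad CV_{jU} = \frac{\sigma_{jU}}{\overline{R}_{jU}} \\
ff &= 2 \left| NCV_{jA} \, CV_{jU} \right|
\end{align*}
```

Under these assumptions, a duplication or deletion that covers only part of chromosome $j$ changes the dose by less than $ff/2$, so the ff computed from that chromosome underestimates the ff computed from chromosome X, which is consistent with CNff17 = 3.9% being well below CNffX = 13.5% in this sample.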
To distinguish a partial aneuploidy from possible mosaicism, tags were counted in consecutive 100 kbp bins along chromosome 17, and a normalized bin value (NBV) was calculated for each bin. The tag count in each bin was normalized by determining the ratio of the tags in the bin to the sum of the tag counts in the 20 bins of the same size having GC content closest to that of the bin under analysis. Thus, in this case, the normalization is relative to GC content. Optionally, bin normalization can also be relative to the variability of bin doses, as determined in qualified samples in the manner described for chromosome doses/ratios. In this example, the GCC z-score is equal to the NBV value determined as follows:
$NBV_{ij} = \dfrac{x_{ij} - M_j}{MAD_j}$ (Equation 29),
where $M_j$ and $MAD_j$ are, respectively, the estimated median and the median adjusted deviation for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
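The two steps just described, GC-matched bin normalization followed by the NBV of Equation 29, can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the bin under analysis is excluded from its own set of 20 GC-matched bins, $MAD_j$ is computed as the unscaled median absolute deviation, and all function and parameter names are hypothetical.

```python
from statistics import median


def gc_normalized_bins(tag_counts, gc_content, n_neighbors=20):
    """Ratio of each bin's tag count to the summed counts of the
    n_neighbors other bins whose GC content is closest to its own."""
    n = len(tag_counts)
    normalized = []
    for i in range(n):
        # Rank all other bins by GC distance to bin i (assumption: bin i excluded).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(gc_content[j] - gc_content[i]))
        denominator = sum(tag_counts[j] for j in others[:n_neighbors])
        normalized.append(tag_counts[i] / denominator if denominator else float("nan"))
    return normalized


def nbv(observed, qualified_values):
    """Equation 29: (x_ij - M_j) / MAD_j, with MAD taken as the
    median absolute deviation of the qualified values (an assumption)."""
    m = median(qualified_values)
    mad = median([abs(v - m) for v in qualified_values])
    if mad == 0:
        raise ValueError("MAD is zero; NBV is undefined")
    return (observed - m) / mad
```

Plotting the NBV of each 100 kbp bin against its position along the chromosome gives a profile in which a run of consecutive bins with elevated values marks a localized gain.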
The normalized bin value (NBV) for each 100 kbp bin along the length of chromosome 17 is shown on the Y axis of Figure 64 as the GC-normalized GCC z-score. The plot in Figure 64 clearly shows an increase in copy number for the bins corresponding to approximately the last 200,000 bp of chromosome 17. This finding is consistent with the karyotype provided for the sample, which indicated a duplication at the q terminus (qter) of chromosome 17.
Thus, CNff can be used to identify and localize partial aneuploidies within a chromosome.
Example 20
Verification of Sample Integrity in Multiplexed Bioassays of Maternal cfDNA
Marker molecules having sequences known not to be contained in any known genome were synthesized and used to verify the integrity of whole-blood and plasma maternal source samples, which were processed to extract the mixture of fetal and maternal cfDNA in the maternal sample and to sequence it.
Current and previous experimental data have shown that the average length of cfDNA is about 170 bp. Using BLAST searches against all genome accessions, 170 bp antigenomic sequences that are not present in any known genome were identified. Six marker molecules (MM1-MM6) were synthesized based on the identified antigenomic sequences (SEQ ID NO: 1-6; Table 29) and were used to verify sample integrity as follows.
Table 29
Marker molecules
Figure BDA00002366924903141
Peripheral blood was collected from a pregnant woman into 4 blood collection tubes (Cell-Free DNA™ BCT; Streck, Inc., Omaha, NE) and shipped overnight to the laboratory for analysis. Marker molecules were added to two of the whole-blood source samples as follows. One blood source sample was spiked with 720 pg of marker molecule 1 (MM1), and a second blood source sample was spiked with 720 pg of marker molecule 2 (MM2). All 4 tubes were centrifuged at 1600 g for 10 minutes at 4 °C. The plasma supernatant was removed from each of the four tubes, placed in a 5 mL high-speed centrifuge tube, and centrifuged at 16000 g for 10 minutes at 4 °C. The plasma fractions of the whole blood to which marker molecules had been added were dispensed into separate tubes and stored at -80 °C. The plasma fractions from the two remaining (unspiked) blood tubes were then divided into 1.1 mL aliquots. Plasma source samples were prepared as follows. 100 pg of MM1 was added to plasma aliquot 1, 100 pg of MM2 was added to plasma aliquot 2, and so on, to obtain 6 marked plasma source samples, each comprising a different marker molecule (MM1-MM6), which were stored at -80 °C.
One tube of each marked plasma source sample and one tube of each marked source blood sample were thawed, and DNA was extracted using the Qiagen Blood Mini Kit according to the method described in Example 1. Libraries were prepared from 30 µL of each sample DNA using a TruSeq™ DNA Sample Preparation Kit comprising indexes 1-6 (San Diego, CA;
Figure BDA00002366924903151
). Sequencing libraries were prepared such that samples comprising MM1 were indexed with index 1, samples comprising MM2 were indexed with index 2, and so on. The sequencing libraries were quantified using the Agilent Bioanalyzer DNA 1000 kit (Agilent Technologies, Santa Clara, California) and diluted to 4 nM with Qiagen buffer EB. The indexed and marked samples were pooled and further diluted to 2 nM, then sequenced in four lanes of an Illumina HiSeq flow cell using the Illumina TruSeq SBS Kit v3, according to Table 30.
Table 30
Layout of the multiplexed sequencing flow cell
Figure BDA00002366924903152
Sequence reads were aligned to the human reference genome hg19 and to a synthetic reference genome comprising the antigenomic marker molecule sequences. Sequence reads that mapped uniquely (i.e., only once) to the hg19 reference genome or to the synthetic reference genome comprising the marker molecule sequences were counted (Table 31).
Table 31
Correspondence of MM sequences with source sample cfDNA sequences
Figure BDA00002366924903171
* I = index
* L = lane
The data show that, for each sample, the sequences determined for the MM that had been added to a source sample corresponded only to the cfDNA sequences of the source sample to which that MM had been added. For instance, the data for sample 1 show that the reads determined to map to MM1 corresponded only to the sequences of the cfDNA obtained from the source sample to which MM1 had been added (plasma sample 1). In addition, the absence of a different marker sequence (e.g., MM2) among the reads obtained by sequencing the cfDNA of source sample 1 shows that source sample 1 was not cross-contaminated by another sample (e.g., source sample 2).
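The integrity check described above can be expressed as a simple rule on the per-sample marker read counts: the expected marker must be present and no other marker may be present. The sketch below illustrates that rule. The function name, the `min_reads` threshold and the read counts in any call are hypothetical, since the values in Table 31 are not reproduced in the text.

```python
def check_sample_integrity(expected_marker, marker_read_counts, min_reads=1):
    """Compare the markers detected in one indexed sample with the one expected.

    marker_read_counts: dict mapping marker name (e.g. "MM1") to the number
    of reads uniquely mapped to that marker sequence in this sample.
    min_reads: hypothetical detection threshold; the source states none.
    """
    detected = {m for m, n in marker_read_counts.items() if n >= min_reads}
    unexpected = sorted(detected - {expected_marker})
    return {
        "expected_present": expected_marker in detected,  # identity confirmed
        "unexpected_markers": unexpected,                  # cross-contamination
        "ok": expected_marker in detected and not unexpected,
    }
```

A sample whose index and marker disagree, or which carries reads from a second marker, would be flagged for a mix-up or cross-contamination, respectively.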
Example 21
Internal positive control
An in-process positive control was developed for massively parallel sequencing of maternal cfDNA, to provide qualitatively positive chromosome dose and NCV values for trisomy 13, trisomy 18 and trisomy 21.
Fragmented genomic DNA from three male patients known to have trisomies of Chr13, Chr18 and Chr21, respectively, was spiked into a background of fragmented female DNA. The fragmented genomic DNA was size-selected by PAGE to include fragments ranging in length from about 150 bp to about 250 bp, to mimic the size of fetal cfDNA. The size-selected DNA of the T13, T18 and T21 controls was purified and end-repaired, and its concentration was measured using a Nanodrop (Wilmington, DE). The prepared DNA was verified on a Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, California). The trisomy 13, trisomy 18 and trisomy 21 DNAs were obtained from the Coriell Institute for Medical Research (Camden, NJ). Female genomic DNA was obtained from The Biochain Institute (Hayward, CA). Small amounts of trisomic DNA were spiked into a predominantly female DNA background to mimic a "male fetal" DNA fraction in a female "maternal" DNA background. The composition of this DNA mixture was optimized so that, when used in the sequencing assay for determining copy number variation, the mixture always reports qualitatively positive for trisomy 13, trisomy 18 and trisomy 21, with NCV values for chromosomes 13, 18 and 21 greater than 4.
Maternal cfDNA was extracted from plasma samples obtained from pregnant women. Sequencing libraries of the maternal sample cfDNA and of the T13, T18 and T21 control DNA were prepared for multiplexed sequencing, which was performed on the Illumina platform. Four positive controls and 56 samples were sequenced in each flow cell of the sequencer. As described elsewhere in this application, 36 bp reads were obtained, tags for the chromosomes were identified, and NCV values were calculated.
Figures 69A, B and C show the NCV values of the maternal test samples (◇) and of the internal positive controls (). NCV values exceeding 4 were determined to indicate a copy number variation for chromosomes of interest 13 (A), 18 (B) and 21 (C), respectively. The figures show that the NCVs of the positive controls correlate with the NCVs of the maternal test samples identified as having a copy number variation, i.e., an additional copy of chromosome 13, 18 or 21.
Internal positive controls can be designed to mimic complete chromosomal variations and partial chromosomal variations. These internal positive controls can be used in prenatal diagnostic assays and in related assays, such as the determination of fetal fraction by massively parallel sequencing, as described throughout this specification.
Example 22
Determination of Fetal Fraction Using Massively Parallel Sequencing: Sample Processing and cfDNA Extraction
Peripheral blood samples were collected from pregnant women who were in the first or second trimester of pregnancy and were considered to be at risk for fetal aneuploidy. Informed consent was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (about 6 to 9 mL/tube) was transferred to a 15 mL low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge with a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 mL high-speed centrifuge tube and centrifuged at 16000 × g, 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge with a JA-14 rotor. The two centrifugation steps were performed within 72 hours of blood collection. Cell-free plasma comprising cfDNA was stored at -80 °C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini kit (Qiagen), essentially according to the manufacturer's instructions. One milliliter of buffer AL and 100 µL of protease solution were added to 1 mL of plasma. The mixture was incubated at 56 °C for 15 minutes. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 µL of buffer AW1, followed by a second wash with 750 µL of buffer AW24. The columns were centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ quantitation platform (Invitrogen).
Example 23
Determination of fetal fraction using massively parallel sequencing: preparation of sequencing libraries, sequencing, and analysis of sequencing data
A. Preparation of sequencing libraries
All sequencing libraries, i.e. target, primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, no further fragmentation by nebulization or sonication was performed on the plasma DNA samples. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 for 15 minutes at 20 °C. The enzymes were then heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using Phusion High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; and hold at 4 °C) using the Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387 v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
B. Sequencing
Library DNA was sequenced using the Genome Analyzer II (Illumina Inc., San Diego, CA, USA) according to standard manufacturer protocols. Copies of the protocol for whole genome sequencing using Illumina/Solexa technology can be found in the BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378.
The DNA library was diluted to 1 nM and denatured. Library DNA (5 pM) was subjected to cluster amplification according to the procedure described in Illumina's Cluster Station User Guide and Cluster Station Operations Guide, available on the World Wide Web at illumina.com/systems/genome analyzer/cluster_station.ilmn. The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information are needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome.
C. Analysis of sequencing data to determine fetal fraction
Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software" transferred the image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The 36 bp reads were aligned to an artificial reference genome, e.g. a SNP genome, using the BOWTIE program. The artificial reference genome was identified as the grouping of the polymorphic DNA sequences that encompass the alleles comprised in the polymorphic target sequences. For example, the artificial reference genome is a SNP genome comprising SEQ ID NOs:7-62. Only reads that mapped uniquely to the artificial genome were used for the analysis of fetal fraction. Reads that matched the SNP genome perfectly were counted as tags and filtered. Of the remaining reads, only reads having one or two mismatches were counted as tags and included in the analysis. The tags mapped to each of the polymorphic alleles were counted, and the fetal fraction was determined as the ratio of the number of tags mapped to the minor allele (i.e. the fetal allele) to the number of tags mapped to the major allele (i.e. the maternal allele).
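The tag-counting step described above can be sketched as follows. The sketch assumes that the alignments have already been parsed into simple records; the record layout `(allele_id, mismatches, is_unique)` and the allele identifiers are hypothetical and do not reflect the output format of any particular aligner.

```python
from collections import Counter

MAX_MISMATCHES = 2  # reads with 0, 1 or 2 mismatches are counted as tags


def count_allele_tags(alignments):
    """Count tags per polymorphic allele.

    alignments: iterable of (allele_id, mismatches, is_unique) tuples,
    a hypothetical pre-parsed form of the aligner output.
    Only uniquely mapped reads with at most two mismatches are kept.
    """
    counts = Counter()
    for allele_id, mismatches, is_unique in alignments:
        if is_unique and mismatches <= MAX_MISMATCHES:
            counts[allele_id] += 1
    return counts


# Hypothetical reads against two alleles of one SNP.
reads = [
    ("rs0000001_A", 0, True),
    ("rs0000001_A", 1, True),
    ("rs0000001_A", 3, True),   # too many mismatches, dropped
    ("rs0000001_G", 2, True),
    ("rs0000001_G", 0, False),  # not uniquely mapped, dropped
]
tags = count_allele_tags(reads)
print(tags["rs0000001_A"], tags["rs0000001_G"])
```

The per-allele counts produced this way are the inputs to the minor-to-major allele ratio from which the fetal fraction is derived.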
Example 24
Selection of autosomal SNPs for the determination of fetal fraction
A set of 28 autosomal SNPs was selected from a list of 92 SNPs (Pakstis et al., Hum Genet 127:315-324 [2010]) and from Applied Biosystems by Life Technologies™ (Carlsbad, CA), at World Wide Web address appliedbiosystems.com. The primers were designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site is included in the 36 bp read generated by massively parallel sequencing on the Illumina Analyzer GII, and to generate amplicons of sufficient length to undergo bridge amplification during cluster formation. Thus, the primers were designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, result in DNA molecules of at least 200 bp. Primer sequences were identified, and primer sets, i.e. forward and reverse primers, were synthesized by Integrated DNA Technologies (San Diego, CA) and stored as a 1 μM solution to be used for amplifying polymorphic target sequences as described in Examples 25 to 27. Table 33 provides the RefSNP (rs) accession ID numbers, the primers used for amplifying the target cfDNA sequences, and the sequences of the amplicons comprising the possible SNP alleles that would be generated using the primers. The SNPs given in Table 33 were used for the simultaneous amplification of 13 target sequences in a multiplexed assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs given in Table 34. The SNP alleles are shown in bold and underlined. Other additional SNPs that can be used to determine the fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429 and rs1490413, which have been analyzed for determining fetal fraction by TaqMan PCR and are disclosed in U.S. Provisional Applications 61/296,358 and 61/360,837.
Table 33
SNP panel for the determination of fetal fraction
Figure BDA00002366924903251
Figure BDA00002366924903261
Figure BDA00002366924903271
Table 34
Additional SNPs for the determination of fetal fraction
Figure BDA00002366924903272
Figure BDA00002366924903281
Figure BDA00002366924903291
Figure BDA00002366924903301
Figure BDA00002366924903311
Example 25
Determination of fetal fraction by massively parallel sequencing of a target library
To determine the fraction of fetal cfDNA in a maternal sample, target polymorphic nucleic acid sequences each comprising a SNP were amplified and used to prepare a target library for sequencing in a massively parallel fashion.
cfDNA was extracted as described above. A target sequencing library was prepared as follows. cfDNA contained in 5 μl of purified cfDNA was amplified in a reaction volume of 50 μl containing 7.5 μl of a 1 μM primer mix (Table 1), 10 μl of NEB 5X Master Mix and 27 μl of water. Thermal cycling was performed with the GeneAmp 9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 1 minute, followed by 20 to 30 cycles of 95 °C for 20 seconds, 68 °C for 1 minute, and 68 °C for 30 seconds, followed by a final incubation at 68 °C for 5 minutes. A final hold at 4 °C was added until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). A final hold at 4 °C was added until the samples were removed for preparing the target library. The amplified product was analyzed with a 2100 Bioanalyzer (Agilent Technologies, Sunnyvale, CA), and the concentration of the amplified product was determined. A sequencing library of the amplified target nucleic acids was prepared as described in Example 23 and was sequenced in a massively parallel fashion using sequencing-by-synthesis with reversible dye terminators according to the Illumina protocol (BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378). Tags that mapped to the reference genome consisting of 26 sequences comprising a SNP (13 pairs, each pair representing two alleles), i.e. SEQ ID NOs:7-32, were analyzed and counted as described.
Table 35 provides the tag counts obtained from sequencing the target library, and the calculated fetal fraction derived from the sequencing data.
Table 35
Determination of fetal fraction by massively parallel sequencing of a library of polymorphic nucleic acids
Figure BDA00002366924903341
The results show that polymorphic nucleic acid sequences each comprising at least one SNP can be amplified from cfDNA derived from a maternal plasma sample to construct a library that can be sequenced in a massively parallel fashion to determine the fraction of fetal nucleic acids in the maternal sample.
Example 26
Determination of fetal fraction following enrichment of fetal and maternal nucleic acids in a cfDNA sequencing library sample
To enrich the fetal and maternal cfDNA contained in a primary sequencing library constructed using purified fetal and maternal cfDNA, a portion of the purified cfDNA sample was used to amplify polymorphic target nucleic acid sequences and to prepare a sequencing library of the amplified polymorphic target nucleic acids, which was then used to enrich the fetal and maternal nucleic acid sequences contained in the primary library.
The method corresponds to the workflow diagrammed in Figure 10. A target sequencing library was prepared from a portion of the purified cfDNA as described in Example 23. A primary sequencing library was prepared from the remaining portion of the purified cfDNA as described in Example 23. The primary library was enriched for the amplified polymorphic nucleic acids contained in the target library by diluting the primary and target sequencing libraries to 10 nM and combining the target library with the primary library at a ratio of 1:9 to provide an enriched sequencing library. The enriched library was sequenced and the sequencing data were analyzed as described in Example 23.
Table 36 provides the number of sequence tags that mapped to the SNP genome for the informative SNPs identified by sequencing enriched libraries derived from the plasma samples of pregnant women each carrying a T21, a T13, a T18 or a monosomy X fetus, respectively. The fetal fraction was calculated as follows:
$$\%\ \text{fetal fraction}_{\text{allele}_x} = \frac{\sum \text{fetal sequence tags for allele}_x}{\sum \text{maternal sequence tags for allele}_x} \times 100$$
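The fetal fraction formula above can be illustrated with a short sketch. The function names and the tag counts are hypothetical. Combining the per-SNP percentages by a simple mean is an assumption of this sketch; the text above states only the per-allele ratio.

```python
from statistics import mean


def snp_fetal_fraction_percent(fetal_tags, maternal_tags):
    """Percent fetal fraction for one informative SNP:
    (fetal allele tags / maternal allele tags) x 100."""
    if maternal_tags <= 0:
        raise ValueError("maternal tag count must be positive")
    return 100.0 * fetal_tags / maternal_tags


def sample_fetal_fraction_percent(snp_counts):
    """Mean of the per-SNP percentages (an assumed way of combining SNPs).

    snp_counts: list of (fetal_tags, maternal_tags) for informative SNPs.
    """
    return mean(snp_fetal_fraction_percent(f, m) for f, m in snp_counts)


# Hypothetical tag counts for two informative SNPs.
counts = [(30, 1000), (20, 1000)]
print([snp_fetal_fraction_percent(f, m) for f, m in counts])
print(sample_fetal_fraction_percent(counts))
```

A SNP is informative here only when the fetal (minor) allele is distinguishable from the maternal (major) allele; non-informative SNPs would simply be left out of `snp_counts`.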
Table 36 also provides the number of sequence tags mapped to the human reference genome. Tags mapped to the human reference genome were used to determine the presence or absence of aneuploidy using the same plasma sample that was used to determine the corresponding fetal fraction. Methods for using sequence tag counts to determine aneuploidy are described in U.S. Provisional Applications 61/407,017 and 61/455,849, which are incorporated herein by reference in their entirety.
Table 36
Determination of fetal fraction by massively parallel sequencing of an enriched library of polymorphic nucleic acids
Figure BDA00002366924903361
Figure BDA00002366924903371
Example 27
Determination of fetal fraction by massively parallel sequencing:
enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
To enrich the fetal and maternal cfDNA contained in a purified sample of cfDNA extracted from a maternal plasma sample, a portion of the purified cfDNA was used to amplify polymorphic target nucleic acid sequences, each comprising one SNP chosen from the panel of SNPs given in Table 33.
The method corresponds to the workflow diagrammed in Fig. 9. Cell-free plasma was obtained from a maternal blood sample, and cfDNA was purified from the plasma sample as described in Example 22. The final concentration was determined to be 92.8 pg/μl. cfDNA contained in 5 μl of purified cfDNA was amplified in a reaction volume of 50 μl containing 7.5 μl of a 1 μM primer mix (Table 1), 10 μl of NEB 5X Master Mix and 27 μl of water. Thermal cycling was performed with the Gene Amp 9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 1 minute, followed by 30 cycles of 95 °C for 20 seconds, 68 °C for 1 minute, and 68 °C for 30 seconds, followed by a final incubation at 68 °C for 5 minutes. A final hold at 4 °C was added until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA), and the concentration was quantified using the Nanodrop 2000 (Thermo Scientific, Wilmington, DE). The purified amplification product was diluted 1:10 in water, and 0.9 μl (371 pg) was added to 40 μl of the purified cfDNA sample to obtain a 10% spike. The enriched fetal and maternal cfDNA present in the purified cfDNA sample was used to prepare a sequencing library and was sequenced as described in Example 22.
Table 37 provides the tag counts obtained for each of chromosomes 21, 18, 13, X and Y, i.e. the sequence tag density, and the tag counts obtained for the informative polymorphic sequences contained in the SNP reference genome, i.e. the SNP tag density. The data show that sequencing information can be obtained by sequencing a single library constructed from a purified maternal cfDNA sample that has been enriched for sequences comprising SNPs, to determine simultaneously the presence or absence of aneuploidy and the fetal fraction. The presence or absence of aneuploidy was determined using the number of tags mapped to chromosomes, as described in U.S. Provisional Applications 61/407,017 and 61/455,849. In the example given, the data show that the fraction of fetal DNA in plasma sample AFR105 was quantifiable from the sequencing results of five informative SNPs and was determined to be 3.84%. Sequence tag densities are provided for chromosomes 21, 13, 18, X and Y.
This example shows that the enrichment protocol provides the tag counts required to determine aneuploidy and fetal fraction from a single sequencing process.
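The 10% spike stated above can be checked with a short calculation that uses only the figures given in this paragraph (92.8 pg/μl, 40 μl, and 371 pg); the variable names are illustrative.

```python
# Figures taken from the paragraph above.
cfdna_conc_pg_per_ul = 92.8   # concentration of purified cfDNA
cfdna_volume_ul = 40.0        # volume of purified cfDNA receiving the spike
spike_mass_pg = 371.0         # mass of amplified product added (in 0.9 ul)

# Mass of cfDNA in the 40 ul aliquot, and the spike relative to it.
cfdna_mass_pg = cfdna_conc_pg_per_ul * cfdna_volume_ul
spike_fraction = spike_mass_pg / cfdna_mass_pg

print(f"cfDNA mass: {cfdna_mass_pg:.0f} pg, spike: {spike_fraction:.1%}")
```

The spike is expressed here relative to the mass of unamplified cfDNA, which is the reading under which the stated quantities give 10%.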
Table 37
Determination of fetal fraction by massively parallel sequencing:
enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
Figure BDA00002366924903401
Example 28
Determination of fetal fraction by capillary electrophoresis of polymorphic sequences comprising STRs
To determine the fetal fraction in maternal samples comprising fetal and maternal cfDNA, peripheral blood samples were collected from volunteer pregnant women carrying either male or female fetuses. The peripheral blood samples were obtained and processed to provide purified cfDNA as described in Example 22.
Ten microliters of cfDNA sample were analyzed using the AmpFlSTR MiniFiler™ PCR amplification kit (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. Briefly, cfDNA contained in 10 μl was amplified in a reaction volume of 25 μl containing 5 μl of fluorescently labeled primers (AmpFlSTR MiniFiler™ Primer Set) and the AmpFlSTR MiniFiler™ Master Mix, which includes AmpliTaq polymerase and associated buffer, salt (1.5 mM MgCl2), and 200 μM deoxynucleotide triphosphates (dNTPs: dATP, dCTP, dGTP and dTTP). The fluorescently labeled primers are forward primers labeled with 6FAM™, VIC™, NED™ and PET™ dyes. Thermal cycling was performed with the Gene Amp 9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 10 minutes, followed by 30 cycles of 94 °C for 20 seconds, 59 °C for 2 minutes, and 72 °C for 1 minute, followed by a final incubation at 60 °C for 45 minutes. A final hold at 4 °C was added until the samples were removed for analysis. The amplified product was prepared by diluting 1 μl of amplified product in 8.7 μl of Hi-Di™ formamide (Applied Biosystems) and 0.3 μl of GeneScan™-500 LIZ internal size standard (Applied Biosystems), and was analyzed with an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems) using Data Collection HID_G5_POP4 (Applied Biosystems) and a 36 cm capillary array. All genotyping was performed with GeneMapper_ID v3.2 software (Applied Biosystems) using the manufacturer-provided allelic ladders, bins and panels.
All genotyping measurements were performed on the Applied Biosystems 3130xl Genetic Analyzer, using a ±0.5-nt "window" around the size obtained for each allele to allow for detection and correct assignment of alleles. Any sample allele whose size fell outside the ±0.5-nt window was determined to be OL, i.e. "Off Ladder". OL alleles are alleles of a size that is not represented in the AmpFlSTR MiniFiler™ allelic ladder, or alleles that do not correspond to an allelic ladder but whose size is just outside a window because of measurement error. The minimum peak height threshold of >50 RFU was set on the basis of validation experiments performed to avoid typing when stochastic effects are likely to interfere with accurate interpretation of mixtures. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of the preset bins for the STRs analyzed.
The fetal fraction was calculated using the average peak height of the major and minor alleles at each STR locus, determined from triplicate injections. The rules applied to the calculation are:
1. off-ladder (OL) allele data are not included in the calculation; and
2. only peak heights of >50 RFU (relative fluorescence units) are included in the calculation; and
3. if only one bin is present, the marker is deemed non-informative; and
4. if a second bin is called but the peaks of the first and second bins are within 50% to 70% of their relative fluorescence units (RFU) in peak height, the minority fraction is not measured and the marker is deemed non-informative.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component, expressed as a percent. It was first calculated for each informative locus as
fetal fraction = (Σ peak height of minor allele / Σ peak height of major allele) × 100.
The fetal fraction for a sample comprising two or more informative STRs is calculated as the average of the fetal fractions calculated for the two or more informative markers.
Table 38 provides the data obtained from analyzing the cfDNA of a subject pregnant with a male fetus.
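The four rules and the per-locus formula above can be sketched in code. This is a simplified reading, not the procedure of the genotyping software: each locus is given as a mapping from allele label to mean peak height over the triplicate injections, the tallest called peak is treated as the major allele, and rule 4 is interpreted as rejecting any locus whose second peak is at least 50% of the tallest peak. These interpretations, the function names and the example peak heights are assumptions.

```python
from statistics import mean

MIN_PEAK_RFU = 50.0    # rule 2: only peaks above 50 RFU are used
BALANCED_RATIO = 0.5   # rule 4 (assumed reading): near-balanced peaks are rejected


def locus_fetal_fraction(peaks):
    """Percent fetal fraction for one STR locus, or None if non-informative.

    peaks: dict mapping allele label -> mean peak height (RFU);
    the label "OL" marks an off-ladder allele (rule 1).
    """
    called = sorted(
        (h for allele, h in peaks.items()
         if allele != "OL" and h > MIN_PEAK_RFU),
        reverse=True,
    )
    if len(called) < 2:                      # rule 3: only one bin present
        return None
    major, minors = called[0], called[1:]
    if any(h >= BALANCED_RATIO * major for h in minors):   # rule 4
        return None
    return 100.0 * sum(minors) / major


def sample_fetal_fraction(loci):
    """Average over informative loci; None if no locus is informative."""
    values = [v for v in map(locus_fetal_fraction, loci.values())
              if v is not None]
    return mean(values) if values else None


# Hypothetical peak heights at three loci.
loci = {
    "locus_1": {"12": 1000.0, "14": 80.0},    # informative, 8%
    "locus_2": {"10": 2000.0, "11": 200.0},   # informative, 10%
    "locus_3": {"9": 1500.0},                 # one bin only, skipped
}
print(sample_fetal_fraction(loci))
```

Treating only the single tallest peak as the major allele makes loci with a heterozygous mother non-informative under this sketch, which is conservative; a fuller implementation would sum both maternal peaks in the denominator.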
Table 38
Fetal fraction determined in the cfDNA of a pregnant subject by analysis of STRs
Figure BDA00002366924903421
The results show that cfDNA can be used to determine the presence or absence of fetal DNA, as indicated by the detection of a minor component at one or more STR alleles; to determine the percent fetal fraction; and to determine fetal gender, as indicated by the presence or absence of an Amelogenin allele.

Claims (96)

1. A medical analysis apparatus for determining the fetal fraction of a maternal test sample comprising a mixture of fetal and maternal nucleic acids, the apparatus comprising:
(a) means for receiving a plurality of sequence reads of the fetal and maternal nucleic acids from the maternal test sample;
(b) means for aligning the plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
(c) means for identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X and Y and segments thereof, and for identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose, wherein the chromosome of interest or chromosome segment of interest harbors a copy number variation, and wherein the copy number variation is determined by comparing the chromosome dose or chromosome segment dose for each of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest; and
(d) means for determining the fetal fraction using the dose of the chromosome of interest or the dose of the chromosome segment of interest.
2. The apparatus of claim 1, wherein the chromosome dose or segment dose determined by means (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest; or
wherein the chromosome dose or segment dose determined by means (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
3. The apparatus of claim 1, further comprising means (e) for calculating a normalized chromosome value (NCV) or a normalized segment value (NSV), wherein the NCV is calculated so as to relate the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the chromosome of interest;
wherein the NSV is calculated so as to relate the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples, as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the i-th chromosome segment dose calculated for the i-th chromosome segment in the test sample, wherein the i-th chromosome segment is the chromosome segment of interest.
4. The apparatus of claim 3, wherein the means (d) determines the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA}\, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest; or
wherein the means (d) determines the fetal fraction according to the following formula:
$$ff = 2 \times \left| NSV_{iA}\, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein the i-th chromosome segment is the chromosome segment of interest.
5. The apparatus of claim 1, wherein the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
6. The apparatus of claim 1, wherein the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses or chromosome segment doses.
7. The apparatus of claim 1, wherein the normalizing chromosome sequence is a single chromosome or a group of chromosomes selected from any one or more of chromosomes 1-22, X and Y.
8. The apparatus of claim 1, wherein the normalizing segment sequence is a single segment or a group of segments from any one or more of chromosomes 1-22, X and Y.
9. The apparatus of claim 1, wherein the copy number variation is selected from the group consisting of: complete chromosomal duplication, complete chromosomal deletion, partial duplication, partial multiplication, partial insertion and partial deletion.
10. The apparatus of claim 1, further comprising means for comparing the fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the chromosome of interest.
11. A medical analysis apparatus for classifying a copy number variation in a fetal genome, the apparatus comprising:
(1) means for receiving a plurality of sequence reads of fetal and maternal nucleic acids from a maternal test sample;
(2) means for aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
(3) means for identifying a number of those sequence tags that are from one or more chromosomes of interest, and for determining that a first chromosome of interest in the fetus harbors a copy number variation;
(4) means for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
(5) means for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and
(6) means for comparing the first fetal fraction value with the second fetal fraction value, and for using the comparison to classify the copy number variation of the first chromosome of interest.
12. The equipment as claimed in claim 11, wherein the device (4) for the first method comprises an assembly for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest; and
wherein the device (5) for the second method comprises:
(a) an assembly for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) an assembly for calculating the second fetal fraction value from the chromosome dose by the second method.
13. The equipment as claimed in claim 12, wherein the information used by the device (4) for the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of said polymorphic sequences comprising said one or more polymorphic sites.
14. The equipment as claimed in claim 13, wherein the information used by the device (4) for the first method is obtained by a method other than sequencing.
15. The equipment as claimed in claim 14, wherein said method is qPCR, digital PCR, mass spectrometry or capillary gel electrophoresis.
16. The equipment as claimed in claim 12, wherein the device (5) for said second method further comprises an assembly for calculating a normalized chromosome value (NCV), this assembly relating said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein said i-th chromosome is said chromosome of interest.
17. The equipment as claimed in claim 16, wherein the assembly of the device (5) for said second method that calculates said fetal fraction evaluates the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in said test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in said qualified samples, wherein said i-th chromosome is said chromosome of interest.
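As an illustration of claims 16 and 17, the following Python sketch computes a normalized chromosome value as the test-sample dose minus the qualified-sample mean, divided by the qualified-sample standard deviation, and then a fetal fraction as twice the absolute product of that value and the coefficient of variation. The function names and the example doses are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the claims only call for an estimated standard deviation.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (test dose - mean of qualified doses) / their standard deviation."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def fetal_fraction_from_ncv(ncv, qualified_doses):
    """ff = 2 * |NCV * CV|, with CV the coefficient of variation of the doses."""
    cv = stdev(qualified_doses) / mean(qualified_doses)
    return 2 * abs(ncv * cv)


# Made-up doses for the chromosome of interest
qualified_doses = [0.98, 1.00, 1.02]
ncv = normalized_chromosome_value(1.05, qualified_doses)
ff = fetal_fraction_from_ncv(ncv, qualified_doses)
print(round(ncv, 3), round(ff, 3))
```

Because the standard deviation cancels in the product, the fetal fraction here reduces to twice the relative deviation of the test dose from the qualified-sample mean.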
18. The equipment as claimed in claim 11, wherein the device (4) for said first method comprises:
(a) an assembly for calculating the number of sequence tags from the chromosome other than said first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) an assembly for calculating the first fetal fraction value from the chromosome dose by the first method; and
wherein the device (5) for said second method comprises:
(a) an assembly for calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) an assembly for calculating the second fetal fraction value from the chromosome dose by the second method.
19. The equipment as claimed in claim 18, wherein the device (4) for said first method and the device (5) for said second method each further comprise an assembly for calculating a normalized chromosome value (NCV) and an assembly that uses the NCV, wherein calculating the NCV relates the calculated chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the calculated dose of the i-th chromosome in the test sample,
wherein
for the device (4) for the first method, said i-th chromosome is said chromosome other than said first chromosome of interest; and
for the device (5) for the second method, said i-th chromosome is said first chromosome of interest.
20. The equipment as claimed in claim 19, wherein the assemblies of the device (4) for the first method and of the device (5) for the second method that calculate the fetal fraction evaluate the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in said test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in said qualified samples;
wherein
for the device (4) for the first method, said i-th chromosome is said chromosome other than said first chromosome of interest; and
for the device (5) for the second method, said i-th chromosome is said first chromosome of interest.
21. The equipment as claimed in claim 20, wherein, when said fetus is male, said chromosome other than said first chromosome of interest is the X chromosome.
22. The equipment as claimed in claim 12 or 18, wherein the device (6) for comparing the first fetal fraction value with the second fetal fraction value comprises an assembly for determining whether the two fetal fraction values are approximately equal.
23. The equipment as claimed in claim 22, wherein the device (6) further comprises an assembly for determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true.
24. The equipment as claimed in claim 23, wherein the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy.
25. The equipment as claimed in claim 24, wherein the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
26. The equipment as claimed in claim 25, further comprising a device for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the device for analyzing the tag information of the first chromosome of interest is configured to operate when the device for comparing the first fetal fraction value with the second fetal fraction value indicates that the two fetal fraction values are not approximately equal.
27. The equipment as claimed in claim 11, wherein the device (4) for said first method comprises an assembly for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest; and
the device (5) for said second method comprises an assembly for calculating the second fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on said first chromosome of interest.
28. The equipment as claimed in claim 27, wherein the device (6) for comparing comprises:
an assembly for determining that said first chromosome of interest is diploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1;
an assembly for determining that said first chromosome of interest is triploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1.5; and
an assembly for determining that said first chromosome of interest is monoploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 0.5.
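The ratio test of claim 28 can be sketched in Python as follows. The claim only says "approximately", so the tolerance `tol`, the function name `classify_by_ff_ratio` and the `"indeterminate"` fallback label are assumptions made for illustration.

```python
def classify_by_ff_ratio(ff_first, ff_second, tol=0.1):
    """Classify the first chromosome of interest from the ratio ff_second / ff_first.

    tol is a hypothetical tolerance for "approximately"; the claims give no value.
    """
    if ff_first <= 0:
        raise ValueError("first fetal fraction value must be positive")
    ratio = ff_second / ff_first
    targets = ((1.0, "diploid"), (1.5, "triploid"), (0.5, "monoploid"))
    for target, label in targets:
        if abs(ratio - target) <= tol:
            return label
    # No target matched: the case claim 29 hands over to further tag analysis
    return "indeterminate"


print(classify_by_ff_ratio(0.10, 0.15))
```

A ratio that matches none of the three targets is the case in which the tag information of the chromosome is analyzed further.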
29. The equipment as claimed in claim 28, further comprising a device for analyzing the tag information of said first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the device for analyzing the tag information of the first chromosome of interest is configured to operate when the device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5 or 0.5.
30. The equipment as claimed in claim 26 or 29, wherein the device for analyzing the tag information of the first chromosome of interest comprises:
(a) an assembly for binning the sequence of the first chromosome of interest into a plurality of portions;
(b) an assembly for determining whether any of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and
(c) an assembly for determining that the first chromosome of interest harbors a partial aneuploidy if any of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions, or for determining that the fetus is a mosaic if none of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions.
31. The equipment as claimed in claim 30, wherein the assembly (c) further determines that the portion of the first chromosome of interest that contains significantly more or significantly fewer nucleic acids than one or more other portions harbors the partial aneuploidy.
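The decision in claims 30 and 31 can be sketched in Python once per-portion tag counts are available. The claims do not say how "significantly" is assessed, so the relative deviation from the median used below is only a stand-in for a proper statistical test, and `classify_bins`, `rel_threshold` and the example counts are hypothetical.

```python
from statistics import median


def classify_bins(bin_counts, rel_threshold=0.2):
    """Return (call, outlier_indices) for per-portion tag counts.

    A portion is flagged when its count deviates from the median by more than
    rel_threshold * median (placeholder for a real significance test).
    """
    med = median(bin_counts)
    outliers = [
        i for i, count in enumerate(bin_counts)
        if abs(count - med) > rel_threshold * med
    ]
    if outliers:
        # Some portion stands out: partial aneuploidy located in those portions
        return "partial aneuploidy", outliers
    # No portion stands out from the others: mosaic
    return "mosaic", []


print(classify_bins([100, 102, 98, 101, 150, 99]))
```

The returned indices correspond to the portions that claim 31 identifies as harboring the partial aneuploidy.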
32. The equipment as claimed in claim 11, wherein the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X and Y.
33. The equipment as claimed in claim 11, wherein the device (6) comprises an assembly for classifying the copy number variation into a classification selected from the group consisting of: complete chromosome insertion, complete chromosome deletion, partial chromosome duplication, partial chromosome deletion, and mosaic.
34. The equipment as claimed in claim 11, further comprising:
(i) a device for determining whether the copy number variation is caused by a partial aneuploidy or a mosaic; and
(ii) a device for determining the locus of the partial aneuploidy on the first chromosome of interest when the copy number variation is caused by a partial aneuploidy,
wherein the devices in (i) and (ii) are configured to operate when the device for comparing the first fetal fraction value with the second fetal fraction value determines that the first fetal fraction value and the second fetal fraction value are not approximately equal.
35. The equipment as claimed in claim 34, wherein the device for determining the locus of the partial aneuploidy on the first chromosome of interest comprises an assembly for sorting the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest; and an assembly for counting the mapped tags in each bin.
36. The equipment as claimed in claim 1 or 11, wherein said maternal test sample is a blood, plasma, serum or urine sample.
37. The equipment as claimed in claim 1 or 11, wherein said fetal and maternal nucleic acids are cell-free DNA (cfDNA).
38. The equipment as claimed in claim 1 or 11, further comprising a sequencer configured to sequence the fetal and maternal nucleic acids of a maternal test sample and obtain the sequence reads.
39. The equipment as claimed in claim 38, wherein said sequencer is configured to perform sequencing-by-synthesis.
40. The equipment as claimed in claim 39, wherein said sequencer is configured to perform sequencing-by-synthesis using reversible dye terminators.
41. The equipment as claimed in claim 38, wherein said sequencer is configured to perform sequencing-by-ligation.
42. The equipment as claimed in claim 38, wherein said sequencer is configured to perform single molecule sequencing.
43. The equipment as claimed in claim 38, wherein the sequencer and the devices (a)-(d) of the equipment as claimed in claim 1, or the devices (1)-(6) of the equipment as claimed in claim 11, are located in separate places and connected by a network.
44. The equipment as claimed in claim 38, further comprising a device for obtaining the maternal test sample from a pregnant mother.
45. The equipment as claimed in claim 44, wherein the device for obtaining the maternal test sample and the devices (a)-(d) of the equipment as claimed in claim 1, or the devices (1)-(6) of the equipment as claimed in claim 11, are located in separate places.
46. The equipment as claimed in claim 44, further comprising a device for extracting cell-free DNA from the maternal test sample.
47. The equipment as claimed in claim 46, wherein the device for extracting cell-free DNA and the sequencer are located in the same place, and wherein the device for obtaining the maternal test sample is located at a remote site.
48. The equipment as claimed in claim 38, wherein the fetal and maternal nucleic acids in the maternal test sample are cell-free DNA.
49. The equipment as claimed in claim 1 or 11, wherein the device (2) for aligning aligns at least about 1 million reads.
50. the method for the fetus mark of the parent specimen of a mixture that is used for determining comprising fetus and parent nucleic acid, described method comprises:
(a) fetus from the parent specimen and parent nucleic acid receive a plurality of sequence readings;
(b) described sequence reading and one or more karyomit(e) reference sequences are compared, and a plurality of sequence labels corresponding with these sequence readings are provided thus;
(C) Identification of interest from one or more chromosome or chromosomal region of interest that a number of sequence tags; these chromosomes or chromosome segments selected from 1-22, X and Y and the block; and this is of interest for one or more chromosomes or chromosomal regions of interest can be identified in each of the at least one normalized from chromosomal sequence or chromosome segments normalized sequence of a sequence tag that number; to identify a chromosomes or chromosome segments dose
Wherein, described one or more interested karyomit(e) or interested chromosome segment have copy number variation, and wherein said copy number variation is that the dosage by will each karyomit(e) in described one or more interested karyomit(e)s or the interested chromosome segment or chromosome segment compares definite with a respective threshold for each karyomit(e) in described one or more interested karyomit(e)s or the interested chromosome segment or chromosome segment;
(d) use described karyomit(e) dosage or chromosome segment dosage, determine described fetus mark.
51. The method as claimed in claim 50, wherein the chromosome dose or segment dose determined in step (c) is calculated as the ratio of the number of sequence tags identified for said selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest; or
wherein the chromosome dose or segment dose determined in step (c) is calculated as the ratio of the sequence tag density ratio of said selected chromosome or segment of interest to the sequence tag density ratio of the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each said selected chromosome or segment of interest.
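Both alternatives of claim 51 amount to a simple ratio, shown here as a minimal Python sketch. The function names and example numbers are hypothetical, and treating the sequence tag density ratio as the tag count divided by the length of the sequence is an assumption, since the claim itself does not define it.

```python
def chromosome_dose(tags_of_interest, tags_normalizing):
    """Dose as the ratio of tag counts: sequence of interest / normalizing sequence."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing sequence must have a positive tag count")
    return tags_of_interest / tags_normalizing


def tag_density(tag_count, sequence_length):
    """Assumed definition: tags per base of the chromosome or segment."""
    return tag_count / sequence_length


def chromosome_dose_from_density(tags_i, length_i, tags_n, length_n):
    """Dose as the ratio of the two tag densities."""
    return tag_density(tags_i, length_i) / tag_density(tags_n, length_n)


# Made-up counts and lengths
print(chromosome_dose(1500, 30000))
print(chromosome_dose_from_density(1500, 50_000_000, 30000, 150_000_000))
```

The density form differs from the count form only by the ratio of the two sequence lengths, so it matters mainly when doses are compared across sequences of different sizes.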
52. The method as claimed in claim 50, further comprising calculating a normalized chromosome value (NCV), wherein calculating the NCV relates said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein said i-th chromosome is said chromosome of interest.
53. The method as claimed in claim 52, wherein said fetal fraction is determined according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein said i-th chromosome is said chromosome of interest.
54. The method as claimed in claim 53, wherein said chromosome of interest is an autosome or the X chromosome of a male fetus, and said chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
55. The method as claimed in claim 50, further comprising calculating a normalized segment value (NSV), wherein calculating the NSV relates said chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples, as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, wherein said i-th chromosome is said chromosome of interest.
56. The method as claimed in claim 55, wherein said fetal fraction is determined according to the following formula:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein said i-th chromosome is said chromosome of interest.
57. The method as claimed in claim 50, wherein the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses or chromosome segment doses.
58. The method as claimed in claim 50, wherein said normalizing chromosome sequence is a single chromosome or a group of chromosomes selected from any one or more of chromosomes 1-22, X and Y.
59. The method as claimed in claim 50, wherein said normalizing segment sequence is a single segment or a group of segments from any one or more of chromosomes 1-22, X and Y.
60. The method as claimed in claim 50, wherein said copy number variation is selected from the group consisting of: complete chromosome duplication, complete chromosome deletion, partial duplication, partial multiplication, partial insertion and partial deletion.
61. The method as claimed in claim 50, further comprising comparing the fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample and that are present on a chromosome other than said first chromosome of interest.
62. A medical analysis method for classifying a copy number variation in a fetal genome, said method comprising:
(a) obtaining a plurality of sequence reads from fetal and maternal nucleic acids in a maternal test sample;
(b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
(c) identifying a number of those sequence tags that are from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation;
(d) calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
(e) calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and
(f) comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome.
63. The method as claimed in claim 62, wherein the first method comprises calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest; and
wherein the second method comprises:
(a) calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) calculating the second fetal fraction value from the chromosome dose by the second method.
64. The method as claimed in claim 63, wherein the information used by the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of said polymorphic sequences comprising said one or more polymorphic sites.
65. The method as claimed in claim 63, wherein the information used by the first method is obtained by a method other than sequencing.
66. The method as claimed in claim 65, wherein said method is qPCR, digital PCR, mass spectrometry or capillary gel electrophoresis.
67. The method as claimed in claim 63, wherein said second method further comprises calculating a normalized chromosome value (NCV) and uses the NCV, wherein said calculating relates said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein said i-th chromosome is said chromosome of interest.
68. The method as claimed in claim 67, wherein said second method calculates said fetal fraction by evaluating the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in said test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in said qualified samples, wherein said i-th chromosome is said chromosome of interest.
69. The method as claimed in claim 62, wherein said first method comprises:
(a) calculating the number of sequence tags from said chromosome other than said first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for the chromosome other than said first chromosome of interest; and
(b) calculating the first fetal fraction value from the chromosome dose by the first method; and wherein said second method comprises:
(a) calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) calculating the second fetal fraction value from the chromosome dose by the second method.
70. The method as claimed in claim 69, wherein said first method and said second method each further comprise calculating a corresponding normalized chromosome value (NCV) and each use the corresponding NCV, wherein said calculating relates the determined chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the calculated dose of the i-th chromosome in the test sample,
wherein
for the first method, said i-th chromosome is said chromosome other than said first chromosome of interest; and
for the second method, said i-th chromosome is said first chromosome of interest.
71. The method as claimed in claim 70, wherein the first method and the second method evaluate the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in said test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in said qualified samples;
wherein
for the first method, said i-th chromosome is said chromosome other than said first chromosome of interest; and
for the second method, said i-th chromosome is said first chromosome of interest.
72. The method as claimed in claim 71, wherein, when said fetus is male, said chromosome other than said first chromosome of interest is the X chromosome.
73. The method as claimed in claim 62, wherein step (f) comprises determining whether the two fetal fraction values are approximately equal.
74. The method as claimed in claim 73, wherein step (f) further comprises determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true.
75. The method as claimed in claim 74, wherein the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy.
76. The method as claimed in claim 75, wherein the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
77. The method as claimed in claim 63 or 69, further comprising a step (g) of analyzing, when the two fetal fraction values are not approximately equal, the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
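The branching described in claims 73 to 77 can be sketched in Python as follows. The claims do not quantify "approximately equal", so the relative tolerance `rel_tol`, the function names and the returned strings are hypothetical choices made for illustration.

```python
import math


def fractions_agree(ff_first, ff_second, rel_tol=0.15):
    """True when the two fetal fraction values are approximately equal.

    rel_tol is an assumed relative tolerance; the claims give no value.
    """
    return math.isclose(ff_first, ff_second, rel_tol=rel_tol)


def next_step(ff_first, ff_second, rel_tol=0.15):
    """Decide between accepting the ploidy assumption and running step (g)."""
    if fractions_agree(ff_first, ff_second, rel_tol):
        # The ploidy assumption implicit in the second method is taken as true
        return "complete chromosomal aneuploidy"
    # Otherwise the tags of the first chromosome of interest are analyzed
    return "analyze tags for partial aneuploidy or mosaic"


print(next_step(0.10, 0.11))
print(next_step(0.10, 0.04))
```

In practice the tolerance would be chosen from the measurement error of the two fetal fraction estimates rather than fixed in advance.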
78. The method as claimed in claim 62, wherein said first method comprises calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on a chromosome other than said first chromosome of interest;
and
said second method comprises calculating the second fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, said polymorphisms being present on said first chromosome of interest.
79. The method as claimed in claim 78, wherein step (f) comprises:
determining that said first chromosome of interest is diploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1;
determining that said first chromosome of interest is triploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 1.5; and
determining that said first chromosome of interest is monoploid when the ratio of said second fetal fraction value to the first fetal fraction value is approximately 0.5.
80. The method as claimed in claim 79, further comprising a step (g) of analyzing, when the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5 or 0.5, the tag information of said first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
81. The method as claimed in claim 77 or 80, wherein said step (g) of analyzing the tag information of the first chromosome of interest comprises:
(a) binning the sequence of the first chromosome of interest into a plurality of portions;
(b) determining whether any of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and
(c) if any of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy; or, if none of said portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the fetus is a mosaic.
82. The method as claimed in claim 81, further comprising determining that the portion of the first chromosome of interest that contains significantly more or significantly fewer nucleic acids than one or more other portions harbors the partial aneuploidy.
83. The method of claim 62, wherein the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X, and Y.
84. The method of claim 62, wherein step (f) comprises classifying the copy number variation into a category selected from the group consisting of: complete chromosomal duplication or multiplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic.
85. The method of claim 62, further comprising:
(i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and
(ii) when the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest,
wherein step (f), in comparing the first fetal fraction value with the second fetal fraction value, determines that the first fetal fraction value and the second fetal fraction value are not approximately equal.
86. The method of claim 85, wherein determining the locus of the partial aneuploidy on the first chromosome of interest comprises classifying the sequence tags for the first chromosome of interest into bins or blocks of nucleic acid within the first chromosome of interest; and counting the mapped tags in each bin.
87. The method of claim 50 or 62, wherein the aligning in step (b) comprises aligning at least about 1 million reads.
88. The method of claim 50 or 62, further comprising sequencing the fetal and maternal nucleic acids in said maternal sample to obtain the nucleic acid reads.
89. The method of claim 88, wherein the sequencing comprises sequencing cell-free DNA from the maternal sample to provide the sequence reads.
90. The method of claim 88, wherein said sequencing comprises performing massively parallel sequencing on the maternal and fetal nucleic acids from the maternal sample to produce the sequence reads.
91. The method of claim 90, wherein said massively parallel sequencing is sequencing-by-synthesis.
92. The method of claim 91, wherein said sequencing-by-synthesis uses reversible dye terminators.
93. The method of claim 90, wherein said massively parallel sequencing is sequencing-by-ligation.
94. The method of claim 90, wherein said massively parallel sequencing is single-molecule sequencing.
95. The method of claim 50 or 62, further comprising obtaining said maternal sample from a pregnant organism.
96. The method of claim 50 or 62, wherein said maternal sample is a blood, plasma, serum, or urine sample.
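By way of illustration only, the decision logic recited in claims 79 to 81 and 86 may be sketched as follows. The sketch is not part of the claims. The function names, the tolerance `tol`, and the median/MAD outlier rule with its `z_cutoff` are assumptions introduced here; the claims say only "approximately" and "significantly more or significantly less" and do not fix any particular threshold or statistical test.

```python
from statistics import median


def classify_by_ratio(ff_first, ff_second, tol=0.1):
    """Claim 79: call ploidy from the ratio of the second to the first
    fetal fraction value. `tol` (assumed) defines "approximately"."""
    ratio = ff_second / ff_first
    for target, call in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "monoploid")):
        if abs(ratio - target) <= tol:
            return call
    # Claim 80: ratio is not approximately 1, 1.5, or 0.5, so go on to step (g).
    return "unresolved"


def count_tags_per_bin(tag_positions, chrom_length, bin_size):
    """Claim 86: sort mapped tags into fixed-width bins and count them."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:
        counts[pos // bin_size] += 1
    return counts


def partial_or_mosaic(bin_counts, z_cutoff=3.5):
    """Claim 81: a bin with significantly more or less nucleic acid than
    the others indicates a partial aneuploidy; otherwise call a mosaic.
    "Significantly" is approximated here by a robust z-score (assumed)."""
    med = median(bin_counts)
    mad = median(abs(c - med) for c in bin_counts)
    scale = 1.4826 * mad if mad > 0 else 1.0  # fall back when all bins agree
    flagged = [i for i, c in enumerate(bin_counts)
               if abs(c - med) / scale > z_cutoff]
    if flagged:
        return "partial aneuploidy", flagged  # claim 82: flagged bins locate it
    return "mosaic", []
```

In this sketch a sample whose ratio comes back as "unresolved" would have its tags for the chromosome of interest binned with `count_tags_per_bin`, and the resulting counts passed to `partial_or_mosaic`.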
CN201210441134.8A 2012-04-12 2012-11-07 Detection and classification of copy number variation Active CN103374518B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810154581.2A CN108485940B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201710644858.5A CN107435070A (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US13/445,778 2012-04-12
US13/445,778 US9447453B2 (en) 2011-04-12 2012-04-12 Resolving genome fractions using polymorphism counts
US13/482,964 US20120270739A1 (en) 2010-01-19 2012-05-29 Method for sample analysis of aneuploidies in maternal samples
US13/482,964 2012-05-29
US13/555,037 US9260745B2 (en) 2010-01-19 2012-07-20 Detecting and classifying copy number variation
US13/555,037 2012-07-20

Related Child Applications (2)

Application Number Priority Date Filing Date Title
CN201810154581.2A Division CN108485940B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201710644858.5A Division CN107435070A (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation

Publications (2)

Publication Number Publication Date
CN103374518A true CN103374518A (en) 2013-10-30
CN103374518B CN103374518B (en) 2018-03-27

Family

ID=49460351

Family Applications (4)

Application Number Priority Date Filing Date Title
CN201220583608.8U Expired - Lifetime CN204440396U (en) 2012-04-12 2012-11-07 Kit for determining fetal fraction
CN201710644858.5A Pending CN107435070A (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201810154581.2A Active CN108485940B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201210441134.8A Active CN103374518B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation

Country Status (1)

Country Link
CN (4) CN204440396U (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104152553A (en) * 2014-07-21 2014-11-19 上海交通大学 Kit for auxiliary diagnosis of 21-trisomy syndrome of fetus to be tested
CN106795558A (en) * 2014-05-30 2017-05-31 维里纳塔健康公司 Detection of fetal sub-chromosomal aneuploidy and copy number variation
CN107810502A (en) * 2015-05-18 2018-03-16 瑞泽恩制药公司 Method and system for copy number variation detection
CN107922973A (en) * 2015-07-07 2018-04-17 远见基因组系统公司 Method and system for sequencing-based variant detection
CN107960107A (en) * 2015-06-15 2018-04-24 默多克儿童研究所 Method for measuring chimerism
CN108026576A (en) * 2015-09-22 2018-05-11 香港中文大学 Accurate quantification of fetal DNA fraction by shallow depth sequencing of maternal plasma DNA
CN108348167A (en) * 2015-09-09 2018-07-31 优比欧迈公司 Method and system for microbiome-derived diagnostics and therapeutics for cerebro-craniofacial health-related conditions
CN108603228A (en) * 2015-12-17 2018-09-28 夸登特健康公司 Method for determining tumor gene copy number by analyzing cell-free DNA
RU2674700C2 (en) * 2016-12-30 2018-12-12 Общество с ограниченной ответственностью "Научно-производственная фирма ДНК-Технология" (ООО "НПФ ДНК-Технология") Method of determining the source of aneuploid cells on the blood of a pregnant woman
CN109689891A (en) * 2016-07-06 2019-04-26 夸登特健康公司 Methods for fragmentome profiling of cell-free nucleic acids
CN111105844A (en) * 2019-11-22 2020-05-05 广州金域医学检验集团股份有限公司 Somatic cell variation classification method, device, equipment and readable storage medium
US10731149B2 (en) 2015-09-08 2020-08-04 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
CN111948394A (en) * 2020-08-10 2020-11-17 山西医科大学 Application of TSTA3 and LAMP2 as target objects in esophageal squamous cell carcinoma metastasis detection and drug screening
CN112639120A (en) * 2018-07-24 2021-04-09 阿费梅特里克斯公司 Array-based methods and kits for determining copy number and genotype of pseudogenes
CN112823391A (en) * 2019-06-03 2021-05-18 Illumina公司 Quality control metrics based on detection limits
CN113096726A (en) * 2016-02-03 2021-07-09 维里纳塔健康公司 Use of cell-free DNA fragment size to determine copy number variation
US11072814B2 (en) 2014-12-12 2021-07-27 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
CN113462781A (en) * 2015-01-13 2021-10-01 香港中文大学 Detection of cancer using size and number aberrations of plasma DNA
CN113528645A (en) * 2014-03-14 2021-10-22 凯尔迪克斯公司 Methods for monitoring immunosuppressive therapy in a transplant recipient
CN113684277A (en) * 2021-09-06 2021-11-23 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
AU2018288772B2 (en) * 2017-06-20 2022-02-24 Illumina, Inc. Methods and systems for decomposition and quantification of dna mixtures from multiple contributors of known or unknown genotypes
CN114093417A (en) * 2021-11-23 2022-02-25 深圳基因家科技有限公司 Method and device for identifying chromosomal arm heterozygosity loss
US11990208B2 (en) 2017-06-20 2024-05-21 Illumina, Inc. Methods for accurate computational decomposition of DNA mixtures from contributors of unknown genotypes

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427864B (en) * 2018-02-14 2019-01-29 南京世和基因生物技术有限公司 Detection method, device, and computer-readable medium for copy number variation
CN110656159B (en) * 2018-06-28 2024-01-09 深圳华大生命科学研究院 Copy number variation detection method
CN110880356A (en) * 2018-09-05 2020-03-13 南京格致基因生物科技有限公司 Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer
CN109628579B (en) * 2019-01-13 2022-11-15 清华大学 Detection method for determining whether chromosome number in biological sample is abnormal
CN110373477B (en) * 2019-07-23 2021-05-07 华中农业大学 Molecular marker cloned from CNV fragment and related to porcine ear shape character
CN110317877A (en) * 2019-08-02 2019-10-11 苏州宏元生物科技有限公司 Application of a group of chromosomal instability variations in preparing a reagent or kit for diagnosing bladder transitional cell carcinoma and assessing prognosis
CN110452985A (en) * 2019-08-02 2019-11-15 苏州宏元生物科技有限公司 Application of a group of chromosomal instability variations in preparing a reagent or kit for diagnosing liver cancer and assessing prognosis
CN112342627A (en) * 2019-08-09 2021-02-09 深圳市真迈生物科技有限公司 Preparation method and sequencing method of nucleic acid library
CN111394474B (en) * 2020-03-24 2022-08-16 西北农林科技大学 Method for detecting copy number variation of GAL3ST1 gene of cattle and application thereof
CN111476497B (en) * 2020-04-15 2023-06-16 浙江天泓波控电子科技有限公司 Distribution feed network method for miniaturized platform
CN112322722B (en) * 2020-11-13 2021-11-12 上海宝藤生物医药科技股份有限公司 Primer probe composition and kit for detecting 16p11.2 microdeletion and application thereof
CN112614548B (en) * 2020-12-25 2021-08-03 北京吉因加医学检验实验室有限公司 Method for calculating sample database building input amount and database building method thereof
CN113462768B (en) * 2021-07-29 2023-05-30 中国医学科学院整形外科医院 Primer and kit for detecting copy number of ECR region of small ear deformity patient by ddPCR
CN114507904B (en) * 2022-04-19 2022-07-12 北京迅识科技有限公司 Method for preparing second-generation sequencing library

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100112575A1 (en) * 2008-09-20 2010-05-06 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive Diagnosis of Fetal Aneuploidy by Sequencing
WO2011090559A1 (en) * 2010-01-19 2011-07-28 Verinata Health, Inc. Sequencing methods and compositions for prenatal diagnoses

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001274869A1 (en) * 2000-05-20 2001-12-03 The Regents Of The University Of Michigan Method of producing a dna library using positional amplification
WO2002002772A2 (en) * 2000-06-30 2002-01-10 Incyte Genomics, Inc. Human extracellular matrix (ecm)-related tumor marker
AU2004254552B2 (en) * 2003-01-29 2008-04-24 454 Life Sciences Corporation Methods of amplifying and sequencing nucleic acids
AU2011221243B2 (en) * 2010-02-25 2016-06-02 Advanced Liquid Logic, Inc. Method of making nucleic acid libraries
CN102409043B (en) * 2010-09-21 2013-12-04 深圳华大基因科技服务有限公司 Method for constructing high-flux and low-cost Fosmid library, label and label joint used in method
CN102127818A (en) * 2010-12-15 2011-07-20 张康 Method for creating fetus DNA library by utilizing peripheral blood of pregnant woman

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100112575A1 (en) * 2008-09-20 2010-05-06 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive Diagnosis of Fetal Aneuploidy by Sequencing
WO2011090559A1 (en) * 2010-01-19 2011-07-28 Verinata Health, Inc. Sequencing methods and compositions for prenatal diagnoses
US20110201507A1 (en) * 2010-01-19 2011-08-18 Rava Richard P Sequencing methods and compositions for prenatal diagnoses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHIU ET AL: "Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma", 《PNAS》 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113528645A (en) * 2014-03-14 2021-10-22 凯尔迪克斯公司 Methods for monitoring immunosuppressive therapy in a transplant recipient
CN106795558A (en) * 2014-05-30 2017-05-31 维里纳塔健康公司 Detection of fetal sub-chromosomal aneuploidy and copy number variation
CN106795558B (en) * 2014-05-30 2020-07-10 维里纳塔健康公司 Detection of fetal sub-chromosomal aneuploidy and copy number variation
CN104152553A (en) * 2014-07-21 2014-11-19 上海交通大学 Kit for auxiliary diagnosis of 21-trisomy syndrome of fetus to be tested
US11072814B2 (en) 2014-12-12 2021-07-27 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
CN114181997A (en) * 2014-12-12 2022-03-15 维里纳塔健康股份有限公司 Determination of copy number variation using cell-free DNA fragment size
CN113462781A (en) * 2015-01-13 2021-10-01 香港中文大学 Detection of cancer using size and number aberrations of plasma DNA
CN107810502A (en) * 2015-05-18 2018-03-16 瑞泽恩制药公司 Method and system for copy number variation detection
CN107810502B (en) * 2015-05-18 2022-02-11 瑞泽恩制药公司 Method and system for copy number variation detection
CN107960107A (en) * 2015-06-15 2018-04-24 默多克儿童研究所 Method for measuring chimerism
CN107922973A (en) * 2015-07-07 2018-04-17 远见基因组系统公司 Method and system for sequencing-based variant detection
CN107922973B (en) * 2015-07-07 2019-06-14 远见基因组系统公司 Method and system for sequencing-based variant detection
US10731149B2 (en) 2015-09-08 2020-08-04 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
US11739315B2 (en) 2015-09-08 2023-08-29 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
CN108348167A (en) * 2015-09-09 2018-07-31 优比欧迈公司 Method and system for microbiome-derived diagnostics and therapeutics for cerebro-craniofacial health-related conditions
CN108026576B (en) * 2015-09-22 2022-06-28 香港中文大学 Accurate quantification of fetal DNA fraction by shallow depth sequencing of maternal plasma DNA
CN108026576A (en) * 2015-09-22 2018-05-11 香港中文大学 Accurate quantification of fetal DNA fraction by shallow depth sequencing of maternal plasma DNA
CN108603228B (en) * 2015-12-17 2023-09-01 夸登特健康公司 Method for determining tumor gene copy number by analyzing cell-free DNA
CN108603228A (en) * 2015-12-17 2018-09-28 夸登特健康公司 Method for determining tumor gene copy number by analyzing cell-free DNA
CN113096726A (en) * 2016-02-03 2021-07-09 维里纳塔健康公司 Use of cell-free DNA fragment size to determine copy number variation
US11430541B2 (en) 2016-02-03 2022-08-30 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
CN113096726B (en) * 2016-02-03 2024-04-26 维里纳塔健康公司 Determination of copy number variation using cell-free DNA fragment size
CN109689891A (en) * 2016-07-06 2019-04-26 夸登特健康公司 Methods for fragmentome profiling of cell-free nucleic acids
RU2674700C2 (en) * 2016-12-30 2018-12-12 Общество с ограниченной ответственностью "Научно-производственная фирма ДНК-Технология" (ООО "НПФ ДНК-Технология") Method of determining the source of aneuploid cells on the blood of a pregnant woman
US11990208B2 (en) 2017-06-20 2024-05-21 Illumina, Inc. Methods for accurate computational decomposition of DNA mixtures from contributors of unknown genotypes
AU2018288772B2 (en) * 2017-06-20 2022-02-24 Illumina, Inc. Methods and systems for decomposition and quantification of dna mixtures from multiple contributors of known or unknown genotypes
CN112639120A (en) * 2018-07-24 2021-04-09 阿费梅特里克斯公司 Array-based methods and kits for determining copy number and genotype of pseudogenes
CN112823391A (en) * 2019-06-03 2021-05-18 Illumina公司 Quality control metrics based on detection limits
CN111105844A (en) * 2019-11-22 2020-05-05 广州金域医学检验集团股份有限公司 Somatic cell variation classification method, device, equipment and readable storage medium
CN111105844B (en) * 2019-11-22 2023-06-06 广州金域医学检验集团股份有限公司 Somatic cell mutation classification method, apparatus, device, and readable storage medium
CN111948394A (en) * 2020-08-10 2020-11-17 山西医科大学 Application of TSTA3 and LAMP2 as target objects in esophageal squamous cell carcinoma metastasis detection and drug screening
CN111948394B (en) * 2020-08-10 2023-07-28 山西医科大学 Application of TSTA3 and LAMP2 as targets in esophageal squamous carcinoma cell metastasis detection and drug screening
CN113684277A (en) * 2021-09-06 2021-11-23 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN113684277B (en) * 2021-09-06 2022-05-17 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN114093417B (en) * 2021-11-23 2022-10-04 深圳吉因加信息科技有限公司 Method and device for identifying chromosomal arm heterozygosity loss
CN114093417A (en) * 2021-11-23 2022-02-25 深圳基因家科技有限公司 Method and device for identifying chromosomal arm heterozygosity loss

Also Published As

Publication number Publication date
CN108485940B (en) 2022-01-28
CN108485940A (en) 2018-09-04
CN103374518B (en) 2018-03-27
CN204440396U (en) 2015-07-01
CN107435070A (en) 2017-12-05

Similar Documents

Publication Publication Date Title
CN204440396U (en) Kit for determining fetal fraction
US11875899B2 (en) Analyzing copy number variation in the detection of cancer
US11697846B2 (en) Detecting and classifying copy number variation
US20200219588A1 (en) Detecting and classifying copy number variation
EP2877594B1 (en) Detecting and classifying copy number variation in a fetal genome
US9411937B2 (en) Detecting and classifying copy number variation
US9323888B2 (en) Detecting and classifying copy number variation
KR102184868B1 (en) Using cell-free dna fragment size to determine copy number variations
EP3230469B1 (en) Using cell-free dna fragment size to determine copy number variations
CN103003447A (en) Method for determining the presence or absence of different aneuploidies in a sample
AU2019200163B2 (en) Detecting and classifying copy number variation
AU2019200162B2 (en) Detecting and classifying copy number variation
US20240203601A1 (en) Analyzing copy number variation in the detection of cancer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1187363; Country of ref document: HK)

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1187363; Country of ref document: HK)