CN103003447B - Method for determining the presence or absence of different aneuploidies in a sample - Google Patents

Info

Publication number
CN103003447B
Authority
CN
China
Prior art keywords
chromosome
chromosomes
interest
sequence
normalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201180022958.5A
Other languages
Chinese (zh)
Other versions
CN103003447A (en)
Inventor
Richard P. Rava (里查德·P·拉瓦)
David A. Comstock (大卫·A·康斯托克)
Brian K. Rhees (布莱恩·K·利思)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verinata Health Inc
Original Assignee
Verinata Health Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6872 Methods for sequencing involving mass spectrometry
    • G16B10/00 ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B99/00 Subject matter not provided for in other groups of this subclass
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material

Abstract

The present invention provides a method for determining Copy Number Variation (CNV) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. The method includes a statistical approach that takes into account cumulative variability arising from process-related, inter-chromosomal, and inter-sequence variability. The method is applicable to determining CNVs of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the methods of the invention include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of fragments of any one or more of these chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information obtained by sequencing the nucleic acid of one test sample only once.

Description

Method for determining the presence or absence of different aneuploidies in a sample
Technical Field
The present invention relates generally to the field of diagnostics and provides methods for determining variations in the amount of nucleic acid sequences in a mixture of nucleic acids derived from different genomes. In particular, the method is suitable for performing non-invasive prenatal diagnosis, and for diagnosing and monitoring metastatic progression in cancer patients.
Background
One of the key efforts in human medical research is the discovery of genetic abnormalities that are central to adverse health outcomes. In many cases, specific genes and/or key diagnostic markers that are present in abnormal copy numbers have been identified in various parts of the genome. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosomal segments, and higher-level amplification of specific regions of the genome, are common.
Most of the information about copy number variation has come from cytogenetic analysis, whose resolution has allowed structural abnormalities to be recognized. Conventional procedures for genetic screening and biological dosimetry have relied on invasive procedures (e.g., amniocentesis) to obtain cells for karyotyping. In response to the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescent PCR (QF-PCR), and array comparative genomic hybridization (array-CGH) have been developed as molecular cytogenetic methods for analyzing copy number variations.
The advent of technologies that allow the entire genome to be sequenced in a relatively short time, together with the discovery of circulating cell-free DNA (cfDNA), has made it possible to compare genetic material originating from one chromosome with that of another without the risks associated with invasive sampling methods. However, the limitations of existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and sequencing bias stemming from the inherent nature of genomic information, underlie the continuing need for noninvasive methods that provide any or all of the specificity, sensitivity, and applicability needed to reliably diagnose copy number changes in a variety of clinical settings.
The present invention fulfills some of the above needs and in particular offers an advantage in providing a reliable method that is at least suitable for performing non-invasive prenatal diagnosis and for diagnosing and monitoring metastatic progression in cancer patients.
Summary of The Invention
The present invention provides a method for determining copy number variations (CNVs) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. The method includes a statistical approach that accounts for the accumulated variability stemming from process-related, inter-chromosomal, and inter-sequencing variability. The method is applicable to determining CNVs of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of the test sample only once. Any aneuploidy can thus be determined from the sequencing information obtained by sequencing the nucleic acids of a test sample only once.
In one embodiment, a method is provided for determining the presence or absence of any four or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information of fetal and maternal nucleic acids in a maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized chromosome sequence for each of the any four or more chromosomes of interest; (c) calculating a single chromosome dose for each of any four or more chromosomes of interest using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each of the normalized chromosome sequences; and (d) comparing each said single chromosome dose for each of said any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest, and thereby determining the presence or absence of any four or more intact, distinct fetal chromosomal aneuploidies in the maternal test sample. Step (a) may comprise sequencing at least a portion of the nucleic acids of a test sample to obtain said sequence information for fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of such sequence tags identified for said normalized chromosome sequence for each of said chromosomes of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalized chromosome sequences by correlating the number of sequence tags identified in step (b) for each of said normalized chromosome sequences with the length of each of said normalized chromosome sequences; and (iii) calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein the chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalized chromosome sequence for each of said chromosomes of interest.
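The two alternatives for step (c) described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the tag counts, and the sequence lengths are hypothetical, and the sequence tag density is assumed here to mean tags per unit of sequence length.

```python
def chromosome_dose(tags_interest, tags_normalizing):
    """Dose as the ratio of sequence tag counts (first alternative of step (c))."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing tag count must be positive")
    return tags_interest / tags_normalizing


def tag_density(tag_count, length):
    """Sequence tag density: tag count related to sequence length (assumed: tags per base)."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / length


def chromosome_dose_from_density(tags_interest, len_interest,
                                 tags_normalizing, len_normalizing):
    """Dose as the ratio of tag densities (steps (i)-(iii) of the second alternative)."""
    density_interest = tag_density(tags_interest, len_interest)            # step (i)
    density_normalizing = tag_density(tags_normalizing, len_normalizing)   # step (ii)
    return density_interest / density_normalizing                          # step (iii)


# Toy numbers, chosen only to make the arithmetic easy to follow.
count_based = chromosome_dose(1000, 4000)
density_based = chromosome_dose_from_density(1000, 50, 4000, 100)
```

With these toy inputs the count-based dose is 1000/4000 and the density-based dose is (1000/50)/(4000/100); the two differ because the second alternative corrects for the different lengths of the two sequences.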
In another embodiment, a method is provided for determining the presence or absence of any four or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for fetal and maternal nucleic acids in a maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for one normalized chromosome sequence for each of the any four or more chromosomes of interest; (c) calculating a single chromosome dose for each of any four or more chromosomes of interest using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each of the normalized chromosome sequences; and (d) comparing each said single chromosome dose for each of said any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest, and thereby determining the presence or absence of any four or more complete, different fetal chromosomal aneuploidies in the maternal test sample, wherein any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. 
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalizing chromosome sequence for each of said chromosomes of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalizing chromosome sequences by correlating the number of sequence tags identified in step (b) for each of said normalizing chromosome sequences with the length of each of said normalizing chromosome sequences; and (iii) calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalizing chromosome sequence for each of said chromosomes of interest.
In another embodiment, a method is provided for determining the presence or absence of any four or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for the fetal and maternal nucleic acids in a maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized chromosome sequence for each of the any four or more chromosomes of interest; (c) calculating a single chromosome dose for each of any four or more chromosomes of interest using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each of the normalized chromosome sequences; and (d) comparing each said single chromosome dose for each of any four or more chromosomes of interest to a threshold value for each of said any four or more chromosomes of interest and thereby determining the presence or absence of any four or more complete, different fetal chromosomal aneuploidies in said sample, wherein any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies for all chromosomes 1-22, X, and Y is determined. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. 
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of such sequence tags identified for said normalized chromosome sequence for each of said chromosomes of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalized chromosome sequences by correlating the number of sequence tags identified in step (b) for each of said normalized chromosome sequences with the length of each of said normalized chromosome sequences; and (iii) calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein the chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalized chromosome sequence for each of said chromosomes of interest.
In any of the above embodiments, the normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y. Alternatively, the normalizing chromosomal sequence is a set of chromosomes selected from chromosomes 1-22, X, and Y.
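As an illustration of the two choices of normalizing sequence, the sketch below accepts either a single chromosome or a set of chromosomes. Summing the tag counts over the set is an assumption made for this example, and the chromosome names and counts are hypothetical.

```python
def normalizing_tag_count(tag_counts, normalizing):
    """Total tags for a normalizing sequence given as one chromosome or a set of them."""
    if isinstance(normalizing, str):
        normalizing = [normalizing]  # a single chromosome is treated as a set of one
    # Assumption: tags of a normalizing set are pooled by simple summation.
    return sum(tag_counts[chrom] for chrom in normalizing)


def dose_with_normalizer(tag_counts, chrom_of_interest, normalizing):
    """Chromosome dose relative to a single normalizing chromosome or a set."""
    denominator = normalizing_tag_count(tag_counts, normalizing)
    if denominator <= 0:
        raise ValueError("normalizing tag count must be positive")
    return tag_counts[chrom_of_interest] / denominator


# Hypothetical tag counts per chromosome.
counts = {"chr21": 100, "chr9": 400, "chr11": 600}
single = dose_with_normalizer(counts, "chr21", "chr9")
grouped = dose_with_normalizer(counts, "chr21", ["chr9", "chr11"])
```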
In another embodiment, a method is provided for determining the presence or absence of any one or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for the fetal and maternal nucleic acids in a sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest; (c) calculating a single chromosome dose for each of the any one or more chromosomes of interest using the number of the sequence tags identified for each of the any one or more chromosomes of interest and the number of the sequence tags identified for each of the normalized fragment sequences; and (d) comparing said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, and thereby determining the presence or absence of any one or more intact, distinct fetal chromosomal aneuploidies in said sample. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of such sequence tags identified for said normalized chromosome sequence for each of said chromosomes of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalized fragment sequences by correlating the number of sequence tags identified in step (b) for each of said normalized fragment sequences with the length of each of said normalized chromosomes; and (iii) calculating a single chromosome dose for each of the chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein the chromosome dose is calculated as a ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio of the normalized fragment sequence for each of the chromosomes of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest; (c) calculating a single chromosome dose for each of the any one or more chromosomes of interest using the number of the sequence tags identified for each of the any one or more chromosomes of interest and the number of the sequence tags identified for each of the normalized fragment sequences; and (d) comparing each said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, and thereby determining the presence or absence of one or more complete, different fetal chromosomal aneuploidies in said sample, wherein said any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprises at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. 
In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of such sequence tags identified for said normalized chromosome sequence for each of said chromosomes of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalized fragment sequences by correlating the number of sequence tags identified in step (b) for each of said normalized fragment sequences with the length of each of said normalized chromosomes; and (iii) calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalized fragment sequence for each of said chromosomes of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different, intact fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized fragment sequence for each of the any one or more chromosomes of interest; (c) calculating a single chromosome dose for each of the any one or more chromosomes of interest using the number of the sequence tags identified for each of the any one or more chromosomes of interest and the number of the sequence tags identified for each of the normalized fragment sequences; and (d) comparing each said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest and determining therefrom the presence or absence of one or more intact, distinct fetal chromosomal aneuploidies in said sample, wherein said any one or more chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y are all chromosomes 1-22, X, and Y, and wherein the presence or absence of intact fetal chromosomal aneuploidies for all chromosomes 1-22, X, and Y is determined. 
Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalized chromosome sequence for each of said chromosomes of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest by correlating the number of sequence tags identified in step (b) for each of said chromosomes of interest with the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalized fragment sequences by correlating the number of sequence tags identified in step (b) for each of said normalized fragment sequences with the length of each of said normalized chromosomes; and (iii) calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalized fragment sequence for each of said chromosomes of interest.
In any of the above embodiments, the different whole chromosome aneuploidies are selected from the group consisting of a whole chromosome trisomy, a whole chromosome monosomy, and a whole chromosome polysomy. These different chromosomal aneuploidies are selected from the complete aneuploidies of any of chromosomes 1-22, X, and Y. For example, the different intact fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXY, 47,XXX, 47,XYY, and monosomy X.
In any of the above embodiments, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of any four or more different, intact fetal chromosomal aneuploidies in each test sample.
In any of the above embodiments, the method may further comprise calculating a Normalized Chromosome Value (NCV), wherein said NCV relates said chromosome dose to the mean of the corresponding chromosome doses in a set of qualifying samples as:

$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

wherein $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the jth chromosome dose in a set of qualifying samples, and $x_{ij}$ is the jth chromosome dose observed for test sample i.
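As a minimal sketch of the NCV calculation, the following Python function subtracts the mean of the qualifying-sample doses from the test dose and divides by their standard deviation. The function name and the use of the sample (n-1) standard deviation are assumptions made for illustration; the specification only states that the mean and standard deviation are estimated from the qualifying samples.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose: float, qualified_doses: list[float]) -> float:
    """NCV for one chromosome: (test dose - mean) / standard deviation.

    The mean and standard deviation are estimated from the doses of the same
    chromosome in the qualifying samples. Using the sample standard deviation
    (n - 1 denominator) is an assumption of this sketch.
    """
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualifying doses are needed")
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualifying doses have zero variance")
    return (test_dose - mu) / sigma


if __name__ == "__main__":
    # Hypothetical doses, for illustration only.
    print(normalized_chromosome_value(4.0, [1.0, 2.0, 3.0]))
```

An NCV therefore expresses how many standard deviations a test dose lies from the mean of the qualifying samples.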
In another embodiment, a method is provided for determining the presence or absence of different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the following steps: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample; (b) using said sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized fragment sequence for each of said any one or more segments of each of said any one or more chromosomes of interest; (c) calculating a single fragment dose for each of said any one or more segments of each of said any one or more chromosomes of interest using the number of sequence tags identified for each of said segments and the number of sequence tags identified for each of the normalized fragment sequences; and (d) comparing each of said single fragment doses for each of said any one or more segments of each of said any one or more chromosomes of interest to a threshold value for each of said segments, and thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single fragment dose for each of said any one or more segments of each of said any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of said segments to the number of sequence tags identified for said normalized fragment sequence for each of said segments. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each of said segments of interest by correlating the number of sequence tags identified in step (b) for each of said segments of interest with the length of each of said segments of interest; (ii) calculating a sequence tag density ratio for each of said normalized fragment sequences by correlating the number of sequence tags identified in step (b) for each of said normalized fragment sequences with the length of each of said normalized fragment sequences; and (iii) calculating a single fragment dose for each of said segments of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein said fragment dose is calculated as the ratio of the sequence tag density ratio for each of said segments of interest to the sequence tag density ratio of said normalized fragment sequence for each of said segments of interest. The method may further comprise calculating a normalized fragment value (NSV), wherein said NSV relates said fragment dose to the mean of the corresponding fragment doses in a set of qualifying samples as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

wherein $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the jth fragment dose in a set of qualifying samples, and $x_{ij}$ is the observed jth fragment dose for test sample i.
In various embodiments of the illustrated method, chromosome or fragment doses are thus determined using a normalized fragment sequence, which can be a single fragment of any one or more of chromosomes 1-22, X, and Y. Alternatively, such a normalized fragment sequence may be a set of fragments of any one or more of chromosomes 1-22, X, and Y.
Steps (a)-(d) of the method for determining the presence or absence of a partial fetal chromosomal aneuploidy may be repeated for a plurality of test samples from different maternal subjects, and the method comprises determining the presence or absence of different partial fetal chromosomal aneuploidies in each of said samples. The partial fetal chromosomal aneuploidies that can be determined according to the method include partial aneuploidies of any fragment of any chromosome. These partial aneuploidies may be selected from the group consisting of partial duplication, partial multiplication, partial insertion, and partial deletion. Examples of partial aneuploidies that can be determined according to this method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.
In any of the above embodiments, the test sample may be a maternal sample selected from the group consisting of blood, plasma, serum, urine and saliva samples. In any of these embodiments, the test sample may be a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Next Generation Sequencing (NGS) can be used to sequence these nucleic acids. In some embodiments, sequencing is massively parallel sequencing using sequencing by synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing by ligation. In still other embodiments, the sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In another embodiment, a method is provided for determining the presence or absence of any twenty or more different, intact fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The method comprises the following steps: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for fetal and maternal cell-free DNA molecules in the sample; (b) using the sequence information to identify a number of sequence tags for any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for a normalizing chromosome of the twenty or more chromosomes of interest; (c) calculating a single chromosome dose for each of the twenty or more chromosomes of interest using the number of the sequence tags identified for each of the twenty or more chromosomes of interest and the number of the sequence tags identified for each of the normalized chromosomes; and (d) comparing each of said single chromosome doses for each of said twenty or more chromosomes of interest to a threshold value for each of said twenty or more chromosomes of interest, and thereby determining the presence or absence of any twenty or more different, intact fetal chromosomal aneuploidies in said sample.
In another embodiment, the invention provides a method for identifying Copy Number Variations (CNVs) of a sequence of interest (e.g., a clinically relevant sequence) in a test sample, the method comprising the steps of: (a) obtaining a test sample and a plurality of qualified samples, said test sample comprising test nucleic acid molecules and said plurality of qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information of the fetal and maternal nucleic acids in the sample; (c) calculating a qualified sequence dose for the qualified sequence of interest in each of the plurality of qualified samples based on the sequencing of the qualified nucleic acid molecules, wherein calculating a qualified sequence dose comprises determining a parameter for the qualified sequence of interest and at least one qualified normalized sequence; (d) identifying at least one qualified normalized sequence based on the qualified sequence dose, wherein the at least one qualified normalized sequence has the smallest variability and/or the greatest distinguishability in the plurality of qualified samples; (e) calculating a test sequence dose for the test sequence of interest based on the sequencing of the nucleic acid molecules in the test sample, wherein calculating a test sequence dose comprises determining a parameter for the test sequence of interest and at least one normalized test sequence corresponding to the at least one qualified normalized sequence; (f) comparing the test sequence dose to at least one threshold; and (g) assessing the copy number variation of the sequence of interest in the test sample based on the results of step (f).
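Steps (c) and (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method as claimed: variability is measured here by the coefficient of variation of the qualified sequence dose, only single candidate sequences are considered (not groups), and distinguishability is not evaluated. The function names, the dictionary layout, and the tag counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def pick_normalizing_sequence(qualified_counts: list[dict[str, int]],
                              interest: str) -> tuple[str, float]:
    """Return the candidate whose dose varies least across qualified samples.

    qualified_counts holds one dict per qualified sample, mapping a sequence
    name (e.g. a chromosome) to its number of mapped sequence tags.
    """
    candidates = [name for name in qualified_counts[0] if name != interest]
    best_name, best_cv = "", float("inf")
    for name in candidates:
        # Step (c): qualified sequence dose in every qualified sample.
        doses = [sample[interest] / sample[name] for sample in qualified_counts]
        # Step (d): keep the candidate with the smallest variability.
        cv = coefficient_of_variation(doses)
        if cv < best_cv:
            best_name, best_cv = name, cv
    return best_name, best_cv


if __name__ == "__main__":
    # Hypothetical tag counts for three qualified samples.
    samples = [
        {"chr21": 100, "chr9": 200, "chr1": 400},
        {"chr21": 110, "chr9": 220, "chr1": 500},
        {"chr21": 90, "chr9": 180, "chr1": 300},
    ]
    print(pick_normalizing_sequence(samples, "chr21"))
```

The same normalizing sequence would then be used in step (e) to calculate the test sequence dose.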
In one embodiment, the parameter for the qualified sequence of interest and the at least one qualified normalized sequence relates the number of sequence tags mapped to the qualified sequence of interest to the number of tags mapped to the qualified normalized sequence, and the parameter for the test sequence of interest and the at least one normalized test sequence relates the number of sequence tags mapped to the test sequence of interest to the number of tags mapped to the normalized test sequence. In some embodiments, step (b) comprises sequencing at least a portion of the qualified and test nucleic acid molecules, wherein the sequencing provides mapped sequence tags for the test and qualified sequences of interest and for the at least one test and at least one qualified normalized sequences; and sequencing at least a portion of the nucleic acid molecules of the test sample to obtain sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, a next generation sequencing method is used to perform this sequencing step. In some embodiments, the sequencing method can be a massively parallel sequencing method that uses sequencing by synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing by ligation. In some embodiments, sequencing comprises an amplification. In other embodiments, the sequencing is single molecule sequencing. The CNV of the sequence of interest is an aneuploidy, which may be a chromosomal or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, 47,XXY, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosome deletion or a partial chromosome insertion.
In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer. In some embodiments, these tested and qualified samples are biological fluid samples, such as: a plasma sample obtained from a pregnant subject (e.g., a pregnant human subject). In other embodiments, the tested and qualified biological fluid sample (e.g., a plasma sample) is obtained from a subject known or suspected to have cancer.
Although the examples herein relate to humans and the language is primarily directed to human concerns, the concepts of the present invention are also applicable to genomes from any plant or animal.
Documents incorporated by reference
All patents, patent applications, and other publications (including all sequences disclosed within such references) mentioned herein are expressly incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. All cited documents are incorporated herein by reference in relevant part. However, the citation of any document is not to be construed as an admission that it is prior art with respect to the present invention.
Brief description of the drawings
The novel features believed characteristic of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
Fig. 1 is a flow chart of a method 100 for determining the presence or absence of copy number variation in a test sample comprising a mixture of nucleic acids.
Figure 2 shows the distribution of chromosome doses for chromosome 21 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each carrying a male or female fetus. Chromosome 21 doses for qualified (i.e., normal for chromosome 21) samples (○) and for trisomy 21 test samples (Δ) are shown for chromosomes 1-12 and X (FIG. 2A), and for chromosomes 1-22 and X (FIG. 2B).
Figure 3 shows the distribution of chromosome doses for chromosome 18 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each carrying a male or female fetus. Chromosome 18 doses for qualified (i.e., normal for chromosome 18) samples (○) and for trisomy 18 test samples (Δ) are shown for chromosomes 1-12 and X (FIG. 3A), and for chromosomes 1-22 and X (FIG. 3B).
Figure 4 shows the distribution of chromosome doses for chromosome 13 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each carrying a male or female fetus. Chromosome 13 doses for qualified (i.e., normal for chromosome 13) samples (○) and for trisomy 13 test samples (Δ) are shown for chromosomes 1-12 and X (FIG. 4A), and for chromosomes 1-22 and X (FIG. 4B).
Figure 5 shows the distribution of chromosome doses for chromosome X determined from sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each carrying a male or female fetus. Chromosome X doses for males (46,XY; (○)), females (46,XX; (Δ)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 and X (FIG. 5A), and for chromosomes 1-22 and X (FIG. 5B).
Figure 6 shows the distribution of chromosome doses for chromosome Y determined from sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each carrying a male or female fetus. Chromosome Y doses for males (46,XY; (Δ)), females (46,XX; (○)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 (FIG. 6A), and for chromosomes 1-22 (FIG. 6B).
Figure 7 shows the Coefficient of Variation (CV) for chromosomes 21 (■), 18 (●) and 13 (▲) determined from the doses shown in figures 2, 3 and 4, respectively.
Fig. 8 shows the Coefficient of Variation (CV) for chromosomes X (■) and Y (●) determined from the doses shown in fig. 5 and 6, respectively.
Fig. 9 shows the cumulative distribution of GC fraction by human chromosome. The vertical axis represents the frequency of chromosomes having a GC content lower than the value shown on the horizontal axis.
Figure 10 shows the sequence dose (Y-axis) for fragments of chromosome 11 (81000082-. A sample from a subject pregnant with a fetus with a partial aneuploidy of chromosome 11(°) was identified.
Fig. 11 shows the distribution of normalized chromosome doses for chromosome 21 (A), chromosome 18 (B), chromosome 13 (C), chromosome X (D), and chromosome Y (E) relative to the standard deviation of the mean (Y-axis) of the corresponding chromosomes in unaffected samples.
Fig. 12 shows the normalized chromosome values for chromosomes 21(∘), 18(Δ), and 13(□) determined in samples from training set 1 using the normalized chromosomes as described in example 6.
Fig. 13 shows the normalized chromosome values for chromosomes 21(∘), 18(Δ), and 13(□) determined in the samples from test set 1 using the normalized chromosomes as described in example 6.
FIG. 14 shows the normalized chromosome values for chromosomes 21 (∘) and 18 (Δ) determined in the samples from test set 1 using the normalization method of Chiu et al. (normalizing the number of sequence tags identified for the chromosome of interest to the number of sequence tags obtained for the remaining chromosomes in the sample; see example 7 elsewhere in this application).
Fig. 15 shows the normalized chromosome values for chromosomes 21(∘), 18(Δ), and 13(□) determined in the samples from training set 1 using the systematically determined normalized chromosomes (as described in example 7).
Figure 16 shows the normalized chromosome values for chromosomes 21(∘), 18(Δ), and 13(□) determined in samples from test set 1 using the systematically determined normalized chromosomes (as described in example 7).
Fig. 17 shows the normalized chromosome values for chromosome 9(∘) determined in samples from test set 1 using the systematically determined normalized chromosomes (as described in example 7).
FIG. 18 shows normalized chromosome values for chromosomes X (X-axis) and Y (Y-axis). Arrows point to 5 (fig. 18A) and 3 (fig. 18B) monosomic X samples identified in the training and test groups, respectively, as described in example 7.
Figure 19 shows the normalized chromosome values for chromosomes 1-22 determined in samples from test set 1 using the systematically determined normalized chromosomes (as described in example 7).
Detailed description of the invention
The present invention provides a method for determining Copy Number Variation (CNV) of sequences of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. Sequences of interest include genomic sequences ranging from kilobases (kb) to megabases (Mb) to complete chromosomes that are known or suspected to be associated with genetic or disease conditions. Examples of sequences of interest include chromosomes associated with well-known aneuploidies (e.g., trisomy 21) and fragments of chromosomes that are multiplied in diseases such as cancer (e.g., partial trisomy 8 in acute myeloid leukemia). CNVs that can be determined according to the present methods include monosomies and trisomies of any one or more of autosomes 1-22 and of sex chromosomes X and Y (e.g., 45,X, 47,XXX, 47,XXY and 47,XYY), other chromosomal polysomies, i.e., tetrasomy and pentasomy (including but not limited to XXXX, XXXXX, XXXXXY, and XYYY), and deletions and/or duplications of fragments of any one or more of these chromosomes.
The method includes a statistical method that takes into account cumulative variability derived from process-related, inter-chromosomal (same batch) and inter-sequencing-process (batch-to-batch) variability. The method is applicable to determining CNVs of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions.
The practice of the present invention involves, unless otherwise indicated, conventional techniques commonly used in the fields of molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA, which are within the skill of the art. Such techniques are known to those of ordinary skill in the art and are described in numerous documents and reference works (see, e.g., Sambrook et al., "Molecular Cloning: A Laboratory Manual," Third Edition (Cold Spring Harbor), [2001]; and Ausubel et al., "Current Protocols in Molecular Biology" [1987]).
Numerical ranges include the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
The headings provided herein are not limitations of the various aspects or embodiments of the invention which can be had by reference to the specification as a whole. Thus, as indicated above, the terms defined immediately below are more fully defined by reference to the specification as a whole.
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Various scientific dictionaries containing the terms used herein are well known and available to those skilled in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice or testing of the present invention, only some preferred methods and materials are described. Accordingly, the terms defined immediately below are more fully described by reference to the specification as a whole. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary depending on the context in which they are used by those skilled in the art.
Definitions
As used herein, the singular terms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. Unless otherwise indicated, nucleic acids are written left to right in the 5 'to 3' direction and amino acid sequences are written left to right in the amino to carboxy direction, respectively.
The term "assessing" herein means characterizing the status of a chromosomal aneuploidy by one of three types of calls: "normal", "affected", and "no-call". For example, in the case of a trisomy, the "normal" call is determined by the value of a parameter (e.g., a test chromosome dose) that is below a user-defined reliability threshold, the "affected" call is determined by a parameter (e.g., a test chromosome dose) that is above a user-defined reliability threshold, and the "no-call" result is determined by a parameter (e.g., a test chromosome dose) that lies between the user-defined reliability thresholds for making a "normal" or an "affected" call.
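The three-way assessment for a trisomy can be sketched as a small Python function. The function name, the argument names, and the label "no-call" for the intermediate outcome are choices made for this illustration; the two thresholds are user-defined inputs and the values in the example are hypothetical.

```python
def assess_trisomy(test_dose: float, normal_below: float, affected_above: float) -> str:
    """Classify a test chromosome dose against two user-defined thresholds.

    A dose below normal_below is called "normal", a dose above affected_above
    is called "affected", and anything in between is left uncalled.
    """
    if normal_below > affected_above:
        raise ValueError("normal_below must not exceed affected_above")
    if test_dose < normal_below:
        return "normal"
    if test_dose > affected_above:
        return "affected"
    return "no-call"


if __name__ == "__main__":
    # Hypothetical doses and thresholds, for illustration only.
    for dose in (0.9, 1.1, 1.3):
        print(dose, assess_trisomy(dose, normal_below=1.0, affected_above=1.2))
```

A dose exactly equal to either threshold falls in the uncalled zone in this sketch; how boundary values are treated is a design choice the text leaves open.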
The term "copy number variation" as used herein refers to a change in the copy number of a nucleic acid sequence of 1kb or greater that is present in a test sample as compared to the copy number of a nucleic acid sequence present in a qualified sample. "copy number variant (variant)" refers to a sequence of nucleic acid in which a difference in copy number of 1kb or greater is found by comparing a sequence of interest in a test sample with a sequence present in a qualified sample. Copy number variants/variations include deletions (including microdeletions), insertions (including microinsertions), duplications, inversions, translocations, and complex multi-site variants. CNV encompasses both chromosomal and partial aneuploidies.
The term "aneuploidy" herein refers to an imbalance of genetic material caused by the gain or loss of an entire chromosome, or a portion of a chromosome.
The terms "chromosomal aneuploidy" and "intact chromosomal aneuploidy" herein refer to an imbalance in genetic material caused by the gain or loss of an entire chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material resulting from the gain or loss of a portion of a chromosome (e.g., partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
The term "aneuploid sample" herein refers to a sample indicating that the chromosome content of a subject is not euploid, i.e., the sample indicates that the subject has an abnormal copy number of a chromosome.
The term "aneuploid chromosome" as used herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term "plurality" is used herein to refer to a number of nucleic acid molecules or sequence tags sufficient to identify significant differences in copy number variation (e.g., chromosome dose) in test samples and qualified samples using the methods of the invention. In some embodiments, at least about 3 × 10⁶ sequence tags, at least about 5 × 10⁶ sequence tags, at least about 8 × 10⁶ sequence tags, at least about 10 × 10⁶ sequence tags, at least about 15 × 10⁶ sequence tags, at least about 20 × 10⁶ sequence tags, at least about 30 × 10⁶ sequence tags, at least about 40 × 10⁶ sequence tags, or at least about 50 × 10⁶ sequence tags, each comprising a read of between 20 and 40 bp, are obtained for each test sample.
The terms "polynucleotide", "nucleic acid" and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides of RNA and deoxyribonucleotides of DNA) in which the 3 'position of the pentose of one nucleotide is linked to the 5' position of the pentose of the next nucleotide by a phosphodiester group, which includes sequences of nucleic acids in any form, including but not limited to RNA, DNA and cfDNA molecules. The term "polynucleotide" includes, but is not limited to, single-stranded and double-stranded polynucleotides.
The term "portion" is used herein to refer to an amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amounts to less than the sequence information of one human genome (< 1 genome).
The term "test sample" as used herein refers to a sample comprising a mixture of nucleic acids comprising at least one nucleic acid sequence whose copy number is suspected of having been altered. The nucleic acid present in a test sample is referred to as "test nucleic acid".
The term "qualifying sample" as used herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number, to which the nucleic acids in a test sample are compared, and which is a normal sample, i.e., not aneuploid for the sequence of interest. For example, a qualifying sample used to identify a normalized chromosome for chromosome 21 is a sample that is not a trisomy 21 sample.
The term "training set" as used herein refers to a set of samples, which may include both affected and unaffected samples. Unaffected samples in the training set are used as qualifying samples to identify normalizing sequences, e.g., normalizing chromosomes, while the chromosome dose of unaffected samples is used to set a threshold for each of these sequences of interest (e.g., chromosomes). The affected samples in a training set can be used to verify that the affected test samples can be readily distinguished from unaffected samples.
The term "qualified nucleic acid" is used interchangeably with "qualified sequence", which is a sequence to which the amount of a test sequence or test nucleic acid is compared. A qualified sequence is a sequence that is preferably present in a biological sample at a known representation (i.e., the amount of the qualified sequence is known). A "qualified sequence of interest" is one for which the amount in a qualified sample is known, and it is a sequence that is associated with a difference in sequence representation in an individual with a medical condition.
The term "sequence of interest" as used herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest may be a sequence on a chromosome that is misrepresented, i.e., over- or under-represented, in a disease or genetic condition. A sequence of interest may also be a part of a chromosome (i.e., a chromosome fragment), or a chromosome. For example, a sequence of interest may be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over- or under-represented in the total population or in a subpopulation of cells of a subject. A "qualified sequence of interest" is a sequence of interest in a qualified sample. A "test sequence of interest" is a sequence of interest in a test sample.
The term "normalized sequence" refers herein to a sequence for which the variability in the number of sequence tags mapped to it, across multiple samples and multiple sequencing runs, best approximates that of the sequence of interest (for which it is used as a normalization parameter), and that is able to best distinguish an affected sample from one or more unaffected samples. "Normalizing chromosomes" or "normalizing chromosome sequences" are examples of "normalized sequences". A "normalized chromosomal sequence" may consist of a single chromosome or a set of chromosomes. A "normalized fragment" is another example of a "normalized sequence". A "normalizing fragment sequence" may consist of a single fragment of a chromosome, or it may consist of two or more fragments of the same or different chromosomes.
The term "distinguishability" as used herein refers to a characteristic of a normalized chromosome that enables it to distinguish one or more unaffected (i.e., normal) samples from one or more affected (i.e., aneuploidy) samples.
The term "sequence dose" as used herein refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalized sequence. "test sequence dose" is a parameter that relates the sequence tag density of a sequence of interest (e.g., chromosome 21) to the sequence tag density of a normalized sequence (e.g., chromosome 9) determined in a test sample. Similarly, a "qualified sequence dose" is a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalized sequence determined in a qualified sample.
The term "sequence tag density" herein refers to the number of sequence reads that map to a reference genome sequence, e.g., a sequence tag density for chromosome 21 is the number of sequence reads generated by a sequencing method that map back to chromosome 21 of the reference genome. The term "sequence tag density ratio" herein refers to the ratio of the number of sequence tags mapped to a chromosome of a reference genome (e.g., chromosome 21) to the length of the reference genome chromosome 21.
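The two definitions above can be made concrete with a short Python sketch that counts mapped reads per reference chromosome and divides a count by the chromosome length. The input format (one chromosome name per mapped read), the function names, and the numbers are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable


def sequence_tag_density(mapped_chromosomes: Iterable[str]) -> Counter:
    """Number of sequence reads mapped to each reference chromosome.

    mapped_chromosomes yields one chromosome name per mapped read.
    """
    return Counter(mapped_chromosomes)


def sequence_tag_density_ratio(tag_count: int, chromosome_length: int) -> float:
    """Number of mapped tags divided by the length of the reference chromosome."""
    if chromosome_length <= 0:
        raise ValueError("chromosome length must be positive")
    return tag_count / chromosome_length


if __name__ == "__main__":
    # Hypothetical mapping results: each entry is the chromosome a read mapped to.
    reads = ["chr21", "chr1", "chr21", "chr9", "chr1", "chr21"]
    counts = sequence_tag_density(reads)
    print(counts["chr21"])
    print(sequence_tag_density_ratio(counts["chr21"], 48_000_000))
```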
The term "Next Generation Sequencing (NGS)" herein refers to a sequencing method that allows massively parallel sequencing of clonally amplified and individual nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
The term "parameter" refers herein to a numerical value characterizing a quantized data set, and/or a numerical relationship between quantized data sets. For example, the ratio (or a function of the ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome on which these tags are mapped is a parameter.
The terms "threshold" and "eligibility threshold" refer herein to any number that is calculated using a qualifying data set and that serves as a limit for the diagnosis of a copy number variation (e.g., an aneuploidy) in an organism. If a result obtained from practicing the invention exceeds the threshold, the subject can be diagnosed with a copy number variation, e.g., trisomy 21. Appropriate thresholds for the methods described herein can be identified by analyzing the normalized values (e.g., chromosome dose, NCV or NSV) calculated for a training set of samples. Thresholds may be identified using the qualified (i.e., unaffected) samples in a training set that includes both qualified (i.e., unaffected) samples and affected samples. The samples in the training set that are known to have a chromosomal aneuploidy (i.e., affected samples) can be used to confirm that the selected thresholds are useful in distinguishing affected samples from unaffected samples in a test set (see the Examples herein). The choice of threshold depends on the level of confidence with which the user wishes to make the classification. In some embodiments, the training set used to identify the appropriate threshold comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use a larger set of qualified samples to improve the diagnostic usefulness of the threshold.
The term "normalized value" herein refers to a numerical value that relates the number of sequence tags identified for a sequence of interest (e.g., a chromosome or chromosome fragment) to the number of sequence tags identified for a normalizing sequence (e.g., a normalizing chromosome or a normalizing chromosome fragment). For example, the "normalized value" may be the chromosome dose as described elsewhere in the application, or it may be the NCV (normalized chromosome value) as described elsewhere in the application, or it may be the NSV (normalized fragment value) as described elsewhere in the application.
The term "read" refers to a DNA sequence of sufficient length (e.g., at least 30 bp) to identify a larger sequence or region, e.g., one that can be aligned and specifically assigned to a chromosome, a genomic region, or a gene.
The term "sequence tag" is used interchangeably herein with the term "mapped sequence tag" and refers to a sequence read that has been specifically assigned (i.e., mapped) to a larger sequence (e.g., a reference genome) by alignment. Mapped sequence tags are uniquely mapped to the reference genome, i.e., each is assigned to a single location in the reference genome. Tags that can be mapped to more than one location in the reference genome (i.e., tags that are not uniquely mapped) are not included in the analysis.
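The rule that only uniquely mapped tags are counted can be sketched as follows. This is a minimal illustration, assuming that an aligner has already produced, for each read, the list of locations it matches; the data layout and the name `count_unique_tags` are hypothetical.

```python
from collections import Counter


def count_unique_tags(alignments: dict) -> Counter:
    """Count tags per chromosome, keeping only reads with exactly one hit.

    `alignments` maps a read identifier to a list of (chromosome, position)
    hits reported by an aligner (assumed input format).
    """
    counts = Counter()
    for hits in alignments.values():
        if len(hits) == 1:  # uniquely mapped: a single location in the reference
            chromosome, _position = hits[0]
            counts[chromosome] += 1
        # reads with zero hits or with several hits are excluded from the analysis
    return counts


# Hypothetical alignment results (not real data).
example_alignments = {
    "read1": [("chr21", 1_000)],
    "read2": [("chr21", 5_000), ("chr9", 7_000)],  # multi-mapped, dropped
    "read3": [("chr9", 12_000)],
    "read4": [],                                   # unmapped, dropped
}
tag_counts = count_unique_tags(example_alignments)
```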
As used herein, the terms "aligned," "aligning," or "to align" refer to one or more sequences that are identified as a match, in terms of the order of their nucleotides, to a known sequence from a reference genome. Such alignment can be performed manually or by a computer algorithm; examples include the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina genomics analysis suite. The match of a sequence read in an alignment can be a 100% sequence match or less than 100% (a non-perfect match).
As used herein, the term "reference genome" refers to any specific known genomic sequence (whether partial or complete) of any organism or virus that can be used to reference identified sequences from a subject. For example, a reference genome for human subjects, as well as for many other organisms, can be found at the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
The term "clinically relevant sequence" as used herein refers to a nucleic acid sequence that is known to be, or suspected of being, associated with or implicated in a genetic or disease condition. Determining the presence or absence of clinically relevant sequences can be useful in determining or confirming the diagnosis of a medical condition, or providing a prediction of the progression of a disease.
When the term "derived" is used in the context of a nucleic acid or a mixture of nucleic acids, it is meant herein the manner in which the nucleic acid or nucleic acids are obtained from the source from which the nucleic acid or nucleic acids originate. For example, in one embodiment, a mixture of nucleic acids derived from two different genomes means that these nucleic acids (e.g., cfDNA) are naturally released by the cell through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids derived from two different genomes means that the nucleic acids are extracted from two different types of cells from a subject.
The term "mixed sample" as used herein refers to a sample containing a mixture of nucleic acids, which are derived from different genomes.
The term "maternal sample" as used herein refers to a biological sample obtained from a pregnant subject (e.g., a woman).
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms "blood," "plasma," and "serum" expressly encompass isolated or processed portions thereof. Similarly, when a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a separation or portion derived from a processing of the biopsy, swab, smear, etc.
The terms "maternal nucleic acid" and "fetal nucleic acid" herein refer to nucleic acid of a pregnant female subject and nucleic acid of a fetus carried by the pregnant female, respectively.
As used herein, the term "corresponding" refers to a nucleic acid (e.g., a gene or chromosome) that is present in the genomes of different subjects, and which does not necessarily have the same sequence in all genomes, but is used to provide the identity of a sequence of interest (e.g., a gene or chromosome) rather than genetic information.
As used herein, the term "substantially cell-free" encompasses preparations of a desired sample from which components normally associated with it have been removed. Plasma is rendered essentially cell-free, for example, by removing the cells normally associated with it (e.g., red blood cells). In some embodiments, the substantially cell-free sample is processed to remove cells that would otherwise contribute to the genetic material that is to be tested for a CNV.
As used herein, the term "fetal portion" refers to the fetal nucleic acid portion present in a sample comprising fetal and maternal nucleic acids.
As used herein, the term "chromosome" refers to the heredity-bearing gene carrier of a living cell, which is derived from chromatin and comprises DNA and protein components (in particular histones). The conventional, internationally recognized numbering system for individual human chromosomes is employed herein.
As used herein, the term "polynucleotide length" refers to the absolute number of nucleotides in a sequence or in a region of a reference genome. The term "chromosome length" refers to the known length of a chromosome given in base pairs, e.g., the length provided in the NCBI36/hg18 assembly of the human genome (available on the world wide web; hgsid 167155613& chromolnfopage).
The term "subject" herein refers to a human subject as well as a non-human subject, such as mammals, invertebrates, vertebrates, fungi, yeasts, bacteria, and viruses. Although the examples herein relate to humans and the language is primarily directed to the human problem, the concepts of the present invention are applicable to genomes from any plant or animal and are useful in the fields of veterinary medicine, animal science, research laboratories and the like.
The term "condition" is used herein to mean "medical condition" as a broad term that includes all diseases and disorders, but that may also include injuries and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatment.
The term "intact" as used herein with respect to a chromosomal aneuploidy refers to the gain or loss of an entire chromosome.
The term "part" when used in reference to a chromosomal aneuploidy refers herein to the gain or loss of a portion of a chromosome.
The term "chimerism" as used herein means the presence of two populations of cells having different karyotypes in an individual developing from a single fertilized egg. Chimerism can arise from a mutation in the developmental process that is propagated to only a subset of adult cells.
The term "non-chimeric" as used herein refers to an organism, such as a human fetus, that is composed of cells of a single karyotype.
The term "using one chromosome" when used in relation to determining chromosome dosage refers herein to using the sequence information obtained for one chromosome, i.e. the number of sequence tags obtained for one chromosome.
The term "sensitivity" as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
The term "specificity" as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
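The two definitions above translate directly into code. The following sketch uses hypothetical confusion-matrix counts; the function names are illustrative only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """True positives divided by the sum of true positives and false negatives."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """True negatives divided by the sum of true negatives and false positives."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical classification results for a set of test samples (not real data):
# 9 affected samples called affected, 1 affected sample missed,
# 95 unaffected samples called unaffected, 5 unaffected samples called affected.
example_sensitivity = sensitivity(true_positives=9, false_negatives=1)
example_specificity = specificity(true_negatives=95, false_positives=5)
```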
The term "patient sample" refers to a biological sample obtained from a patient (i.e., a recipient of medical attention, care, or treatment). Such patient sample may be any of the samples described herein. Preferably, such patient samples are obtained by a non-invasive procedure, such as a peripheral blood sample or a stool sample.
The term "hypodiploid" as used herein refers to a chromosome number that is one or more lower than the normal number of chromosomes characteristic of the chromosome set for that species.
Description of the invention
The present invention provides a method for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes, and which are known or suspected to differ in the amount of one or more sequences of interest. Copy number variations determined by the method of the invention include gains or losses of intact chromosomes, microscopically visible alterations involving very large chromosome fragments, and an abundance of sub-microscopic copy number variations of DNA fragments ranging in size from kilobases (kb) to megabases (Mb). The method includes a statistical approach that accounts for the cumulative variability stemming from process-related, inter-chromosomal (within-run), and inter-sequencing (run-to-run) variability. The method is applicable to determining the CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the method of the invention include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of fragments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of the test sample only once. Any aneuploidy can be determined from sequencing information obtained by sequencing the nucleic acids of the test sample only once.
CNV in the human genome significantly affects human diversity and susceptibility to disease (Redon et al., Nature 23: 444-, Science 320: 539-543 [2008]). CNVs arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
The methods described herein employ next generation sequencing (NGS) technology, in which clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (as described, for example, in Voelkerding et al., Clin Chem 55: 641-658 [2009]; Metzker M, Nature Rev 11: 31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule. Sequencing technologies for NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing), or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules in a single sequencing run (i.e., multiplex sequencing), generating reads of up to several billion DNA sequences. Examples of sequencing technologies that can be used to obtain sequence information according to the present method are described below.
Sequencing method
Some sequencing technologies are commercially available, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, CA), the sequencing-by-synthesis platforms from 454 Life Sciences (Bradford, CT), Illumina/Solexa (Hayward, CA) and Helicos Biosciences (Cambridge, MA), and the sequencing-by-ligation platform from Applied Biosystems (Foster City, CA), as described below. In addition to the single molecule sequencing performed using sequencing-by-synthesis from Helicos Biosciences, other single molecule sequencing technologies include the SMRT™ technology of Pacific Biosciences, the Ion Torrent™ technology, and nanopore sequencing, which is being developed, for example, by Oxford Nanopore Technologies. While the automated Sanger method is considered a "first generation" technology, Sanger sequencing, including automated Sanger sequencing, can also be employed by the method of the invention. Additional sequencing methods include nucleic acid imaging technologies such as atomic force microscopy (AFM) or transmission electron microscopy (TEM). Various exemplary sequencing technologies are described below.
In one embodiment, the method comprises obtaining sequence information for the nucleic acids in the test sample (e.g., cfDNA in a maternal sample) using the Helicos true single molecule sequencing (tSMS) technology (e.g., as described in Harris T.D. et al., Science, 320: 106-). In the tSMS technique, a DNA sample is cleaved into strands of about 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites immobilized on the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are identified by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step.
Whole genome sequencing by single molecule sequencing technologies excludes PCR-based amplification in the preparation of the sequencing libraries, and the directness of sample preparation allows direct measurement of the sample rather than measurement of copies of that sample.
In another embodiment, the methods of the invention comprise using 454 sequencing (Roche) to obtain sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample) (e.g., as described in Margulies, M. et al., Nature, 437: 376-). 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of about 300-800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads (e.g., streptavidin-coated beads) using, for example, adaptor B, which contains a 5' biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells and pyrosequencing is performed on each DNA fragment in parallel; the signal generated is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of the pyrophosphate (PPi) that is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
In another embodiment, the method comprises using SOLiD™ technology (Applied Biosystems) to obtain sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragments to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and the beads are enriched to separate the beads with extended templates. The templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After the color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
In another embodiment, the method comprises using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences to obtain sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample). In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that obtain sequence information while phosphate-linked nucleotides are being incorporated into the growing primer strand. A ZMW is a confinement structure that enables the incorporation of a single nucleotide by DNA polymerase to be observed against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent label is cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.
In another embodiment, the methods of the invention comprise using nanopore sequencing to obtain sequence information of nucleic acids in a test sample (e.g., cfDNA in a maternal test sample) (e.g., as described in Soni GV and Meller a., Clin Chem (clinical chemistry) 53: 1996-2001[2007 ]). Nanopore sequencing DNA analysis techniques are being developed industrially by a number of companies, including Oxford nanopore technologies (Oxford nanopore technologies) (Oxford, uk). Nanopore sequencing is a single molecule sequencing technique whereby a single molecule of DNA is directly sequenced as it passes through a nanopore. A nanopore is a small hole, on the order of 1 nanometer in diameter. The nanopore is immersed in a conducting fluid and an electrical potential (voltage) is applied across it resulting in a slight current passing through the nanopore due to ionic conduction. The amount of current flowing is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule blocks the nanopore to a different degree, thereby varying the magnitude of the current passing through the nanopore to a different degree. Thus, this change in current as the DNA molecule passes through the nanopore represents one reading of the DNA sequence.
In another embodiment, the methods of the invention include using a chemical-sensitive field effect transistor (chemFET) array to obtain sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample) (e.g., as described in U.S. patent application publication No. 20090026082). In one example of this technique, DNA molecules can be placed in reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned as a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, the nucleic acids can be amplified on the beads, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, so that the nucleic acids can be sequenced.
In another embodiment, the method comprises using the technology of Halcyon Molecular, which uses transmission electron microscopy (TEM), to obtain sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample). This method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises single-atom-resolution transmission electron microscope imaging of high-molecular-weight (150 kb or greater) DNA selectively labeled with heavy atom markers, and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the positions of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows sequencing of a complete human genome in less than ten minutes.
In another embodiment, the DNA sequencing technology is Ion Torrent single molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to translate chemically encoded information (A, C, G, T) directly into digital information (0, 1) on a semiconductor chip. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a by-product. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer, and beneath that an ion sensor. When a nucleotide (e.g., a C) is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion is released. The charge from that ion changes the pH of the solution, which can be detected by the Ion Torrent ion sensor. The sequencer, essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change is recorded and no base is called. If there are two identical bases on the DNA strand, the voltage is doubled and the chip records that two identical bases were called. Direct detection allows nucleotide incorporation to be recorded in seconds.
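The flow-based base calling described above (no signal when the flooded nucleotide does not match, a doubled signal for two identical bases) can be sketched as follows. This is a simplified illustration only: it assumes signals already normalized so that one incorporated base gives a value of about 1.0, and the flow order "TACG", the signal values, and the name `call_bases` are hypothetical rather than taken from any instrument software.

```python
from typing import Sequence


def call_bases(flow_order: str, signals: Sequence[float]) -> str:
    """Convert per-flow signals into a base sequence.

    Each flow floods the chip with one nucleotide type, cycling through
    `flow_order`. The normalized signal is rounded to the nearest whole
    number of incorporated bases: 0 means no match, 2 means two identical
    bases in a row, and so on.
    """
    called = []
    for flow_index, signal in enumerate(signals):
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = int(max(signal, 0.0) + 0.5)  # round half up, never negative
        called.append(nucleotide * incorporated)
    return "".join(called)


# Hypothetical normalized signals for six consecutive flows (T, A, C, G, T, A).
example_signals = [1.02, 0.03, 1.97, 0.00, 0.05, 1.10]
example_read = call_bases("TACG", example_signals)
```

In this toy input the third flow gives a signal close to 2, so two C bases are called from a single flow.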
In another embodiment, the method comprises using sequencing by hybridization to obtain sequence information of nucleic acids in the test sample (e.g., cfDNA in a maternal test sample). Sequencing by hybridization includes contacting a plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can optionally be tethered to a substrate. The substrate may be a planar surface comprising an array of known nucleotide sequences. Hybridization patterns to an array can be used to determine the sequence of polynucleotides present in a sample. In another embodiment, each probe is tethered to a bead, such as a magnetic bead or the like. Hybridization to the beads can be determined and can be used to identify the plurality of polynucleotide sequences within the sample.
In another embodiment, the method comprises obtaining sequence information for the nucleic acids in a test sample (e.g., cfDNA in a maternal test sample) by massively parallel sequencing of millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g., as described in Bentley et al., Nature 6: 53-59 [2009]). The template DNA can be genomic DNA, e.g., cfDNA. In some embodiments, genomic DNA from isolated cells is used as the template, and it is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as the template, and fragmentation is not required because cfDNA exists as short fragments. For example, fetal cfDNA circulates in the bloodstream as fragments about 170 base pairs (bp) in length (Fan et al., Clin Chem, 56: 1279-). Illumina's sequencing technology relies on the attachment of fragmented genomic DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. The template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of the Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA fragments. This addition prepares the DNA fragments for ligation to oligonucleotide adaptors, which have an overhang of a single T base at their 3' end to increase ligation efficiency. The adaptor oligonucleotides are complementary to the flow cell anchors. Under limiting-dilution conditions, adaptor-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. The attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with hundreds of millions of clusters, each containing about 1000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA (e.g., cfDNA) is amplified using PCR before it is subjected to cluster amplification.
Alternatively, an amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA is enriched using cluster amplification alone (Kozarewa et al., Nature Methods, 6: 291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. Only reads that map uniquely to the reference genome are counted. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used. Partial sequencing of the DNA fragments present in the sample is performed, and the sequence tags comprising reads of a predetermined length (e.g., 36 bp) that map to a known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the world wide web at genome. org & db & hg18& hgsid 166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the world wide web at genome. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and DDBJ (the DNA Databank of Japan). A variety of computer algorithms can be used to perform sequence alignments, including, without limitation: BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10: R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In some embodiments of the methods described herein, the mapped sequence tags comprise sequence reads of about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp, or about 500bp. It is expected that technological advances will enable single-end reads of greater than 500bp, enabling reads of greater than about 1000bp when paired-end reads are generated. In some embodiments, the mapped sequence tags comprise sequence reads that are 36bp. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule; specific genetic sequence information is not needed. A small degree of mismatch (0-2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
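The mapping rule above (compare the tag with the reference, tolerate up to two mismatches, and record only the chromosome of origin) can be illustrated with a deliberately naive sketch. A real aligner such as BOWTIE or ELAND uses indexed search; the brute-force scan, the toy reference, and the names `count_mismatches` and `map_tag` below are hypothetical and serve only to show the logic.

```python
from typing import Optional


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(tag: str, reference: dict, max_mismatches: int = 2) -> Optional[str]:
    """Return the chromosome of origin if the tag maps to exactly one location.

    Every window of the reference with the same length as the tag is compared
    to the tag; windows with at most `max_mismatches` differences are hits.
    Tags with no hit or with more than one hit return None.
    """
    hits = []
    for chromosome, sequence in reference.items():
        for start in range(len(sequence) - len(tag) + 1):
            window = sequence[start:start + len(tag)]
            if count_mismatches(tag, window) <= max_mismatches:
                hits.append((chromosome, start))
    return hits[0][0] if len(hits) == 1 else None


# Toy reference with two short "chromosomes" (hypothetical sequences).
toy_reference = {
    "chrA": "ACGTACGTTTGACCAGTC",
    "chrB": "GGGGCCCCGGGGCCCCGG",
}
# This tag differs by one base from positions 5-16 of chrA.
origin = map_tag("CGTTTAACCAGT", toy_reference)
```

Only the chromosome name is returned, reflecting the statement that no specific genetic sequence information beyond the chromosomal origin is needed.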
Multiple sequence tags are obtained for each sample. In some embodiments, at least about 3 × 10⁶ sequence tags, at least about 5 × 10⁶ sequence tags, at least about 8 × 10⁶ sequence tags, at least about 10 × 10⁶ sequence tags, at least about 15 × 10⁶ sequence tags, at least about 20 × 10⁶ sequence tags, at least about 30 × 10⁶ sequence tags, at least about 40 × 10⁶ sequence tags, or at least about 50 × 10⁶ sequence tags comprising reads of between 20 and 40bp (e.g., 36bp) are obtained per sample from mapping the reads to the reference genome. In one embodiment, all sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions of the reference genome (e.g., all chromosomes) are counted, and the CNV (i.e., the over- or under-representation) of a sequence of interest (e.g., a chromosome or portion thereof) in the mixed DNA sample is determined. The method does not require differentiation between the two genomes.
The accuracy required to correctly determine whether a CNV (e.g., an aneuploidy) is present or absent in a sample is predicated on the variation in the number of sequence tags that map to the reference genome among samples within one sequencing run (inter-chromosomal variability), and on the variation in the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, these variations can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variations can result from the use of different protocols for the extraction and purification of nucleic acids, the preparation of the sequencing libraries, and the use of different sequencing platforms. The present method uses sequence doses (chromosome doses or fragment doses) based on knowledge of normalizing sequences (normalizing chromosome sequences or normalizing fragment sequences) to intrinsically account for the cumulative variability stemming from inter-chromosomal (within-run), inter-sequencing (run-to-run), and platform-dependent variability. Chromosome doses are based on knowledge of a normalizing chromosome sequence, which may consist of a single chromosome, or of two or more chromosomes selected from chromosomes 1-22, X, and Y. Alternatively, the normalizing chromosome sequence may consist of a single chromosome fragment, or of two or more fragments of one chromosome or of two or more chromosomes. Fragment doses are based on knowledge of a normalizing fragment sequence, which may consist of a single fragment of any one chromosome, or of two or more fragments of any two or more of chromosomes 1-22, X, and Y.
Determination of normalized sequences in qualified samples: normalizing chromosome sequences and normalizing fragment sequences
The normalizing sequence is identified using a set of qualified samples obtained from subjects known to comprise cells having a normal copy number for any one sequence of interest (e.g., a chromosome or fragment thereof). The determination of the normalizing sequence is outlined in steps 110, 120, 130, 140, and 145 of the method embodiment depicted in fig. 1. Sequence information obtained from the qualified samples is also used to determine a statistically significant identification of chromosomal aneuploidies in the test samples (step 155 of fig. 1, and examples).
Fig. 1 provides a flow diagram of one embodiment of a method 100 of the present invention for determining a CNV of a sequence of interest (e.g., a chromosome or fragment thereof) in a biological sample. In some embodiments, a biological sample is obtained from a subject, and the sample comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, for example, by a fetus and the mother carrying the fetus. Alternatively, the genomes may be contributed to the sample by aneuploid cancer cells and normal euploid cells from the same subject (e.g., a plasma sample from a cancer patient).
A set of qualified samples is obtained to identify qualified normalizing sequences and to provide variance values for determining a statistically significant identification of CNVs in the test samples. In step 110, a plurality of qualified biological samples are obtained from a plurality of subjects known to comprise cells having a normal copy number for any one sequence of interest. In one embodiment, the qualified samples are obtained from mothers carrying a fetus that has been confirmed by cytogenetic means to have a normal chromosome copy number. The qualified biological samples may be a biological fluid, such as plasma, or any suitable sample as described below. In some embodiments, a qualified sample contains a mixture of nucleic acid molecules (e.g., cfDNA molecules). In some embodiments, the qualified sample is a maternal plasma sample containing a mixture of fetal and maternal cfDNA molecules. Sequence information for the normalizing chromosomes and/or portions thereof is obtained by sequencing at least a portion of these nucleic acids (e.g., fetal and maternal nucleic acids) using any known sequencing method. Preferably, any of the Next Generation Sequencing (NGS) methods described elsewhere in this application is used to sequence the fetal and maternal nucleic acids as single or clonally amplified molecules.
At step 120, at least a portion of each of all of the qualified nucleic acids contained within the qualified samples is sequenced to generate millions of sequence reads, e.g., 36bp reads, that are aligned with a reference genome (e.g., hg18). In some embodiments, the sequence reads comprise about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp, or about 500bp. It is expected that advances in technology will enable single-end reads of greater than 500bp, enabling reads of greater than about 1000bp when paired-end reads are generated. In some embodiments, at least about 3 × 10^6 qualified sequence tags, at least about 5 × 10^6 qualified sequence tags, at least about 8 × 10^6 qualified sequence tags, at least about 10 × 10^6 qualified sequence tags, at least about 15 × 10^6 qualified sequence tags, at least about 20 × 10^6 qualified sequence tags, at least about 30 × 10^6 qualified sequence tags, at least about 40 × 10^6 qualified sequence tags, or at least about 50 × 10^6 qualified sequence tags comprising reads of between 20 and 40bp are obtained.
At step 130, all tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest on the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to a sequence of interest, normalized to the length of the qualified sequence to which they are mapped. Sequence tag densities determined as a ratio of the tag density relative to the length of the sequence of interest are referred to herein as tag density ratios. Normalization to the length of the sequence of interest is not required; it may be included as a step to reduce the number of digits in a number and so simplify it for human interpretation. Because all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density for a sequence of interest (e.g., a clinically relevant sequence) in the qualified samples is determined, as are the sequence tag densities for the additional sequences from which normalizing sequences are subsequently identified.
In some embodiments, the sequence of interest is a chromosome associated with a complete chromosomal aneuploidy, e.g., chromosome 21, and the qualified normalizing sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density best approximates that of the sequence of interest (i.e., the chromosome of interest, e.g., chromosome 21). Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalizing sequence for each of any one of chromosomes 1-22, X, and Y in the qualified samples. The normalizing chromosome may be a single chromosome, or it may be a set of chromosomes as described elsewhere in this application.
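The counting and optional length normalization of step 130 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `tag_counts` and `tag_density_ratio`, the toy tag list, and the round chromosome lengths are all hypothetical and are not values from the described method.

```python
from collections import Counter


def tag_counts(mapped_chromosomes):
    """Count sequence tags per chromosome, given one chromosome name
    per mapped tag."""
    return Counter(mapped_chromosomes)


def tag_density_ratio(counts, lengths_bp):
    """Tags per base pair for each chromosome: the optional length
    normalization referred to in the text as a tag density ratio."""
    return {chrom: counts[chrom] / lengths_bp[chrom] for chrom in counts}


# Toy data: each entry is the chromosome to which one tag mapped.
tags = ["chr21", "chr9", "chr9", "chr21", "chr9", "chr14"]

# Illustrative round lengths in base pairs (assumed, not authoritative).
lengths = {"chr9": 140_000_000, "chr14": 100_000_000, "chr21": 47_000_000}

counts = tag_counts(tags)
ratios = tag_density_ratio(counts, lengths)
for chrom in sorted(counts):
    print(chrom, counts[chrom], ratios[chrom])
```

Because the length of each chromosome is constant across samples, omitting the normalization rescales every sample's dose by the same factor, which is consistent with the statement that this step is not required.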
In another embodiment, the sequence of interest is a fragment of a chromosome associated with a partial aneuploidy (e.g., a chromosome deletion or insertion, or an unbalanced chromosome translocation) and the normalizing sequence is a fragment of a chromosome not associated with a partial aneuploidy, and its variation in sequence tag density best approximates that of the chromosome fragment associated with such partial aneuploidy. Any one or more of the fragments of any one or more of chromosomes 1-22, X, and Y can be a sequence of interest.
In all embodiments, whether a single sequence or a set of sequences is identified in the qualified samples as the normalizing sequence for any one or more sequences of interest, the qualified normalizing sequence has a variation in sequence tag density that best approximates the variation of the sequence of interest as determined in the qualified samples. For example, a qualified normalizing sequence is the sequence with the least variability, i.e., the sequence whose variability is closest to that of the sequence of interest.
In some embodiments, the normalizing sequence is a sequence that best distinguishes one or more qualified samples from one or more affected samples, meaning that the normalizing sequence is the sequence with the greatest distinguishability, i.e., it provides the optimal distinction for the sequence of interest in an affected test sample, so that the affected test sample is readily distinguished from other unaffected samples. In other embodiments, the normalizing sequence is a sequence with minimal variability and maximal distinguishability. The level of distinguishability can be determined as the statistical difference between the sequence doses (e.g., chromosome doses or fragment doses) in a population of qualified samples and the chromosome dose or doses in one or more test samples, as described below and shown in the examples. For example, distinguishability can be expressed numerically as a T-test value representing the statistical difference between the chromosome doses in a population of qualified samples and the chromosome dose or doses in one or more test samples. Alternatively, distinguishability can be expressed numerically as a Normalized Chromosome Value (NCV), which is the z-score for the chromosome dose as long as the distribution of NCV is normal. Similarly, distinguishability can be expressed numerically as a T-test value representing the statistical difference between the fragment doses in a population of qualified samples and the fragment dose or doses in one or more test samples. Alternatively, the distinguishability of the fragment dose can be expressed numerically as a normalized fragment (segment) value (NSV), which is the z-score for the fragment dose as long as the distribution of NSV is normal. In determining z-scores, the mean and standard deviation of the chromosome or fragment doses in a set of qualified samples can be used. Alternatively, the mean and standard deviation of the chromosome or fragment doses in a training set comprising qualified and affected samples may be used.
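The z-score described above can be sketched as follows. This is an illustration only: the function name `normalized_chromosome_value` and the dose values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator is used.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """z-score of a test-sample chromosome dose relative to the doses
    measured in a set of qualified samples.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma


# Toy chromosome doses, for illustration only.
qualified = [0.100, 0.102, 0.098, 0.101, 0.099]
test_dose = 0.108

ncv = normalized_chromosome_value(test_dose, qualified)
print(round(ncv, 2))
```

The same calculation applied to fragment doses would give the corresponding normalized fragment value.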
The method identifies a plurality of sequences that inherently have similar characteristics and are susceptible to similar variations across multiple samples and sequence runs, and this is useful for determining sequence doses in test samples.
Determination of sequence dose (i.e., chromosome dose or fragment dose) in a qualified sample
At step 140, based on the calculated qualified tag densities, a qualified sequence dose (i.e., chromosome dose or fragment dose) for one sequence of interest is determined as the ratio of the sequence tag density for this sequence of interest to the qualified sequence tag density for the additional sequences from which the normalized sequence is subsequently identified at step 145. The identified normalized sequences are then used to determine the sequence dose in the test sample.
In one embodiment, the sequence dose in the qualified samples is a chromosome dose calculated as the ratio of the number of sequence tags for a chromosome of interest to the number of sequence tags for a normalizing chromosome sequence in a qualified sample. The normalizing chromosome sequence may be a single chromosome, a set of chromosomes, a fragment of a chromosome, or a set of fragments from different chromosomes. Thus, the chromosome dose for a chromosome of interest in a qualified sample is determined as: (i) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing chromosome sequence consisting of a single chromosome, (ii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing chromosome sequence consisting of two or more chromosomes, (iii) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing fragment sequence consisting of a single fragment of a chromosome, (iv) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing fragment sequence consisting of two or more fragments of one chromosome, or (v) the ratio of the number of tags for the chromosome of interest to the number of tags for a normalizing fragment sequence consisting of two or more fragments of two or more chromosomes.
Examples of determining the chromosome dose for a chromosome of interest according to (i)-(v) are as follows. The chromosome dose for a chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence tag density of chromosome 21 to: (i) the sequence tag density of each of the remaining chromosomes (i.e., chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y); (ii) the sequence tag density of each of all possible combinations of two or more of the remaining chromosomes; (iii) the sequence tag density of a fragment of another chromosome (e.g., chromosome 9); (iv) the sequence tag density of two fragments of another chromosome (e.g., two fragments of chromosome 9); or (v) the sequence tag density of two fragments of two different chromosomes (e.g., a fragment of chromosome 9 and a fragment of chromosome 14).
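The ratios in (i) and (ii) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `chromosome_dose` and the tag counts are hypothetical, and summing the tag counts over a set of normalizing chromosomes is an assumption about how a set is combined into a single denominator.

```python
def chromosome_dose(counts, chromosome_of_interest, normalizing_chromosomes):
    """Ratio of the tag count of the chromosome of interest to the tag
    count of the normalizing chromosome sequence.

    Assumption: for a set of normalizing chromosomes, the denominator is
    the sum of their tag counts.
    """
    denominator = sum(counts[c] for c in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return counts[chromosome_of_interest] / denominator


# Toy tag counts for one qualified sample, for illustration only.
counts = {"chr21": 150, "chr9": 500, "chr14": 400}

dose_single = chromosome_dose(counts, "chr21", ["chr9"])          # case (i)
dose_set = chromosome_dose(counts, "chr21", ["chr9", "chr14"])    # case (ii)
print(dose_single, dose_set)
```

Cases (iii)-(v) follow the same pattern with per-fragment tag counts in the denominator.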
In another embodiment, the sequence dose in the qualifying samples is a fragment dose calculated as the ratio of the number of sequence tags for a fragment of interest to the number of sequence tags of the normalized fragment sequences in the qualifying samples. The normalized fragment sequence may be a fragment of one chromosome, or a set of fragments from different chromosomes. Thus, in a qualified sample, the fragment dose for a fragment of interest is determined as the ratio of (i) the number of tags for a fragment of interest to the number of tags of a normalized fragment sequence composed of a single fragment of a chromosome, (ii) the number of tags for a fragment of interest to the number of tags of a normalized fragment sequence composed of two or more fragments of a chromosome, or (iii) the number of tags for a fragment of interest to the number of tags of a normalized fragment sequence composed of two or more fragments of two or more chromosomes.
The chromosome dose for one or more chromosomes of interest is determined in all of the qualifying samples and a normalized chromosome sequence is identified in step 145. Similarly, the fragment dose for one or more fragments of interest is determined in all qualifying samples, and a normalized fragment sequence is identified in step 145.
Identification of normalized sequences from doses of qualified sequences
At step 145, based on the calculated sequence doses, a normalizing sequence is identified for a sequence of interest as the sequence that results in the least variability in sequence dose for the sequence of interest across all qualified samples. The method identifies sequences that inherently have similar characteristics and are susceptible to similar variations across multiple samples and sequencing runs, which makes them useful for determining sequence doses in test samples.
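The selection in step 145 can be sketched for the simple case of single-chromosome candidates. The sketch uses the coefficient of variation (CV) as the measure of variability, which the text mentions as one example; the function names and the toy tag counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_single_normalizer(samples, chromosome_of_interest):
    """Return the candidate chromosome whose use as the denominator gives
    the lowest CV of chromosome dose across the qualified samples."""
    candidates = [c for c in samples[0] if c != chromosome_of_interest]

    def cv_for(candidate):
        doses = [s[chromosome_of_interest] / s[candidate] for s in samples]
        return coefficient_of_variation(doses)

    return min(candidates, key=cv_for)


# Toy tag counts for three qualified samples, for illustration only.
qualified_samples = [
    {"chr21": 100, "chr14": 200, "chr9": 500},
    {"chr21": 110, "chr14": 220, "chr9": 500},
    {"chr21": 90, "chr14": 180, "chr9": 520},
]

print(best_single_normalizer(qualified_samples, "chr21"))
```

In this toy data the chr14 counts track the chr21 counts exactly, so chr14 yields a constant dose and is selected.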
Normalized sequences for one or more sequences of interest can be identified in a set of qualifying samples, and the sequences identified in these qualifying samples can then be used to calculate sequence doses for one or more sequences of interest in each test sample (step 150) to determine the presence or absence of aneuploidy in each test sample. The normalizing sequence identified for a chromosome or fragment of interest may differ when different sequencing platforms are used and/or when there is a difference in the purification of the nucleic acid to be sequenced and/or in the preparation of a sequencing library. The use of a normalised sequence according to the method of the present invention provides a specific and sensitive measure of variation in copy number of a chromosome or fragment thereof, irrespective of the sample preparation and/or sequencing platform used.
In some embodiments, more than one normalizing sequence is identified, i.e., different normalizing sequences can be determined for one sequence of interest, and multiple sequence doses can be determined for one sequence of interest. For example, the variation (e.g., coefficient of variation, CV) in chromosome dose for chromosome 21 of interest is smallest when the sequence tag density of chromosome 14 is used. However, two, three, four, five, six, seven, eight, or more normalizing sequences can be identified for use in determining sequence doses for a sequence of interest in a test sample. As an example, a second dose for chromosome 21 can be determined in any one test sample using chromosome 7, chromosome 9, chromosome 11, or chromosome 12 as the normalizing chromosome sequence, because all of these chromosomes have a CV close to that of chromosome 14 (see example 2, table 2). Preferably, when a single chromosome is selected as the normalizing chromosome sequence for a chromosome of interest, the normalizing chromosome is the chromosome that results in chromosome doses for the chromosome of interest having the least variability across all samples tested (e.g., qualified samples).
Normalizing chromosomal sequences as normalizing sequences for one or more chromosomes
In other embodiments, a normalizing chromosome sequence may be a single sequence, or it may be a set of sequences. For example, in some embodiments, a normalizing sequence is a set of sequences, e.g., a set of chromosomes, identified as the normalizing sequence for any one or more of chromosomes 1-22, X, and Y. The set of chromosomes that makes up the normalizing sequence (i.e., the normalizing chromosome sequence) for a chromosome of interest can be a set of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, or twenty-two chromosomes, and may include or exclude one or both of chromosomes X and Y. The set of chromosomes identified as the normalizing chromosome sequence is the set that results in chromosome doses for the chromosome of interest having the least variability across all samples tested (i.e., qualified samples). Preferably, single chromosomes and sets of chromosomes are tested for their ability to best mimic the behavior of the sequence of interest for which they are selected as the normalizing chromosome sequence.
In one embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 21 is a set of chromosomes selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the set of chromosomes is a set selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14.
In some embodiments, the method is further improved by using a normalizing sequence determined by systematically calculating all chromosome doses using each chromosome individually and in all possible combinations with all remaining chromosomes (see example 7). For example, a systematically determined normalizing chromosome can be determined for each chromosome of interest by systematically calculating all possible chromosome doses using any one of chromosomes 1-22, X, and Y, and combinations of two or more of chromosomes 1-22, X, and Y, to determine which single chromosome or set of chromosomes is the normalizing chromosome that results in the least variability in chromosome dose for the chromosome of interest across a set of qualified samples (see example 7). Thus, in one embodiment, the systematically calculated normalizing sequence for chromosome 21 is a set of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20, and chromosome 22. A single chromosome or a set of chromosomes can be determined for every chromosome in the genome.
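The systematic calculation can be sketched as an exhaustive search over every non-empty subset of candidate chromosomes. This is a sketch under stated assumptions: the function name `systematic_normalizer`, the optional `max_set_size` cap, the toy tag counts, the use of the CV as the variability measure, and the summing of tag counts within a set are all illustrative choices rather than details confirmed by the text.

```python
from itertools import combinations
from statistics import mean, stdev


def systematic_normalizer(samples, chromosome_of_interest, max_set_size=None):
    """Try every candidate chromosome alone and in every combination, and
    return (best_set, best_cv), where best_set gives the lowest CV of
    chromosome dose across the qualified samples."""
    candidates = sorted(c for c in samples[0] if c != chromosome_of_interest)
    limit = max_set_size or len(candidates)
    best_set, best_cv = None, float("inf")
    for size in range(1, limit + 1):
        for subset in combinations(candidates, size):
            doses = [
                s[chromosome_of_interest] / sum(s[c] for c in subset)
                for s in samples
            ]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:
                best_set, best_cv = subset, cv
    return best_set, best_cv


# Toy tag counts for three qualified samples, for illustration only.
qualified_samples = [
    {"chr21": 100, "chr4": 120, "chr14": 80, "chr9": 500},
    {"chr21": 110, "chr4": 100, "chr14": 120, "chr9": 480},
    {"chr21": 90, "chr4": 110, "chr14": 70, "chr9": 530},
]

best_set, best_cv = systematic_normalizer(qualified_samples, "chr21")
print(best_set, best_cv)
```

With 23 candidate chromosomes the search covers 2^23 - 1 subsets, which is why a cap on the set size is offered here as a practical option.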
In one embodiment, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 18 is a set of chromosomes selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the set of chromosomes is a set selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14.
In another embodiment, the normalizing sequence for chromosome 18 is determined (as explained elsewhere in this application) by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes. Thus, in one embodiment, the normalizing sequence for chromosome 18 is a normalizing chromosome consisting of a set of chromosomes, the set including chromosome 2, chromosome 3, chromosome 5, and chromosome 7.
In one embodiment, the normalizing sequence for chromosome X is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the normalizing sequence for chromosome X is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. Alternatively, the normalizing sequence for chromosome X is a set of chromosomes selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the set of chromosomes is a set selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome X is determined (as explained elsewhere in this application) by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes. Thus, in one embodiment, the normalizing sequence for chromosome X is a normalizing chromosome consisting of the set of chromosome 4 and chromosome 8.
In one embodiment, the normalizing sequence for chromosome 13 is one chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the normalizing sequence for chromosome 13 is one chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In another embodiment, the normalizing sequence for chromosome 13 is a set of chromosomes selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the set of chromosomes is a set selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome 13 is determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes (as explained elsewhere in this application). Thus, in one embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome of a set comprising chromosome 4 and chromosome 5. In another embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome consisting of the set of chromosome 4 and chromosome 5.
The variation in chromosome dose for chromosome Y is greater than 30, independent of which normalization chromosome is used in determining chromosome Y dose. Thus, a set of two or more chromosomes selected from chromosomes 1-22 and chromosome X can be used as a normalizing sequence for chromosome Y. In one embodiment, the at least one normalizing chromosome is a set of chromosomes consisting of chromosomes 1-22, and chromosome X. In another embodiment, the set of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5, and chromosome 6.
In another embodiment, the normalizing sequence for chromosome Y is determined (as explained elsewhere in this application) by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes. Thus, in one embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome comprising the set of chromosome 4 and chromosome 6. In another embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome consisting of the set of chromosome 4 and chromosome 6.
The normalizing sequence used to calculate the doses for different chromosomes of interest, or for different fragments of interest, may be the same, or it may be different for different chromosomes or fragments, respectively. For example, the normalizing sequence, e.g., a normalizing chromosome (one or a set), for chromosome A of interest may be the same as, or different from, the normalizing sequence, e.g., a normalizing chromosome (one or a set), for chromosome B of interest.
The normalizing sequence for a complete chromosome may be a complete chromosome or a set of complete chromosomes, or it may be a fragment of a chromosome, or a set of fragments of one or more chromosomes.
Normalized fragment sequences as normalized sequences for one or more chromosomes
In another embodiment, the normalizing sequence for a chromosome may be a normalizing fragment sequence. The normalizing fragment sequence may be a single fragment, or it may be a set of fragments of one chromosome, or it may be a set of fragments from two or more different chromosomes. A normalizing fragment sequence can be determined by systematically calculating all combinations of fragment sequences in the genome. For example, the normalizing fragment sequence for chromosome 21 can be a single fragment that is larger or smaller than chromosome 21, which is about 47Mbp (megabase pairs) in size; for example, the normalizing fragment can be a fragment from chromosome 9, which is about 140Mbp in size. Alternatively, the normalizing sequence for chromosome 21 may be a combination of a fragment sequence from chromosome 1 and a fragment sequence from chromosome 12.
In one embodiment, the normalizing sequence for chromosome 21 is a sequence of a fragment or a set of two or more fragments of chromosomes 1-20, 22, X, and Y. In another embodiment, the normalizing sequence for chromosome 18 is a fragment or sets of fragments of chromosomes 1-17, 19-22, X, and Y. In another embodiment, the normalizing sequence for chromosome 13 is a fragment or sets of fragments of chromosomes 1-12, 14-22, X, and Y. In another embodiment, the normalizing sequence for chromosome X is a fragment or sets of fragments of chromosomes 1-22, and Y. In another embodiment, the normalizing sequence for chromosome Y is a fragment or set of fragments of chromosomes 1-22, and X. The normalized sequences of single or multiple sets of fragments can be determined for all chromosomes in a genome. Two or more segments of the normalized fragment sequence may be segments from one chromosome, or the two or more segments may be segments of two or more different chromosomes. As illustrated for the normalized chromosome sequence, one normalized fragment sequence may be the same for two or more different chromosomes.
Normalized fragment sequences as normalized sequences for one or more chromosome fragments
When the sequence of interest is a fragment of a chromosome, the presence or absence of a CNV of that sequence of interest can be determined. Variation in the copy number of a chromosome fragment allows the presence or absence of a partial chromosomal aneuploidy to be determined. Examples of partial chromosomal aneuploidies associated with different fetal abnormalities and conditions are described below. A fragment of a chromosome may be of any length. For example, it may range from kilobases to hundreds of megabases. The human genome, which comprises just over 3 billion DNA bases, can be divided into tens, thousands, hundreds of thousands, or millions of fragments of varying sizes, the copy number of which can be determined according to the method of the present invention. The normalizing sequence for a chromosome fragment is a normalizing fragment sequence, which can be a single fragment from any one of chromosomes 1-22, X, and Y, or a set of fragments from any one or more of chromosomes 1-22, X, and Y.
A normalized sequence for a fragment of interest is a sequence that has variability across multiple chromosomes and across multiple samples that is closest to the variability of the fragment of interest. Where the normalizing sequence is a set of fragments of any one or more of chromosomes 1-22, X and Y, the determination of the normalizing sequence can be made as described for determining the normalizing sequence for the chromosome of interest. By calculating the fragment dose using one and all possible combinations of two or more fragments as normalized sequences for the fragment of interest in each sample of a set of qualifying samples (i.e., samples that are known to be diploid for the fragment of interest), the normalized fragment sequence for one or a set of fragments can be identified and this normalized sequence determined to be the normalized sequence that provides a fragment dose with the lowest variability for this fragment of interest across all qualifying samples, as explained above for the normalized chromosomal sequence.
For example, for a fragment of interest that is 1 Mb (megabase) in length, the remaining 3 million fragments of the approximately 3 Gb human genome (minus the 1 Mb fragment of interest) can be used individually or in combination with each other to calculate fragment doses for the fragment of interest in a set of qualified samples, in order to determine which fragment or set of fragments will serve as the normalizing fragment sequence for qualified and test samples. Fragments of interest can vary from about 1,000 bases to tens of megabases. The normalizing fragment sequence may be composed of one or more fragments of the same size as the sequence of interest. In other embodiments, the normalizing fragment sequence may be composed of fragments that differ in size from the sequence of interest and/or from each other. For example, a normalizing sequence for a sequence of interest 10,000 bases long can be 20,000 bases long and can combine sequences of different lengths, e.g., 7,000 + 8,000 + 5,000 bases. As explained elsewhere in this application for normalizing chromosome sequences, normalizing fragment sequences can be determined by systematically calculating all possible chromosome and/or fragment doses using each possible normalizing chromosome fragment, individually and in all possible combinations. A single normalizing fragment or a set of normalizing fragments can be determined for every fragment and/or chromosome in the genome.
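The search for a normalizing fragment sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the coefficient of variation is used here as the measure of variability, the search is capped at a small set size to keep it tractable, and the function names, fragment labels, and tag counts are all hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def fragment_dose(counts, interest, normalizers):
    """Tags on the fragment of interest divided by the summed tags on the normalizing fragments."""
    return counts[interest] / sum(counts[f] for f in normalizers)


def coefficient_of_variation(values):
    """Assumed variability measure: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def pick_normalizing_fragments(qualified, interest, candidates, max_set_size=2):
    """Return the candidate set whose dose varies least across the qualified samples."""
    best_set, best_cv = None, float("inf")
    for size in range(1, max_set_size + 1):
        for combo in combinations(candidates, size):
            doses = [fragment_dose(sample, interest, combo) for sample in qualified]
            cv = coefficient_of_variation(doses)
            if cv < best_cv:
                best_set, best_cv = combo, cv
    return best_set, best_cv


# Hypothetical tag counts per fragment for three qualified (diploid) samples.
qualified = [
    {"target": 100, "fragA": 200, "fragB": 150, "fragC": 90},
    {"target": 110, "fragA": 220, "fragB": 140, "fragC": 130},
    {"target": 90, "fragA": 180, "fragB": 160, "fragC": 70},
]
best_set, best_cv = pick_normalizing_fragments(
    qualified, "target", ["fragA", "fragB", "fragC"]
)
print(best_set, best_cv)
```

In this toy data the counts on fragA track the counts on the target exactly, so fragA alone gives a constant dose and is selected. An exhaustive search over all combinations grows combinatorially, which is why the sketch limits the set size.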
The normalizing sequence used to calculate the dose may be the same for different chromosome segments of interest, or it may differ between them. For example, the normalizing sequence (e.g., one or more normalizing segments) for chromosome segment A of interest may be the same as, or different from, the normalizing sequence (e.g., one or more normalizing segments) for chromosome segment B of interest.
Determination of aneuploidy in test samples
A sequence dose is determined for a sequence of interest in a test sample based on one or more normalization sequences identified in a qualified sample, the sample comprising a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.
At step 115, a test sample is obtained from a subject suspected or known to carry a clinically relevant CNV of the sequence of interest. The test sample may be a biological fluid (e.g. plasma) or any suitable sample as described below. In some embodiments, the test sample contains a mixture of nucleic acid molecules (e.g., cfDNA molecules). In some embodiments, the test sample is a maternal plasma sample containing a mixture of fetal and maternal cfDNA molecules.
At step 125, at least a portion of the test nucleic acids in the test sample is sequenced to generate millions of sequence reads (e.g., 36 bp reads). As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped to a reference genome. As described for the qualified samples, at least about 3 × 10^6 qualified sequence tags, at least about 5 × 10^6 qualified sequence tags, at least about 8 × 10^6 qualified sequence tags, at least about 10 × 10^6 qualified sequence tags, at least about 15 × 10^6 qualified sequence tags, at least about 20 × 10^6 qualified sequence tags, at least about 30 × 10^6 qualified sequence tags, at least about 40 × 10^6 qualified sequence tags, or at least about 50 × 10^6 qualified sequence tags comprising reads of between 20 and 40 bp are obtained from the reads that map uniquely to the reference genome.
At step 135, all tags resulting from sequencing of nucleic acids in the test sample are counted to determine the test sequence tag density. In one embodiment, the number of sequence tags mapped to a sequence of interest is normalized to the known length of a sequence of interest to which they are mapped to provide a test sequence tag density ratio. As described for these qualifying samples, normalization to a known length of a sequence of interest is not necessarily required, and this may be included as a step to reduce the number of digits in a number to simplify it for human interpretation. As all mapped test sequence tags are counted in the test samples, the sequence tag density for sequences of interest (e.g., clinically relevant sequences) in the test samples is determined, as is the sequence tag density for additional sequences corresponding to at least one normalized sequence identified in the qualifying samples.
At step 150, based on the identification of at least one normalized sequence in the qualified samples, the relevant test sequence dose is determined for one of the sequences of interest in the test sample. As explained elsewhere in this application, the at least one normalizing sequence may be a single sequence or a set of sequences. The sequence dose for a sequence of interest in a test sample is a ratio of the sequence tag density determined for the sequence of interest in the test sample to the sequence tag density of at least one normalized sequence determined in the test sample, wherein the normalized sequence in the test sample corresponds to the normalized sequence identified for the particular sequence of interest in the qualifying samples. For example, if the normalized sequence identified for chromosome 21 in the qualifying samples is determined to be a chromosome (e.g., chromosome 14), then the test sequence dose for chromosome 21 (the sequence of interest) is determined as a ratio of the sequence tag density for chromosome 21 to the sequence tag density for chromosome 14, each determined in the test sample. Similarly, chromosome doses were determined for chromosome 13, 18, X, Y, and other chromosomes associated with various chromosomal aneuploidies. The normalizing sequence for the chromosome of interest may be one or a set of chromosomes, or one or a set of chromosome fragments. As described above, a sequence of interest can be a portion of a chromosome, such as a chromosome fragment. Thus, the dose for a chromosome fragment can be determined as the ratio of the sequence tag density determined for this fragment in the test sample to the sequence tag density for the normalized chromosome fragment in the test sample, where the normalized fragment in the test sample corresponds to the normalized fragment(s) (single or set of fragments) identified for the particular fragment of interest in the qualifying samples. 
Chromosome fragments can range in size from kilobases (kb) to megabases (Mb).
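The dose calculation of step 150 can be illustrated with a short sketch. It assumes that, when the normalizing sequence is a set, the tags and lengths of its members are pooled before the density is taken. The function names, tag counts, and the rounded sequence lengths are hypothetical and are not reference values.

```python
def tag_density(tag_count, sequence_length_bp):
    """Number of mapped sequence tags per base of the sequence they map to."""
    return tag_count / sequence_length_bp


def sequence_dose(tags, lengths, interest, normalizers):
    """Ratio of the tag density of the sequence of interest to that of the normalizing sequence(s)."""
    interest_density = tag_density(tags[interest], lengths[interest])
    # Assumption: a set of normalizing sequences is pooled into one density.
    pooled_tags = sum(tags[n] for n in normalizers)
    pooled_length = sum(lengths[n] for n in normalizers)
    return interest_density / tag_density(pooled_tags, pooled_length)


# Hypothetical test-sample tag counts and illustrative rounded lengths in bases.
tags = {"chr21": 50_400, "chr14": 107_000}
lengths = {"chr21": 48_000_000, "chr14": 107_000_000}

# Chromosome 21 is the sequence of interest; chromosome 14 is the normalizing sequence.
dose_chr21 = sequence_dose(tags, lengths, "chr21", ["chr14"])
print(round(dose_chr21, 4))
```

The same function applies to a chromosome fragment of interest by keying the dictionaries on fragment identifiers instead of chromosome names.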
At step 155, thresholds are derived from the standard deviation values established for the qualified sequence doses determined in a plurality of qualified samples and from the sequence doses determined for samples known to be aneuploid for a sequence of interest. Accurate classification depends on the differences between the probability distributions for the different classes (i.e., types of aneuploidy). Preferably, the thresholds are chosen from the empirical distribution for each type of aneuploidy (e.g., trisomy 21). As described in the examples, possible thresholds established for classifying trisomy 13, trisomy 18, trisomy 21, and monosomy X aneuploidies illustrate the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold determined for distinguishing samples affected by an aneuploidy of one chromosome may be the same as or different from the threshold for a different aneuploidy. As shown in the examples, the threshold for each chromosome of interest is determined from the variability in the dose of that chromosome across samples and sequencing runs. The smaller the variability of the chromosome dose for a chromosome of interest, the narrower the spread of doses for that chromosome across all unaffected samples that are used to set the thresholds for determining different aneuploidies.
At step 160, copy number variations of the sequence of interest are determined in the test sample by comparing the test sequence dose for the sequence of interest to at least one threshold established from the qualifying sample doses.
At step 165, the calculated dose for the test sequence of interest is compared to the doses set as thresholds, which are chosen according to a user-defined reliability threshold, to classify the sample as "normal", "affected", or "no call". "No call" samples are those for which a definitive diagnosis cannot be made with reliability.
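Steps 155 to 165 can be sketched as a two-threshold classification. This is a simplified illustration, not the procedure of the examples: it derives both thresholds only from the mean and standard deviation of the qualified doses, it handles only over-representation of the sequence of interest (a loss would mirror the comparison), and the multipliers 2.5 and 4.0 are arbitrary stand-ins for a user-defined reliability setting. All names and dose values are hypothetical.

```python
from statistics import mean, stdev


def derive_thresholds(qualified_doses, lower_k=2.5, upper_k=4.0):
    """Return (normal_below, affected_above) as mean + k * SD of the qualified doses."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    return mu + lower_k * sigma, mu + upper_k * sigma


def classify(test_dose, normal_below, affected_above):
    """Classify a test dose as 'normal', 'affected', or 'no call'."""
    if test_dose < normal_below:
        return "normal"
    if test_dose > affected_above:
        return "affected"
    # Doses between the two thresholds cannot be called reliably.
    return "no call"


# Hypothetical doses for the sequence of interest in four qualified samples.
qualified_doses = [0.98, 1.00, 1.02, 1.00]
normal_below, affected_above = derive_thresholds(qualified_doses)

calls = [classify(d, normal_below, affected_above) for d in (1.01, 1.05, 1.10)]
print(calls)
```

Widening the gap between the two multipliers trades more "no call" results for fewer misclassified samples.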
Another embodiment of the invention provides a method for providing a prenatal diagnosis of a fetal aneuploidy in a biological sample comprising fetal and maternal nucleic acid molecules. The diagnosis is made based on the following steps: obtaining sequence information from sequencing at least a portion of the mixture of fetal and maternal nucleic acid molecules derived from a biological test sample (e.g., a maternal plasma sample); calculating from the sequencing data a normalizing chromosome dose for one or more chromosomes of interest and/or a normalizing fragment dose for one or more fragments of interest; determining a statistically significant difference between the chromosome dose for the chromosome of interest and/or the fragment dose for the fragment of interest in the test sample, respectively, and a threshold established in a plurality of qualified (normal) samples; and providing the prenatal diagnosis based on the statistical difference. A diagnosis of normal or affected is made as described in step 165 of the method. Where a diagnosis of normal or affected cannot be made with confidence, a "no call" is provided.
Samples
Samples used for determining a CNV (e.g., chromosomal and partial aneuploidies) include nucleic acids present in cells or "cell-free" nucleic acids. In some embodiments of the invention, it is advantageous to obtain cell-free nucleic acids, e.g., cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained from biological samples including, but not limited to, plasma and serum by various methods known in the art (Chen et al., Nature Med. 2: 1033-. To separate cell-free DNA from cells, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods may be used.
The sample comprising the nucleic acid mixture to which the methods described herein are applied is a biological sample, such as a tissue sample, a biological fluid sample, or a cell sample. In some embodiments, the nucleic acid mixture is purified or isolated from the biological sample by any one of the known methods. A sample may consist of purified or isolated polynucleotides, or it may comprise a biological sample such as a tissue sample, a biological fluid sample, or a cell sample. Biological fluids include, as non-limiting examples, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, cerebral fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, amniotic fluid, and leukapheresis samples. In some embodiments, the sample is one that is readily obtainable by a non-invasive procedure, such as blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva, or stool. Preferably, the biological sample is a peripheral blood sample, or its plasma or serum fraction. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples; for example, a biological sample may include two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms "blood", "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from that biopsy, swab, smear, etc.
In some embodiments, samples may be obtained from multiple sources, including, but not limited to, samples from different individuals, from different developmental stages of the same or different individuals, from different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder), from normal individuals, samples obtained at different stages of a disease in an individual, samples from individuals subjected to different treatments for a disease, samples from individuals subjected to different environmental factors, samples from individuals with a predisposition to a pathology, or samples from individuals exposed to an infectious disease agent (e.g., HIV).
In one embodiment, the sample is a maternal sample obtained from a pregnant female, for example a pregnant woman. In this case, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample may be a tissue sample, a biological fluid sample, or a cell sample. Biological fluids include, as non-limiting examples, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, cerebral fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples. In another embodiment, the maternal sample is a mixture of two or more biological samples; for example, a biological sample may include two or more of a biological fluid sample, a tissue sample, and a cell culture sample. In some embodiments, the sample is one that is readily obtainable by a non-invasive procedure, e.g., blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva, or stool. In some embodiments, the biological sample is a peripheral blood sample, or its plasma or serum fraction. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. As disclosed above, the terms "blood", "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from that biopsy, swab, smear, etc.
The sample may also be a tissue, cell, or other polynucleotide-containing source obtained from in vitro culture. These cultured samples may be taken from a variety of sources, including, but not limited to, cultures (e.g., tissues or cells) maintained under different media and conditions (e.g., pH, pressure, or temperature), cultures (e.g., tissues or cells) maintained for periods of different lengths, cultures (e.g., tissues or cells) treated with different factors or agents (e.g., drug candidates, or modulators), or cultures of different types of tissues or cells.
Methods for isolating nucleic acids from biological sources are well known and differ depending on the nature of the source. One of ordinary skill in the art can readily isolate nucleic acids from a source as needed for the methods described herein. In some cases, it may be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation may be random, or it may be specific, as achieved, for example, by restriction endonuclease digestion. Methods for random fragmentation are well known in the art and include, for example, limited DNase digestion, alkali treatment, and physical shearing. In one embodiment, the sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation. In other embodiments, the sample nucleic acids are obtained as genomic DNA, which is fragmented into fragments of about 500 or more base pairs to which NGS methods can readily be applied.
Determination of CNV for prenatal diagnosis
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350: 485-487 [1997]). Cell-free fetal DNA (cfDNA), which is thought to originate from dying placental cells, has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50: 88-92 [2004]), which can be identified as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83: 563-566 [2007]) and is known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64: 218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA), which originate from genes transcribed in the fetus or placenta, can also be identified in the maternal bloodstream. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers new opportunities for NIPD.
The present method is a polymorphism-independent method for use in NIPD that does not require fetal cfDNA to be distinguished from maternal cfDNA in order to determine a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by the gain or loss of part of a chromosome and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions, and insertions. By far the most common known aneuploidy compatible with life is trisomy 21, i.e., Down Syndrome (DS), which is caused by the presence of an extra copy of part or all of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties, and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies of known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g., Turner syndrome (XO), and triple X syndrome (XXX) in female births, and Klinefelter syndrome (XXY) and XYY syndrome in male births, all of which are associated with various phenotypes including infertility and reduced intellectual skills. The methods of the invention can be used for the prenatal diagnosis of these and other chromosomal abnormalities.
According to some embodiments of the invention, the trisomies determined by the invention include, without limitation: trisomy 21 (T21; Down syndrome), trisomy 18 (T18; Edwards syndrome), trisomy 16 (T16), trisomy 22 (T22; cat eye syndrome), trisomy 15 (T15; Prader-Willi syndrome), trisomy 13 (T13; Patau syndrome), trisomy 8 (T8; Warkany syndrome), and the XXY (Klinefelter syndrome), XYY, or XXX trisomies. It is understood that other complete trisomies and partial trisomies can be determined in fetal cfDNA according to the teachings of the present invention. Examples of partial trisomies include, but are not limited to, partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, trisomy 9, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The method of the invention can also be used to determine chromosomal monosomy X, as well as partial monosomies, such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in miscarriage. Partial monosomies of chromosomes typically involved in complete aneuploidy can also be determined by the methods of the invention. Monosomy 18p is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delay, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. The associated craniofacial defects may vary greatly in range and severity from case to case. Conditions caused by changes in the structure or number of copies of chromosome 15 include Angelman syndrome and Prader-Willi syndrome, which involve a loss of gene activity in the same part of chromosome 15 (the 15q11-q13 region). It is understood that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the present invention can be used to identify such partial deletions and other deletions in the fetus. Partial monosomy 13q is a rare chromosomal disorder that results when a segment of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q can exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder.
Almost all cases of partial monosomy 13q occur randomly (sporadically) for no apparent reason. The 22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome, on the long arm of one of the pair of chromosomes. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms may include birth defects such as congenital heart disease, defects in the palate, most commonly related to neuromuscular problems with closure (velopharyngeal insufficiency), learning disabilities, mild differences in facial features, and recurrent infections. A microdeletion in chromosome region 22q11.2 is associated with a 20 to 30 fold increased risk of schizophrenia. In one embodiment, the method of the present invention is used to determine partial monosomies including, but not limited to, monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, and partial monosomy of chromosome 22.
The method of the invention can also be used to determine any aneuploidy if one of the parents is a known carrier of such an aneuploidy. These include, but are not limited to: mosaicism for a small supernumerary marker chromosome (SMC); t(11;14)(p15;p13) translocation; unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; Smith-Magenis syndrome 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion; DiGeorge syndrome [del(22)(q11.2q11.23)]; Williams syndrome (7q11.23 and 7q36 deletions); 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion); Yq deletion; Wolf-Hirschhorn syndrome (WHS, 4p16.3 microdeletion); 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi syndrome (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); 17p11.2 deletion; and 2q37 microdeletion.
Determination of complete fetal chromosomal aneuploidy
In one embodiment, the invention provides a method for determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different, complete fetal chromosomal aneuploidies. The method comprises the following steps: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest. The normalizing chromosome sequence may be a single chromosome, or it may be a set of chromosomes selected from chromosomes 1-22, X, and Y. The method further comprises: (c) calculating a single chromosome dose for each of the one or more chromosomes of interest, using the number of sequence tags identified for each chromosome of interest and the number of sequence tags identified for each normalizing chromosome sequence; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest to a threshold value for that chromosome of interest, thereby determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in the maternal test sample.
In some embodiments, step (c) comprises calculating for each of said chromosomes of interest a single chromosome dose as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalized chromosome sequences for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for the normalizing chromosome for each of said chromosomes of interest. In other embodiments, step (c) comprises: calculating a sequence tag density ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of that chromosome, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence; and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
An example of this embodiment determines the presence or absence of four or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules, and includes: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) calculating a single chromosome dose for each of the twenty or more chromosomes of interest, using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest to a threshold value for that chromosome of interest, thereby determining the presence or absence of any twenty or more different, complete fetal chromosomal aneuploidies in the test sample.
In another embodiment, the method for determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in a maternal test sample, as described above, uses a normalizing fragment sequence to determine the dose of a chromosome of interest. In this case, the method comprises: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing fragment sequence for each of the one or more chromosomes of interest. The normalizing fragment sequence may be a single fragment of a chromosome, or it may be a set of fragments from one or more different chromosomes. The method further comprises: (c) calculating a single chromosome dose for each of the one or more chromosomes of interest, using the number of sequence tags identified for each chromosome of interest and the number of sequence tags identified for the normalizing fragment sequence; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest to a threshold value for that chromosome of interest, thereby determining the presence or absence of one or more different, complete fetal chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating for each of said chromosomes of interest a single chromosome dose as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalized fragment sequences of each of said chromosomes of interest.
In other embodiments, step (c) comprises: calculating a sequence tag density ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of that chromosome, and relating the number of tags for the corresponding normalizing fragment sequence to the length of the normalizing fragment sequence; and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing fragment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
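Steps (b) to (d) for several chromosomes of interest at once can be sketched as below. The sketch uses the plain tag-count ratio of step (c) and assumes that a dose above the per-chromosome threshold flags an over-represented chromosome. The normalizer assignments, tag counts, and threshold values are hypothetical and chosen only to make the example run.

```python
def chromosome_doses(tag_counts, normalizers):
    """Step (c): one dose per chromosome of interest, each with its own normalizing sequence."""
    return {
        chrom: tag_counts[chrom] / sum(tag_counts[n] for n in norm_set)
        for chrom, norm_set in normalizers.items()
    }


def exceeds_threshold(doses, thresholds):
    """Step (d): compare each dose with the threshold for that chromosome of interest."""
    return {chrom: dose > thresholds[chrom] for chrom, dose in doses.items()}


# Hypothetical tag counts from step (b) for one maternal test sample.
tag_counts = {"chr21": 1050, "chr18": 2000, "chr14": 3000, "chr8": 4000}

# Hypothetical normalizing sequence (a single chromosome or a set) per chromosome of interest.
normalizers = {"chr21": ["chr14"], "chr18": ["chr8"]}

# Hypothetical thresholds, which would be established from qualified samples.
thresholds = {"chr21": 0.34, "chr18": 0.52}

doses = chromosome_doses(tag_counts, normalizers)
flags = exceeds_threshold(doses, thresholds)
print(doses, flags)
```

Because each chromosome of interest carries its own normalizing sequence and threshold, the same loop extends to all of chromosomes 1-22, X, and Y, or to fragment identifiers when a normalizing fragment sequence is used.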
Determining the Normalized Chromosome Value (NCV) provides a means for comparing chromosome dosages for different sample sets, which correlates chromosome dosages in test samples to the average of the corresponding chromosome dosages in a set of qualifying samples. This NCV was calculated as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the j-th chromosome dose observed for test sample i.
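The NCV definition can be written as a short function. This is a minimal sketch that assumes the sample standard deviation is the intended estimate. The function name and the dose values are hypothetical.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV: distance of the test dose from the qualified mean, in standard deviations."""
    mu_hat = mean(qualified_doses)      # estimated mean of the j-th chromosome dose
    sigma_hat = stdev(qualified_doses)  # estimated (sample) standard deviation
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical doses for one chromosome in three qualified samples and one test sample.
qualified_doses = [0.98, 1.00, 1.02]
ncv = normalized_chromosome_value(1.06, qualified_doses)
print(round(ncv, 2))
```

Expressing the dose in units of standard deviation puts chromosomes with different dose scales on a common footing, which is what allows doses from different sample sets to be compared.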
In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies of the sex chromosomes can include tetrasomies, pentasomies, and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method can be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30. Thus, the number of different complete chromosomal aneuploidies determined is related to the number of chromosomes of interest selected for analysis.
In one embodiment, determining the presence or absence of any one or more different, intact fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalized fragment sequence for a chromosome of interest selected from chromosomes 1-22, X, and Y. In other embodiments, the two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y include at least twenty chromosomes selected from the group consisting of chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different, intact fetal chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of an intact fetal chromosomal aneuploidy of all of chromosomes 1-22, X, and Y is determined. The different intact fetal chromosomal aneuploidies that can be determined are intact chromosome trisomies, intact chromosome monosomies, and intact chromosome polysomies. Examples of intact fetal chromosomal aneuploidies include, but are not limited to: trisomies of any one or more autosomes, e.g., trisomy 2, trisomy 8, trisomy 9, trisomy 21, trisomy 13, trisomy 16, trisomy 18, and trisomy 22; trisomies of the sex chromosomes, e.g., 47,XXY, 47,XXX, and 47,XYY; tetrasomies of the sex chromosomes, e.g., 48,XXYY, 48,XXXY, 48,XXXX, and 48,XYYY; pentasomies of the sex chromosomes, e.g., 49,XXXYY, 49,XXXXY, 49,XXXXX, and 49,XYYYY; and monosomy X. Other intact fetal chromosomal aneuploidies that can be determined according to the present method are described below.
Determination of partial fetal chromosomal aneuploidy
In another embodiment, the invention provides a method for determining the presence or absence of any one or more different, partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The method comprises the following steps: (a) obtaining sequence information for fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more fragments of any one or more chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized fragment sequence for each of said any one or more fragments of any one or more chromosomes of interest. The normalized fragment sequence may be a single fragment of one chromosome, or it may be a set of fragments from one or more different chromosomes. The method further comprises: (c) calculating a single fragment dose for each of said any one or more fragments of any one or more chromosomes of interest, using the number of sequence tags identified for each of said fragments and the number of sequence tags identified for each of said normalized fragment sequences; and (d) comparing each said single fragment dose for each of said any one or more fragments of any one or more chromosomes of interest to a threshold value for each of said fragments, and thereby determining the presence or absence of one or more different, partial fetal chromosomal aneuploidies in said sample.
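Steps (b) through (d) can be sketched as follows for one fragment of interest. This is an illustrative outline only. The `Segment` type, the representation of a sequence tag as a (chromosome, position) pair, the half-open coordinate convention, the two-sided threshold, and all values are assumptions of the sketch and are not specified by the method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def count_tags(tags, segment: Segment) -> int:
    """Step (b): count mapped tags, given as (chrom, position), in a segment."""
    return sum(1 for chrom, pos in tags
               if chrom == segment.chrom and segment.start <= pos < segment.end)


def segment_dose(tags, interest: Segment, normalizers: list[Segment]) -> float:
    """Step (c): tags in the segment of interest over tags in the normalizer set."""
    norm_count = sum(count_tags(tags, s) for s in normalizers)
    if norm_count == 0:
        raise ValueError("normalizing segments contain no sequence tags")
    return count_tags(tags, interest) / norm_count


def compare_to_threshold(dose: float, lower: float, upper: float) -> str:
    """Step (d): two-sided comparison (the two-sided form is an assumption)."""
    if dose < lower:
        return "below lower threshold"
    if dose > upper:
        return "above upper threshold"
    return "within thresholds"


# Hypothetical mapped tags and segments, for illustration only.
tags = [("chr5", 100), ("chr5", 150), ("chr5", 900),
        ("chr3", 10), ("chr3", 20), ("chr3", 30), ("chr3", 40)]
interest = Segment("chr5", 0, 500)
normalizer = Segment("chr3", 0, 100)
dose = segment_dose(tags, interest, [normalizer])
call = compare_to_threshold(dose, lower=0.8, upper=1.2)
print(dose, call)
```

Passing the normalizer as a list reflects the statement that the normalized fragment sequence may be a single fragment or a set of fragments from one or more chromosomes.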
In some embodiments, step (c) comprises calculating a single fragment dose for any one or more fragments of each any one or more chromosomes of interest as a ratio of the number of sequence tags identified for any one or more fragments of each any one or more chromosomes of interest to the number of sequence tags identified for the normalized fragment sequences for any one or more fragments of each said any one or more chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag density ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalized segment sequence for the segment of interest to the length of the normalized segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalized segment sequence. This calculation is repeated for each segment of interest. Steps (a) - (d) may be repeated for test samples from different maternal subjects.
Determining a normalized segment value (NSV) provides a means for comparing fragment (segment) doses across different sample sets; the NSV relates the fragment dose in a test sample to the mean of the corresponding fragment doses in a set of qualifying samples. The NSV is calculated as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation for the jth fragment dose in a set of qualifying samples, and $x_{ij}$ is the jth fragment dose observed for test sample i.
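As a purely illustrative calculation, take the NSV to be the observed fragment dose minus the qualifying-sample mean, divided by the qualifying-sample standard deviation. The numbers below are hypothetical and are not taken from this document: suppose the qualifying samples give an estimated mean of 0.0150 and an estimated standard deviation of 0.0005 for fragment j, and the test sample has an observed fragment dose of 0.0135.

```latex
NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}
         = \frac{0.0135 - 0.0150}{0.0005}
         = -3.0
```

Under these assumed values, the fragment dose in the test sample lies three standard deviations below the qualifying-sample mean.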
In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial fetal chromosomal aneuploidies is determined in a sample. In one embodiment, one fragment of interest is selected from any one of chromosomes 1-22, X, and Y. In another embodiment, two or more fragments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the fragments of interest include at least one, five, ten, 15, 20, 25, or more fragments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, or 25 different, partial fetal chromosomal aneuploidies is determined. The different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of autosomes. Partial monosomies of autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the present method are described below.
In any of the above embodiments, the test sample is a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS) as described elsewhere in the application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing by ligation. In still other embodiments, the sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
Determination of CNV for clinical conditions
In addition to the early determination of birth defects, the methods described herein can be applied to determine any abnormality in the representation of genetic sequences within a genome.
It has been demonstrated that plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and that tumors are characterized by aneuploidy, or inappropriate numbers of gene sequences or even entire chromosomes. Determining the difference in the amount of a given sequence (i.e., the sequence of interest) in a sample from an individual can therefore be used to diagnose a medical condition. In some embodiments, the method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to have cancer. The method can also be applied to: determining the presence or absence of a disease state; determining the presence or absence of a nucleic acid of a pathogen (e.g., a virus); determining a chromosomal abnormality associated with graft versus host disease (GVHD); and determining the constitution of an individual in forensic analysis.
Embodiments of the invention provide a method for assessing copy number variation of a sequence of interest (e.g., a clinically relevant sequence) in a test sample comprising a mixture of nucleic acids derived from two different genomes and which are known or suspected to differ in the amount of one or more sequences of interest. The mixture of nucleic acids is derived from two or more types of cells. In one embodiment, the nucleic acid mixture is derived from normal and cancerous cells derived from a subject having a medical condition (e.g., cancer).
The development of cancer is usually accompanied by changes in the number of whole chromosomes (i.e., complete chromosomal aneuploidy) and/or changes in the number of chromosome segments (i.e., partial aneuploidy), caused by a process known as chromosome instability (CIN) (Thoma et al, Swiss Med Weekly, 2011: 141: 13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al, Cancer Res, 50: 7184-; Jongsma et al, J Clin Pathol: Mol Path, 55: 305-309 [2002]). Such genetic aberrations, as they accumulate, can confer a proliferative advantage, genetic instability, and the concomitant ability to develop drug resistance rapidly, as well as enhanced angiogenesis, proteolysis, and metabolism. These genetic aberrations can affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombinations leading to loss of heterozygosity (LOH) are thought to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA has been found in the circulation of patients diagnosed with malignancies including, but not limited to, lung cancer (Pathak et al, Clin Chem, 52: 1833-. The identification of cancer-associated genomic instability that can be determined in the circulating cfDNA of cancer patients is a potential diagnostic and prognostic tool. In one embodiment, the methods of the invention assess CNV of a sequence of interest in a sample comprising a mixture of nucleic acids derived from a subject known or suspected to have cancer, e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumor, or blastoma. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, and it comprises a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample for which the presence or absence of CNV is to be determined is a mixture derived from cancerous and non-cancerous cells from other biological fluids including, but not limited to: serum, sweat, tears, sputum, urine, ear discharge, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspensions, vaginal discharge, transcervical lavage, cerebral fluid, ascites, breast milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The sequence of interest is a nucleic acid sequence that is known or suspected to play a role in the development and/or progression of cancer. Examples of sequences of interest include nucleic acid sequences, i.e., complete chromosomes and/or fragments of chromosomes, which are amplified or deleted in cancer cells as described below.
In one embodiment, the method can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more whole chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more fragments of a chromosome. In still other embodiments, the chromosomal amplification is the gain of two or more fragments of two or more chromosomes. Such a chromosomal amplification may involve the gain of one or more oncogenes.
Dominantly acting genes associated with human solid tumors typically exert their effects through overexpression or altered expression. Gene amplification is a common mechanism that results in the upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in more than 50% of human breast carcinomas. Most notably, amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17q21-q22), results in overexpression of the HER2 receptor on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al, Clinical Breast Cancer, 8: 392-. A variety of oncogenes have been found to be amplified in other human malignancies. Examples of cellular oncogene amplification in human tumors include amplification of: c-myc in the promyelocytic leukemia cell line HL60 and in small cell lung cancer cell lines; N-myc in primary neuroblastomas (stages III and IV), neuroblastoma cell lines, a retinoblastoma cell line and primary tumors, and small cell lung cancer cell lines and tumors; L-myc in small cell lung cancer cell lines and tumors; c-myb in acute myelogenous leukemia and colon cancer cell lines; c-erbB in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary cancers of the lung, colon, bladder, and rectum; and N-ras in a breast cancer cell line (Varmus H., Ann Rev Genetics, 18: 553-612 (1984), cited in Watson et al, Molecular Biology of the Gene (4th edition; Benjamin/Cummings Publishing Co. 1987)).
In one embodiment, the method may be used to determine the presence or absence of a chromosomal deletion. In some embodiments, such a chromosomal deletion is a loss of one or more entire chromosomes. In other embodiments, such a chromosomal deletion is a loss of one or more segments of a chromosome. In still other embodiments, such a chromosome deletion is the loss of two or more fragments of two or more chromosomes. Such chromosomal deletions may involve the loss of one or more tumor suppressor genes.
Chromosomal deletions involving tumor suppressor genes can play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located on chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, appears to play an important role in cell cycle regulation (Howe et al, Proc Natl Acad Sci (USA), 87: 5883-. Altered or lost expression of the Rb protein results from inactivation of both gene alleles by point mutation or chromosomal deletion. Rb-1 gene alterations have been found not only in retinoblastoma but also in other malignancies, such as osteosarcoma, small cell lung carcinoma (Rygaard et al, Cancer Res, 50: 5312-5317 [1990]), and breast cancer. Restriction fragment length polymorphism (RFLP) studies have shown that such tumor types frequently lose heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost through a gross chromosomal deletion (Bowcock et al, Am J Hum Genet, 46: 12 [1990]). Chromosome 1 abnormalities, including duplications, deletions, and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, may harbor oncogenes or tumor suppressor genes involved in the development of both the chronic and advanced stages of myeloproliferative neoplasms (Caramazza et al, Eur J Hematol, 84: 191-200 [2010]). Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletion of chromosome 5 is the most common karyotypic abnormality in myelodysplastic syndromes (MDS). Patients with isolated del(5q)/5q- MDS have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPN) and acute myelogenous leukemia.
The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor suppressor genes that play a fundamental role in the growth control of hematopoietic stem/progenitor cells (HSC/HPC). Cytogenetic mapping of the commonly deleted regions (CDRs), centered on 5q31 and 5q32, has identified candidate tumor suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20, and the cytoskeletal remodeling protein alpha-catenin (Eisenmann, Oncogene, 28: 3429-. Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have demonstrated that allelic losses from several distinct regions on chromosome 3p (including 3p25, 3p21-22, 3p21.3, 3p12-13, and 3p14) are the earliest and most frequent genomic abnormalities involved in a broad spectrum of major epithelial cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder, and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and it is believed that interstitial deletions or promoter hypermethylation precede the loss of 3p or of the entire chromosome 3 in the development of cancer (Angeloni D., Briefings Functional Genomics, 6: 19-39 [2007]).
Newborns and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, which contains about 300 genes, may be involved in a variety of structural aberrations, such as translocations, deletions, and amplifications, in leukemias, lymphomas, and solid tumors. In addition, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Numerical as well as structural aberrations of chromosome 21 are associated with leukemias, and specific genes, including RUNX1, TMPRSS2, and TFF, which are located at 21q, play a role in tumorigenesis (Fonatsch C, Genes Chromosomes Cancer, 49: 497-.
In one embodiment, the method provides a means to assess the association between gene amplification and the extent of tumor evolution. The association between amplification and/or deletion and cancer stage or grade can be important for prognosis, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease, with more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events can be useful in associating those events as predictors of subsequent disease progression. Gene amplifications and deletions identified by the present methods can be associated with other known parameters such as tumor grade, medical history, Brd/Urd labeling index, hormonal status, lymph node involvement, tumor size, survival time, and other tumor characteristics available from epidemiological and biostatistical studies. For example, tumor DNA to be tested by the present method could include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer, and metastatic lymph nodes, in order to permit the identification of associations between amplifications and deletions and stage. The associations made may enable effective therapeutic intervention. For example, a consistently amplified region may contain an overexpressed gene, the product of which may be amenable to therapeutic attack (e.g., the growth factor receptor tyrosine kinase p185HER2).
The method can be used to identify amplification and/or deletion events associated with drug resistance by comparing the copy number variations of nucleic acid sequences from primary cancers with those of cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows rapid development of drug resistance, more amplification and/or deletion would be expected in primary tumors from chemotherapy-resistant patients than in tumors from chemotherapy-sensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, the regions surrounding those genes would be expected to be amplified consistently in tumor cells from chemotherapy-resistant patients, but not in the primary tumors. The discovery of associations between gene amplifications and/or deletions and the development of drug resistance may allow the identification of patients who will or will not benefit from adjuvant therapy.
In a manner similar to that described for determining the presence or absence of a complete and/or partial fetal chromosomal aneuploidy in a maternal sample, the methods of the invention can be used to determine the presence or absence of a complete and/or partial chromosomal aneuploidy in any patient sample (including patient samples that are not maternal samples) that comprises nucleic acids (e.g., DNA or cfDNA). Such patient sample may be any biological sample type as described elsewhere in the application. Preferably, such a sample is obtained by a non-invasive procedure. Such a sample may be, for example, a blood sample, or serum and plasma fractions thereof. Alternatively, such a sample may be a urine sample or a feces sample. In still other embodiments, the sample is a tissue biopsy sample. In all cases, such samples include nucleic acids, such as cfDNA or genomic DNA, which is purified and sequenced using any of the NGS sequencing methods described above.
Both complete and partial chromosomal aneuploidies associated with the development and progression of cancer can be determined according to the present methods.
Determination of intact chromosomal aneuploidy in patient samples
In one embodiment, the invention provides a method for determining the presence or absence of any one or more distinct, intact chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. In some embodiments, the method determines the presence or absence of any one or more different, intact chromosomal aneuploidies. The method comprises the following steps: (a) obtaining sequence information for patient nucleic acids in a patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized chromosome sequence for each of the any one or more chromosomes of interest. The normalizing chromosomal sequence may be a single chromosome, or it may be a set of chromosomes selected from chromosomes 1-22, X, and Y. The method further calculates in step (c) a single chromosome dose for each of any one or more chromosomes of interest using the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each of the normalizing chromosome sequences; and (d) comparing each said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby to determine the presence or absence of any one or more different, intact patient chromosomal aneuploidies in the patient test sample.
In some embodiments, step (c) comprises calculating for each of said chromosomes of interest a single chromosome dose as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalized chromosome sequences for each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag density ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalized chromosome sequence for the chromosome of interest to the length of the normalized chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalized chromosome sequence. This calculation is repeated for each chromosome of interest. Steps (a) - (d) may be repeated for test samples from different patients.
In one example of this embodiment, one or more intact chromosomal aneuploidies are determined in a cancer patient test sample comprising cell-free DNA molecules by: (a) sequencing at least a portion of the cell-free DNA molecules so as to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) calculating a single chromosome dose for each of the twenty or more chromosomes of interest using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each of the normalizing chromosomes; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, and thereby determining the presence or absence of any twenty or more different, intact chromosomal aneuploidies in the patient test sample.
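Steps (b) through (d) of this example can be outlined in Python for any number of chromosomes of interest. The sketch below is an assumption-laden illustration rather than the method's implementation: the function names, the representation of each mapped tag by its chromosome label, the choice of normalizing chromosomes, the two-sided thresholds, and all numbers are hypothetical.

```python
from collections import Counter


def chromosome_doses(tag_chroms, normalizers):
    """Steps (b)-(c): per-chromosome tag counts and doses.

    tag_chroms  -- iterable with one chromosome label per mapped sequence tag
    normalizers -- maps each chromosome of interest to its normalizing
                   chromosome(s); these pairings are assumptions of the sketch
    """
    counts = Counter(tag_chroms)
    doses = {}
    for chrom, norm_set in normalizers.items():
        denominator = sum(counts[c] for c in norm_set)
        if denominator == 0:
            raise ValueError(f"no tags on normalizing chromosomes for {chrom}")
        doses[chrom] = counts[chrom] / denominator
    return doses


def flag_doses(doses, thresholds):
    """Step (d): report chromosomes whose dose lies outside (lower, upper)."""
    flagged = {}
    for chrom, dose in doses.items():
        lower, upper = thresholds[chrom]
        if dose < lower:
            flagged[chrom] = "below lower threshold"
        elif dose > upper:
            flagged[chrom] = "above upper threshold"
    return flagged


# Hypothetical data, for illustration only (two chromosomes of interest shown).
tag_chroms = ["1"] * 10 + ["2"] * 10 + ["21"] * 6
normalizers = {"21": ["1", "2"], "1": ["2"]}
thresholds = {"21": (0.15, 0.25), "1": (0.9, 1.1)}
doses = chromosome_doses(tag_chroms, normalizers)
flagged = flag_doses(doses, thresholds)
print(doses, flagged)
```

Keeping the normalizers and thresholds in per-chromosome dictionaries mirrors the text, in which each chromosome of interest has its own normalizing chromosome and its own threshold value.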
In another embodiment, the method for determining the presence or absence of any one or more different, intact chromosomal aneuploidies in a patient test sample as described above uses a normalized fragment sequence to determine the dose of a chromosome of interest. In this embodiment, the method comprises: (a) obtaining sequence information for nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalized fragment sequence for each of the any one or more chromosomes of interest. The normalized fragment sequence may be a single fragment of one chromosome, or it may be a set of fragments from one or more different chromosomes. The method further comprises: (c) calculating a single chromosome dose for each of the any one or more chromosomes of interest using the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for the normalized fragment sequences; and (d) comparing each said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, and thereby determining the presence or absence of one or more different, intact chromosomal aneuploidies in the patient sample.
In some embodiments, step (c) comprises calculating for each of said chromosomes of interest a single chromosome dose as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for said normalized fragment sequences of each of said chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag density ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalized fragment sequence for the chromosome of interest to the length of the normalized fragment sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalized fragment sequence. This calculation is repeated for each chromosome of interest. Steps (a) - (d) may be repeated for test samples from different patients.
Determining a normalized chromosome value (NCV) provides a means for comparing chromosome doses across different sample sets; the NCV relates the chromosome dose in a test sample to the mean of the corresponding chromosome doses in a set of qualifying samples. The NCV is calculated as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation for the jth chromosome dose in a set of qualifying samples, and $x_{ij}$ is the jth chromosome dose observed for test sample i.
In some embodiments, the presence or absence of one intact chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four intact chromosomal aneuploidies is determined in a sample, wherein twenty-two of the intact chromosomal aneuploidies correspond to intact chromosomal aneuploidies of any one or more autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to intact chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies can include trisomies, tetrasomies, pentasomies, and other polysomies, and because the number of intact chromosomal aneuploidies varies among different diseases and among different stages of the same disease, the number of intact chromosomal aneuploidies determined according to the present methods is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or more. Systematic karyotyping of tumors has revealed that the number of chromosomes in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer, J Cell Sci, 121: 3859-3866 [2008]). In some embodiments, the method comprises determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected or known to have cancer (e.g., colon cancer). These chromosomal aneuploidies include losses of one or more intact chromosomes (hypodiploidies) and gains of intact chromosomes, including trisomies, tetrasomies, pentasomies, and other polysomies.
The acquisition and/or loss of chromosome fragments may also be determined as explained elsewhere in the application. The method is suitable for determining the presence or absence of different aneuploidies in a sample from a patient suspected or known to have a cancer as specified elsewhere in the application.
In some embodiments, any of chromosomes 1-22, X, and Y can be the chromosome of interest in determining the presence or absence of any one or more different, intact chromosomal aneuploidies in a patient test sample as described above. In other embodiments, the two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, any one or more chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y comprises at least twenty chromosomes selected from the group consisting of chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different, intact chromosomal aneuploidies is determined. In other embodiments, any one or more of the chromosomes of interest selected from the group consisting of chromosomes 1-22, X, and Y is all of chromosomes 1-22, X, and Y, and wherein the presence or absence of an intact chromosomal aneuploidy of all of chromosomes 1-22, X, and Y is determined. Intact, different chromosomal aneuploidies that can be determined include an intact chromosomal monosomy of any one or more of chromosomes 1-22, X and Y; a complete chromosomal trisomy of any one or more of chromosomes 1-22, X, and Y; a complete chromosomal tetrasomy of any one or more of chromosomes 1-22, X and Y; a complete chromosomal pentasomal of any one or more of chromosomes 1-22, X and Y; and other complete chromosomal polysomy of any one or more of chromosomes 1-22, X and Y.
Determination of partial chromosomal aneuploidies in patient samples
In another embodiment, the invention provides a method for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The method comprises: (a) obtaining sequence information for the patient nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for a normalizing segment sequence for each of said segments. The normalizing segment sequence can be a single segment of one chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of said segments and the number of sequence tags identified for each corresponding normalizing segment sequence to calculate a single segment dose for each of said segments; and (d) comparing each single segment dose to a threshold value for the corresponding segment, thereby determining the presence or absence of one or more different partial chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating the single segment dose for each segment of each chromosome of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for its normalizing segment sequence.
In other embodiments, step (c) comprises calculating a sequence tag density for the segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, calculating a sequence tag density for the corresponding normalizing segment sequence by relating its number of tags to its length, and calculating the segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. This calculation is repeated for each sequence of interest. Steps (a)-(d) can be repeated for test samples from different patients.
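The two ways of computing a segment dose described above can be sketched as follows. This is a minimal illustration, not the implementation used in the application; the function names and argument names are hypothetical.

```python
def segment_dose_from_counts(tags_interest: int, tags_normalizing: int) -> float:
    """Dose as a plain ratio of tag counts (segment of interest / normalizing segment)."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing segment must have at least one tag")
    return tags_interest / tags_normalizing


def segment_dose_from_densities(tags_interest: int, length_interest: int,
                                tags_normalizing: int, length_normalizing: int) -> float:
    """Dose as a ratio of length-normalized tag densities (tags per base)."""
    if min(length_interest, length_normalizing, tags_normalizing) <= 0:
        raise ValueError("lengths and normalizing tag count must be positive")
    density_interest = tags_interest / length_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_interest / density_normalizing
```

When the segment of interest and the normalizing segment have the same length, both functions return the same value; otherwise the density form rescales the count ratio by the ratio of the lengths.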
Determining the normalized segment value (NSV) provides a means for comparing segment doses across different sample sets; the NSV relates the segment dose in a test sample to the mean of the corresponding segment doses in a set of qualifying samples. The NSV is calculated as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation for the j-th segment dose in a set of qualifying samples, and $x_{ij}$ is the j-th segment dose observed for test sample i.
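As a rough illustration of the NSV definition, the sketch below standardizes a test-sample segment dose against the doses observed in a set of qualifying samples. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) as the estimator is an assumption; the text only says "estimated".

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_segment_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Return (x_ij - mean_j) / sd_j for one segment j and one test sample i."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample SD with n - 1 denominator
    if sigma == 0:
        raise ValueError("qualified doses show no variation; NSV is undefined")
    return (test_dose - mu) / sigma
```

For example, with qualifying doses of 1.0, 2.0, and 3.0 (mean 2.0, standard deviation 1.0), a test dose of 4.0 has an NSV of 2.0.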
In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty-five, or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In other embodiments, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y.
In one embodiment, the segments of interest include at least one, five, ten, 15, 20, 25, 50, 75, 100, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100, or more different partial chromosomal aneuploidies is determined. The different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions.
The sample used to determine the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any biological sample described elsewhere in this application. The type of sample or samples used will depend on the type of disease the patient is known or suspected to have. For example, a fecal sample can be selected as a source of DNA to determine the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to the tissue samples described herein. Preferably, the sample is a biological sample obtained by non-invasive means, such as a plasma sample. Sequencing of the nucleic acids in the patient sample can be performed using next generation sequencing (NGS) as described elsewhere in the application. In some embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, sequencing is sequencing-by-ligation. In still other embodiments, sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In some embodiments, the presence or absence of aneuploidy in a patient suspected of having a cancer as described elsewhere in this application, such as lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder, and other organ cancers, as well as hematological cancers, is determined. Hematologic cancers include cancers of the bone marrow, blood, and lymphatic system, including lymph nodes, lymphatic vessels, tonsils, thymus, spleen, and gut lymphoid tissue. Leukemias and myelomas (which begin in the bone marrow), and lymphomas (which begin in the lymphatic system) are the most common types of hematological cancers.
Determination of the presence or absence of one or more chromosomal aneuploidies in a patient sample can be used, without limitation, for: determining a patient's susceptibility to a particular cancer, determining the presence or absence of a cancer of interest as part of routine screening in patients with or without a known susceptibility to that cancer, providing a prognosis for the disease, assessing the need for adjuvant therapy, and determining the progression or regression of the disease.
Apparatus and system for determining CNV
Analysis of the sequencing data and the diagnoses derived from it are typically performed using various computer algorithms and programs. In one embodiment, the present invention provides a computer program product for generating an output indicating the presence or absence of a fetal aneuploidy in a test sample. The computer product includes a computer-readable medium having computer-executable logic recorded thereon for enabling a processor to diagnose a fetal aneuploidy, comprising: a receiving routine for receiving sequencing data from at least a portion of the nucleic acid molecules of a maternal biological sample, wherein said sequencing data comprises a calculated chromosome; computer-assisted logic for analyzing fetal aneuploidy from the received data; and an output routine for generating an output indicative of the presence, absence, or type of the fetal aneuploidy.
The method of the present invention may be performed using a computer-readable medium having computer-readable instructions stored thereon to perform a method for identifying any CNV (e.g., chromosomal or partial aneuploidy). Accordingly, in one embodiment, the present invention provides a computer-readable medium having computer-readable instructions stored thereon for performing a method for identifying full and partial chromosomal aneuploidies (e.g., fetal aneuploidies).
The method of the invention may also be performed using a computer processing system adapted or configured to perform a method for identifying any CNV (e.g. chromosomal or partial aneuploidy). Accordingly, in one embodiment, the invention provides a computer processing system adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured to sequence at least a portion of nucleic acid molecules in a sample to obtain sequence information of the type as described elsewhere in the application.
The invention will be illustrated in more detail in the following examples, which are not intended to limit the scope of the invention as claimed in any way. The accompanying drawings are intended to be considered an integral part of the specification and description of the invention. The following examples are provided to illustrate, but not to limit, the claimed invention.
Experimental
Example 1
Sample processing and DNA extraction
Peripheral blood samples were collected from pregnant women in their first or second trimester of pregnancy who were deemed at risk for fetal aneuploidy. Consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to confirm the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6-9 mL/tube) was transferred to a 15-mL low-speed centrifuge tube. Blood was centrifuged at 2640 rpm at 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge and rotor model GA 3.8.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15-mL high-speed centrifuge tube and centrifuged at 16,000 x g at 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and JA-14 rotor. These two centrifugation steps were performed within 72 h after plasma collection. Cell-free plasma was stored at -80 °C and thawed only once prior to DNA extraction.
Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini kit (Qiagen) according to the manufacturer's instructions. Five milliliters of buffer AL and 500 μl of Qiagen Protease were added to 4.5 ml to 5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate buffered saline (PBS), and the mixture was incubated at 56 °C for 12 minutes. The precipitated cfDNA was separated from the solution by centrifugation at 8,000 rpm in a Beckman microcentrifuge using multiple columns. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 μl of nuclease-free water. Approximately 3.5 to 7 ng of cfDNA was extracted from the plasma samples.
All sequencing libraries were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA). Because cell-free plasma DNA is fragmented in nature, no further fragmentation of the plasma DNA samples by nebulization or sonication was performed. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube with 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA polymerase I, 1 μl of T4 DNA polymerase, and 1 μl of T4 polynucleotide kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20 °C. The enzymes were then heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes.
The mixture was cooled to 4 °C, and dA tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75 °C for 5 minutes. After inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was ligated to the dA-tailed DNA using the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 by incubating the reaction mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA).
PCR was performed to selectively enrich adaptor-ligated cfDNA using Phusion High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using the Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions (available at www.beckmangenomics.com/products/AMPureXP protocol000387v001.pdf). The purified amplified product was eluted in 40 μl of Qiagen EB buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information are needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software" transferred the image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The Illumina "Gerald" program was run to align sequences to the reference human genome derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGateway). The sequence data generated by the above procedure that aligned uniquely to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer with the Linux operating system. Sequence alignments with base mismatches were allowed and were included in the alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
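The counting rules in this paragraph (unique alignment only, a bounded number of mismatches, duplicates with identical start and end coordinates excluded) can be sketched as below. This is not the c2c.pl program, whose internals are not described; the `Alignment` record, its field names, and the default limit of two mismatches (taken from the tag counts reported for this example) are assumptions for illustration. The sketch also assumes that the first alignment at a given start and end position is kept and later ones are dropped, which the text does not specify.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Alignment(NamedTuple):
    chrom: str       # chromosome the read aligned to
    start: int       # alignment start coordinate
    end: int         # alignment end coordinate
    mismatches: int  # number of mismatched bases in the alignment
    hits: int        # number of genomic locations the read aligned to


def count_unique_tags(alignments: Iterable[Alignment],
                      max_mismatches: int = 2) -> Dict[str, int]:
    """Count uniquely aligned, non-duplicate tags per chromosome."""
    seen: Set[Tuple[str, int, int]] = set()
    counts: Dict[str, int] = {}
    for aln in alignments:
        if aln.hits != 1 or aln.mismatches > max_mismatches:
            continue  # not uniquely aligned, or too many mismatches
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:
            continue  # duplicate: identical start and end coordinates
        seen.add(key)
        counts[aln.chrom] = counts.get(aln.chrom, 0) + 1
    return counts
```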
Between about 5 and 15 million 36 bp tags with 2 or fewer mismatches mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both the test and the qualifying samples. The regions of chromosome Y extending from base 0 to base 2 x 10^6, from base 10 x 10^6 to base 13 x 10^6, and from base 23 x 10^6 to the end were explicitly excluded from the analysis, because tags derived from both male and female fetuses map to these regions of the Y chromosome.
It was noted that some variation in the total number of sequence tags mapped to individual chromosomes occurred across samples sequenced in the same run (inter-chromosomal variability), but that substantially greater variation occurred among different sequencing runs (inter-sequencing run variability).
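A minimal sketch of the chromosome Y exclusion follows, reading the excluded regions as 0 to 2 Mb, 10 Mb to 13 Mb, and 23 Mb to the end of the chromosome. The names are hypothetical, and treating the region boundaries as inclusive is an assumption.

```python
from typing import List, Optional, Tuple

# Excluded chromosome Y regions as (start, end); None means "to the end of the chromosome".
Y_EXCLUDED: List[Tuple[int, Optional[int]]] = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]


def is_excluded_y_position(chrom: str, pos: int) -> bool:
    """True if a tag at this position falls in an excluded chromosome Y region."""
    if chrom != "chrY":
        return False
    for start, end in Y_EXCLUDED:
        if pos >= start and (end is None or pos <= end):  # assumption: inclusive bounds
            return True
    return False
```

A tag would be dropped before counting whenever `is_excluded_y_position` returns true for its mapped position.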
Example 2
Dosage and variation for chromosomes 13, 18, 21, X, and Y
To examine the extent of inter-chromosomal and inter-sequencing run variability in the number of mapped sequence tags for all chromosomes, plasma cfDNA obtained from the peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 1, and analyzed as follows.
The total number of sequence tags mapped onto each chromosome (sequence tag density) is determined. Alternatively, the number of mapped sequence tags can be normalized to the length of the chromosome to produce a sequence tag density ratio. Normalization to the length of the chromosome is not a necessary step, but can be done separately to reduce the number of digits in a number, thereby simplifying it for human interpretation. The chromosome lengths that can be used to normalize these sequence tag counts may be those provided at the world web site gene. ucsc.
The sequence tag density obtained for each chromosome is correlated with the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose, which is calculated as the ratio of the sequence tag density for the chromosome of interest (e.g., chromosome 21) to the sequence tag density for the remaining chromosomes (i.e., chromosomes 1-20, 22 and X). Table 1 provides an example of the qualified chromosome doses calculated for chromosomes 13, 18, 21, X, and Y of interest, which doses were determined in one of the qualified samples. Chromosome dosages were determined for all chromosomes in all samples, and the average dosages for chromosomes 13, 18, 21, X, and Y of interest in the qualifying samples are provided in tables 2 and 3 and depicted in fig. 2-6. Figures 2 to 6 also depict chromosome dosages for the test samples. The chromosome dose for each chromosome of interest in the qualifying sample provides a measure of the total number of mapped sequence tags for each chromosome of interest (relative to each remaining chromosome). Thus, a qualified chromosome dose can identify a chromosome or set of chromosomes that is the normalizing chromosome whose variability between samples is closest to that of the chromosome of interest, and that normalizing chromosome will serve as the ideal sequence for normalizing to further statistically evaluated values. Figures 7 and 8 depict calculated average chromosome doses determined in a qualified sample population for chromosomes 13, 18, and 21, and chromosomes X and Y.
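The ratio calculation described here can be sketched as follows. The function name and the example counts are hypothetical. Table 1 is read here with each column as the chromosome of interest and each row as the single normalizing chromosome, which is an interpretation based on the value 1 appearing where the two coincide.

```python
from typing import Dict


def chromosome_doses(tag_counts: Dict[str, int], chrom_of_interest: str) -> Dict[str, float]:
    """Dose of one chromosome relative to each chromosome taken singly as normalizer."""
    numerator = tag_counts[chrom_of_interest]
    return {chrom: numerator / count for chrom, count in tag_counts.items() if count > 0}


# Hypothetical tag counts, for illustration only.
counts = {"chr13": 200, "chr21": 100, "chrX": 400}
doses_chr21 = chromosome_doses(counts, "chr21")

# Consistency check on Table 1: the chr21 dose normalized by chr13 (0.438522) and the
# chr13 dose normalized by chr21 (2.280386) should be reciprocals.
assert abs(0.438522 * 2.280386 - 1.0) < 1e-5
```

Under this reading, a dose of a chromosome relative to itself is always 1, which matches the diagonal entries of Table 1.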
In some cases, the best normalizing chromosome may not have the least variability, but may have a distribution of qualifying doses that best distinguishes one or more test samples from the qualifying samples; that is, the best normalizing chromosome may not have the lowest variability but may have the greatest resolvability. Resolvability therefore takes into account both the variation in chromosome dose and the distribution of doses in the qualifying samples.
Tables 2 and 3 provide the coefficient of variation as a measure of variability and the t-test values as a measure of resolvability for chromosomes 18, 21, X, and Y, wherein the smaller the t-test value, the greater the resolvability. The resolvability for chromosome 13 was determined as the ratio of the difference between the mean chromosome dose in the qualifying samples and the dose for chromosome 13 in the only T13 test sample to the standard deviation of the mean of the qualifying doses.
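The two simpler measures in this paragraph can be sketched as below: the coefficient of variation of the qualifying doses, and the single-sample ratio used for chromosome 13. The function names are hypothetical. Using the sample standard deviation of the qualifying doses as the denominator and taking the absolute value of the difference are assumptions; the t-test used for the other chromosomes is not reproduced here.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(doses: Sequence[float]) -> float:
    """Standard deviation of the doses divided by their mean."""
    return stdev(doses) / mean(doses)


def single_sample_resolvability(qualified_doses: Sequence[float], test_dose: float) -> float:
    """|mean qualified dose - test dose| / SD of qualified doses (assumed definition)."""
    return abs(mean(qualified_doses) - test_dose) / stdev(qualified_doses)
```

A candidate normalizing chromosome with a small coefficient of variation and a large resolvability ratio separates the affected sample most clearly from the qualifying samples.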
These qualified chromosome doses also serve as a basis for determining thresholds when aneuploidy is identified in the test sample as explained below.
TABLE 1
Qualified chromosome dose for chromosomes 13, 18, 21, X and Y (n = 1; sample #11342, 46 XY)
Chromosome  chr21     chr18     chr13     chrX      chrY
chr1        0.149901  0.306798  0.341832  0.490969  0.003958
chr2        0.15413   0.315452  0.351475  0.504819  0.004069
chr3        0.193331  0.395685  0.44087   0.633214  0.005104
chr4        0.233056  0.476988  0.531457  0.763324  0.006153
chr5        0.219209  0.448649  0.499882  0.717973  0.005787
chr6        0.228548  0.467763  0.521179  0.748561  0.006034
chr7        0.245124  0.501688  0.558978  0.802851  0.006472
chr8        0.256279  0.524519  0.584416  0.839388  0.006766
chr9        0.309871  0.634203  0.706625  1.014915  0.008181
chr10       0.25122   0.514164  0.572879  0.822817  0.006633
chr11       0.257168  0.526338  0.586443  0.8423    0.00679
chr12       0.275192  0.563227  0.627544  0.901332  0.007265
chr13       0.438522  0.897509  1         1.436285  0.011578
chr14       0.405957  0.830858  0.925738  1.329624  0.010718
chr15       0.406855  0.832697  0.927786  1.332566  0.010742
chr16       0.376148  0.769849  0.857762  1.231991  0.009931
chr17       0.383027  0.783928  0.873448  1.254521  0.010112
chr18       0.488599  1         1.114194  1.600301  0.0129
chr19       0.535867  1.096742  1.221984  1.755118  0.014148
chr20       0.467308  0.956424  1.065642  1.530566  0.012338
chr21       1         2.046668  2.280386  3.275285  0.026401
chr22       0.756263  1.547819  1.724572  2.476977  0.019966
chrX        0.305317  0.624882  0.696241  1         0.008061
chrY        37.87675  77.52114  86.37362  124.0572  1
TABLE 2
Qualified chromosome dose, variation and resolvability for chromosomes 21, 18 and 13
Figure BDA00002366985700781
Figure BDA00002366985700791
TABLE 3
Qualified chromosome dose, variation and resolvability for chromosomes 13, X and Y
Figure BDA00002366985700792
Figure BDA00002366985700801
Examples of diagnoses of T21, T13, T18, and a case of Turner syndrome, obtained using the normalizing chromosomes, chromosome doses, and resolvability for each chromosome of interest, are described in Example 3.
Example 3
Diagnosis of fetal aneuploidy using normalizing chromosomes
To perform the assessment of aneuploidy in biological test samples using chromosome dosage, maternal blood test samples were obtained from pregnant volunteers and cfDNA was prepared and sequenced and analyzed as described in examples 1 and 2.
Trisomy 21
Table 4 provides the calculated dose for chromosome 21 in an exemplary test sample (# 11403). The calculated threshold for a positive diagnosis of T21 was set at > 2 standard deviations from the mean of these qualifying (normal) samples. The diagnosis of T21 is given based on the chromosome dose in the test sample being greater than a set threshold. Chromosomes 14 and 15 are used in separate calculations as normalization chromosomes to indicate that either the chromosome with the lowest variability (e.g., chromosome 14) or the chromosome with the greatest resolvability (e.g., chromosome 15) can be used to identify aneuploidy. Thirteen T21 samples were identified using the calculated chromosome dose, and these aneuploidy samples were confirmed by karyotype to be T21.
TABLE 4
Chromosome dose for T21 aneuploidy (sample #11403, 47XY +21)
Figure BDA00002366985700802
Figure BDA00002366985700811
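The threshold rule used for the T21 call (a test dose more than 2 standard deviations above the mean of the qualifying samples) can be sketched as follows. The function name is hypothetical, and the sample standard deviation is assumed as the estimator.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def call_gain(test_dose: float, qualified_doses: Sequence[float],
              n_sd: float = 2.0) -> Tuple[float, bool]:
    """Return the upper threshold and whether the test dose exceeds it."""
    threshold = mean(qualified_doses) + n_sd * stdev(qualified_doses)
    return threshold, test_dose > threshold
```

The same function applies whichever normalizing chromosome was used to compute the doses, provided the test dose and the qualifying doses were normalized in the same way.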
Trisomy 18
Table 5 provides the calculated dose for chromosome 18 in one test sample (# 11390). The threshold calculated for this positive diagnosis of T18 was set at > 2 standard deviations from the mean of the qualifying (normal) samples. The diagnosis of T18 is given based on the chromosome dose in the test sample being greater than a set threshold. Chromosome 8 was used as the normalizing chromosome. In this example, chromosome 8 has the lowest variability and the greatest resolvability. Eighteen T18 samples were identified using chromosome dosage and were confirmed by karyotype to be T18.
These data indicate that a single normalization chromosome can have both the lowest variability and the greatest resolvability.
TABLE 5
Chromosome dose for T18 aneuploidy (sample #11390, 47XY +18)
Figure BDA00002366985700812
Trisomy 13
Table 6 provides the calculated dose for chromosome 13 in one test sample (#51236). The calculated threshold for a positive diagnosis of T13 was set at 2 standard deviations from the mean of the qualifying samples. The diagnosis of T13 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosome doses for chromosome 13 were calculated using either chromosome 5 or the group of chromosomes 3, 4, 5, and 6 as the normalizing chromosome. One T13 sample was identified.
TABLE 6
Chromosome dose for T13 aneuploidy (sample #51236, 47XY +13)
Figure BDA00002366985700821
The sequence tag density for chromosomes 3 through 6 is the average tag count for chromosomes 3 through 6.
These data indicate that the combination of chromosomes 3, 4, 5, and 6 provides lower variability than chromosome 5 alone, and the greatest resolvability of any of the normalizing chromosomes considered.
Thus, a set of chromosomes can be used as a normalization chromosome to determine chromosome dose and identify aneuploidies.
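A group of chromosomes can be used as the normalizer by averaging their tag counts, as stated above for chromosomes 3 through 6. The sketch below illustrates this; the function name and the example counts are hypothetical.

```python
from typing import Dict, Sequence


def dose_with_group_normalizer(tag_counts: Dict[str, int], chrom_of_interest: str,
                               normalizing_group: Sequence[str]) -> float:
    """Dose relative to the mean tag count of a group of normalizing chromosomes."""
    group_mean = sum(tag_counts[c] for c in normalizing_group) / len(normalizing_group)
    return tag_counts[chrom_of_interest] / group_mean


# Hypothetical tag counts, for illustration only.
example_counts = {"chr13": 100, "chr3": 200, "chr4": 200, "chr5": 100, "chr6": 300}
example_dose = dose_with_group_normalizer(example_counts, "chr13",
                                          ["chr3", "chr4", "chr5", "chr6"])
```

Passing a single-element group reduces this to the single-chromosome dose.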
Turner syndrome (monosomy X)
Table 7 provides the calculated doses for chromosomes X and Y in one test sample (#51238). The thresholds calculated for a positive diagnosis of Turner syndrome (monosomy X) were set at < -2 standard deviations from the mean of the qualifying (normal) samples for the X chromosome, and at < -2 standard deviations from the mean of the qualifying (normal) samples for the absence of the Y chromosome.
TABLE 7
Chromosome dose for Turner (XO) aneuploidy (sample #51238, 45X)
Figure BDA00002366985700822
Samples with an X chromosome dose below the set threshold were identified as having fewer than the expected number of X chromosomes. The same sample was determined to have a Y chromosome dose below the set threshold, indicating that the sample does not have a Y chromosome. Thus, the combination of X and Y chromosome doses was used to identify the Turner syndrome (monosomy X) sample.
Thus, the provided method enables determination of chromosomal CNV. In particular, the method enables the determination of aneuploidies involving over-representation or under-representation of a chromosome by massively parallel sequencing of maternal plasma cfDNA and identification of normalizing chromosomes for statistical analysis of the sequencing data. The sensitivity and reliability of the method allow accurate aneuploidy determinations in the first and second trimesters.
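The combined rule for monosomy X (both the X dose and the Y dose more than 2 standard deviations below the corresponding qualifying means) can be sketched as below. The function names are hypothetical. Which qualifying samples serve as the reference for each chromosome is left to the caller and is an assumption of this sketch.

```python
from statistics import mean, stdev
from typing import Sequence


def sd_from_mean(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Signed distance of the test dose from the qualifying mean, in SD units."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def is_monosomy_x(x_dose: float, y_dose: float,
                  qualified_x: Sequence[float], qualified_y: Sequence[float],
                  n_sd: float = 2.0) -> bool:
    """True only if both the X dose and the Y dose fall below their lower thresholds."""
    x_low = sd_from_mean(x_dose, qualified_x) < -n_sd
    y_low = sd_from_mean(y_dose, qualified_y) < -n_sd
    return x_low and y_low
```

Requiring both conditions keeps a sample with a low X dose but a normal Y dose from being called monosomy X.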
Example 4
Determination of partial aneuploidy
Sequence doses were used to assess partial aneuploidy in a biological test sample of cfDNA that was prepared from plasma and sequenced as described in Example 1. The sample was confirmed by karyotyping to have come from a subject with a partial deletion of chromosome 11.
Analysis of the sequencing data for the partial aneuploidy (partial deletion of chromosome 11, i.e., q21-q23) was performed as described for the chromosomal aneuploidies in the previous examples. Mapping of sequence tags to chromosome 11 in the test sample revealed a marked loss of tag counts between base pairs 81000082-. The sequence tags of interest (810000082-. The mean sequence dose, standard deviation, and coefficient of variation were calculated for all 20-megabase segments in the entire genome, and the 20-megabase sequence with the least variability, located on chromosome 5 (13000014-33000033 bp), was identified as the normalizing sequence (see Table 8); it was used to calculate the dose for the sequence of interest in the test sample (see Table 9). Table 8 provides the sequence dose for the sequence of interest (810000082-. Fig. 10 shows the sequence doses for the sequence of interest in the 7 qualifying samples (o) and the sequence dose for the corresponding sequence in the test sample (o). The mean is shown by the solid line, and the threshold calculated for a positive diagnosis of partial aneuploidy, set at 5 standard deviations from the mean, is shown by the dashed line. The diagnosis of partial aneuploidy was given based on the sequence dose in the test sample being less than the set threshold. The test sample was confirmed by karyotyping to have the deletion q21-q23 on chromosome 11.
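The selection of a normalizing sequence by least variability, followed by a threshold 5 standard deviations below the mean, can be sketched as follows. The function names and the data layout (one dictionary of segment tag counts per qualifying sample) are hypothetical. Measuring variability as the coefficient of variation of the dose of the sequence of interest relative to each candidate segment is an assumption about how the comparison was made.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def pick_normalizing_segment(qualified_counts: List[Dict[str, int]],
                             segment_of_interest: str) -> Tuple[str, float]:
    """Return the candidate segment giving the lowest-CV dose across qualifying samples."""
    best_segment, best_cv = "", float("inf")
    for candidate in qualified_counts[0]:
        if candidate == segment_of_interest:
            continue
        doses = [s[segment_of_interest] / s[candidate] for s in qualified_counts]
        cv = stdev(doses) / mean(doses)
        if cv < best_cv:
            best_segment, best_cv = candidate, cv
    return best_segment, best_cv


def is_partial_deletion(test_dose: float, qualified_doses: List[float],
                        n_sd: float = 5.0) -> bool:
    """True if the test dose lies more than n_sd standard deviations below the mean."""
    return test_dose < mean(qualified_doses) - n_sd * stdev(qualified_doses)
```

At least two qualifying samples are needed, since the standard deviation is undefined for a single value.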
Thus, in addition to identifying chromosomal aneuploidies, the methods of the invention can be used to identify partial aneuploidies.
TABLE 8
For the sequence Chr 11: 81000082-
Figure BDA00002366985700841
TABLE 9
Sequence doses (test sample 11206) against sequences of interest on chromosome 11 (81000082-
Figure BDA00002366985700842
Example 5
Demonstration of aneuploidy detection
The sequence data obtained for the samples illustrated in examples 2 and 3 and shown in figures 2 to 6 were further analyzed to demonstrate the sensitivity of the method in successfully identifying aneuploidies in maternal samples. The normalized chromosome doses for chromosomes 21, 18, 13, X and Y were analyzed as distributions (Y-axis) relative to the standard mean deviation and are shown in fig. 11. The normalization chromosomes used are shown as denominators (X-axis).
FIG. 11(A) shows the distribution of chromosome dose versus standard deviation from the mean for chromosome 21 doses in unaffected samples (o) and trisomy 21 samples (T21;. DELTA.) when chromosome 14 was used as the normalizing chromosome for chromosome 21. FIG. 11(B) shows the distribution of chromosome dose versus standard deviation from the mean for chromosome 18 doses in unaffected samples (o) and trisomy 18 samples (T18;. DELTA.) when chromosome 8 was used as the normalizing chromosome for chromosome 18. FIG. 11(C) shows the distribution of chromosome dose relative to the standard deviation from the mean for chromosome 13 doses in unaffected samples (o) and trisomy 18 samples (T13;. DELTA.), using the mean sequence tag densities for one chromosome set of 3, 4, 5 and 6 as normalization chromosomes to determine chromosome 13 chromosome dose. FIG. 11(D) shows the distribution of chromosome dose versus standard deviation from mean for chromosome X doses in unaffected female samples (o), unaffected male samples (Δ), and monosomy X samples (XO; +), when chromosome 4 was used as the normalizing chromosome for chromosome X. Fig. 11(E) shows the distribution of the standard deviation of chromosome Y dose from the distance mean for the chromosome Y doses in unaffected male samples (o), unaffected female samples (Δ), and monosomic X samples (+) when using the average sequence tag densities for one chromosome set of 1 to 22 and X as normalization chromosomes to determine chromosome dose for chromosome Y.
These data indicate that trisomy 21, trisomy 18, and trisomy 13 are clearly distinguishable from the unaffected (normal) samples. Monosomy X samples are easily identified as having a chromosome X dose clearly lower than that of unaffected female samples (Fig. 11(D)) and a chromosome Y dose clearly lower than that of unaffected male samples (Fig. 11(E)).
Thus, the provided methods are sensitive and specific for determining the presence or absence of a chromosomal aneuploidy in a maternal blood sample.
Example 6
Fetal chromosomal aneuploidy was determined using massively parallel DNA sequencing on cell-free fetal DNA from maternal blood: test set 1 independent of training set 1
The study was conducted by qualified site clinical research personnel at 13 U.S. clinical sites between April 2009 and October 2010 under a human subject protocol approved by the institutional review board (IRB) of each institution. Written consent was obtained from each subject prior to participation in the study. The protocol was designed to provide blood samples and clinical data to support the development of non-invasive prenatal genetic diagnostic methods. Pregnant women 18 years of age or older were eligible for participation. For patients undergoing clinically indicated chorionic villus sampling (CVS) or amniocentesis, blood was collected prior to the procedure, and the fetal karyotype results were also collected. Peripheral blood samples (two tubes, or about 20 mL total) were drawn from all subjects into acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped to the laboratory overnight in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.
Site research coordinators entered clinical data relating to the patient's current pregnancy and history into a study Case Report Form (CRF) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype was performed at each local laboratory on samples from the invasive prenatal procedures, and the results were also recorded in the study CRF. All data obtained on the CRFs were entered into the clinical database of the laboratory. Cell-free plasma was obtained from individual blood tubes using a two-step centrifugation method within 24 to 48 hours of venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from the cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Since cell-free DNA fragments are known to be about 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required prior to sequencing.
For the training set samples, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, CA) for sequencing library preparation (cfDNA blunt-ended and ligated to universal adapters) and sequencing using standard manufacturer protocols on the Illumina Genome Analyzer IIx instrument (http://www.illumina.com/). Single-end reads of 36 base pairs were obtained. After sequencing was completed, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared and sequenced on the Illumina Genome Analyzer IIx instrument. The sequencing libraries were prepared as follows. The full-length protocol described is essentially the standard protocol provided by Illumina and differs from the Illumina protocol only in the purification of the amplified library: the Illumina protocol specifies that the amplified library be purified using gel electrophoresis, whereas the protocol described herein uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library, essentially according to the manufacturer's instructions, using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) for Illumina. All steps except the final purification of the adapter-ligated products, which was performed using Agencourt magnetic beads and reagents instead of a purification column, were carried out according to the protocol accompanying the NEBNext™ Reagents for Sample Preparation for a genomic DNA library that is sequenced using the Illumina GAII. The NEBNext™ protocol essentially follows the protocol provided by Illumina, which is available at grcf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the 40 μl of cfDNA in a 200 μl microcentrifuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, at 20 °C for 30 minutes. The sample was cooled to 4 °C and purified using a QIAquick column provided in the QIAquick PCR Purification Kit (QIAGEN Inc., Valencia, CA) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube and 250 μl of Qiagen Buffer PB was added. The resulting 300 μl was transferred to a QIAquick column, which was centrifuged in a microcentrifuge at 13,000 RPM for 1 minute. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation at 13,000 RPM for 5 minutes. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating at 37 °C for 30 minutes according to the manufacturer's NEBNext dA-Tailing Module. The sample was cooled to 4 °C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, CA) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube and 250 μl of Qiagen Buffer PB was added. The 300 μl was transferred to a MinElute column, which was centrifuged in a microcentrifuge at 13,000 RPM for 1 minute. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation at 13,000 RPM for 5 minutes. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer, and 4 μl of Quick T4 DNA Ligase at 25 °C for 15 minutes according to the NEBNext Quick Ligation Module. The sample was cooled to 4 °C and purified using a MinElute column as follows. One hundred fifty microliters of Qiagen Buffer PE was added to the 30 μl reaction and the entire volume was transferred to a MinElute column, which was centrifuged in a microcentrifuge at 13,000 RPM for 1 minute. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation at 13,000 RPM for 5 minutes. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adapter-ligated DNA eluate were subjected to 18 cycles of PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes, and hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions (available at www.beckmangenomics.com/products/AMPureXP protocol-000387 v001. pdf). The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts, and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
Single-end reads of 36 base pairs were obtained for both the training and test sample sets.
Data analysis and sample classification
Sequence reads 36 bases in length were aligned to the human genome assembly hg18 obtained from the UCSC database (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/). Alignments were performed using the Bowtie short read aligner (version 0.12.5), allowing up to two base mismatches during alignment (Langmead et al., Genome Biol 10:R25 [2009]). Only reads that mapped unambiguously to a single genomic location were included. The genomic sites to which reads mapped were counted and included in the calculation of chromosome doses (see below). Regions on the Y chromosome where sequence tags from male and female fetuses map without any discrimination were excluded from the analysis (specifically, from base 0 to base 2 x 10^6; from base 10 x 10^6 to base 13 x 10^6; and from base 23 x 10^6 to the end of chromosome Y).
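The filtering and counting step described above can be sketched as follows. This is only an illustrative sketch: the tuple layout `(chrom, pos, n_hits)`, the function names, the treatment of the interval boundaries as half-open, and the choice to count distinct mapped sites rather than reads are assumptions, not details given in the text.

```python
from collections import Counter

# Excluded chromosome Y intervals taken from the text (start inclusive, end
# exclusive is an assumption). The end of chromosome Y is represented by
# infinity so that no chromosome length has to be assumed.
EXCLUDED_Y = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, float("inf")),
]


def is_excluded(chrom, pos):
    """Return True if a mapped position lies in an excluded chrY interval."""
    if chrom != "chrY":
        return False
    return any(start <= pos < end for start, end in EXCLUDED_Y)


def count_sites(alignments):
    """Count distinct uniquely mapped sites per chromosome.

    `alignments` is an iterable of (chrom, pos, n_hits) tuples, where n_hits
    is the number of genomic locations the read aligned to (hypothetical
    field supplied by whatever parses the aligner output).
    """
    sites = set()
    for chrom, pos, n_hits in alignments:
        # Keep only reads that map to exactly one location and that do not
        # fall in an excluded chrY region.
        if n_hits == 1 and not is_excluded(chrom, pos):
            sites.add((chrom, pos))
    return Counter(chrom for chrom, _ in sites)
```

The resulting per-chromosome counts are the inputs to the chromosome dose calculation.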
Variation within and between sequencing runs in the chromosomal distribution of sequence reads can obscure the effect of fetal aneuploidy on the distribution of mapped sequence sites. To correct for this variation, a chromosome dose is calculated as the count of mapped sites for a given chromosome of interest normalized to the count observed for a predetermined normalizing chromosome sequence. As explained previously, a normalizing chromosome sequence may consist of a single chromosome or of a group of chromosomes. Normalizing chromosome sequences were first identified in a subset of unaffected (i.e., qualified) training set samples having diploid karyotypes for the chromosomes of interest 21, 18, 13, and X, by considering each autosome as a potential denominator in a ratio of counts with each chromosome of interest. Denominator chromosomes (i.e., normalizing chromosome sequences) were selected to minimize the variation in chromosome dose between sequencing runs. A normalizing chromosome sequence (denominator) was identified for each chromosome of interest (Table 10). No single chromosome could be identified as the normalizing chromosome sequence for chromosome 13, because no single chromosome was found to reduce the variation in chromosome 13 dose across samples sufficiently, i.e., the spread of the NCV values for chromosome 13 was not reduced enough to allow correct identification of T13 aneuploidy. Chromosomes 2 through 6 were chosen at random and tested as a group for their ability to mimic the behavior of chromosome 13. The group of chromosomes 2 to 6 was found to substantially reduce the variation in chromosome 13 dose in the training set samples and was therefore selected as the normalizing chromosome sequence for chromosome 13.
As described above, the variation in chromosome dose for chromosome Y is greater than 30 regardless of which single chromosome is used as the normalizing chromosome sequence in determining the chromosome Y dose. The group of chromosomes 2 to 6 was found to substantially reduce the variation in chromosome Y dose in the training set samples and was therefore selected as the normalizing chromosome sequence for chromosome Y.
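One way to picture the denominator selection described above is to compute, for each candidate, the coefficient of variation (CV) of the resulting dose across the qualified samples and keep the candidate with the smallest CV. The sketch below assumes CV as the measure of variation and uses hypothetical function names and toy counts; the text itself only states that the denominator was chosen to minimize variation in chromosome dose.

```python
from statistics import mean, stdev


def dose_cv(samples, interest, denominator):
    """CV of the chromosome dose across samples.

    `samples` is a list of dicts mapping chromosome name -> tag count.
    `denominator` is a tuple of chromosome names; for a group, the counts
    are summed.
    """
    doses = [s[interest] / sum(s[c] for c in denominator) for s in samples]
    return stdev(doses) / mean(doses)


def best_single_normalizer(samples, interest, candidates):
    """Return the single candidate chromosome giving the lowest dose CV."""
    return min(candidates, key=lambda c: dose_cv(samples, interest, (c,)))


# Toy counts (not study data): chromosome "9" tracks "21" exactly, "1" does not.
toy_samples = [
    {"21": 100, "9": 300, "1": 900},
    {"21": 200, "9": 600, "1": 1500},
    {"21": 150, "9": 450, "1": 1400},
]
chosen = best_single_normalizer(toy_samples, "21", ["1", "9"])
```

The same `dose_cv` function accepts a group such as `("2", "3", "4", "5", "6")`, which is how a group denominator like the one chosen for chromosomes 13 and Y could be compared against single chromosomes.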
The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for each chromosome of interest relative to that of each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e., the normalizing chromosome sequence, whose variation among samples is closest to that of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation.
The chromosome dose for all samples in the training set (i.e., qualified and affected) is also used as a basis for determining a threshold value in identifying aneuploidies in the test sample as explained below.
Table 10
Normalized chromosome sequences for determining chromosome dosage
Figure BDA00002366985700901
For each chromosome of interest in each sample of the test set, a normalizing value was determined and used to determine the presence or absence of aneuploidy. The normalizing value was calculated as a chromosome dose, which can be further converted into a Normalized Chromosome Value (NCV).
Chromosome dosage
For the test set, a chromosome dose was calculated for each chromosome of interest 21, 18, 13, X and Y for every sample. As provided in Table 10 above, the chromosome dose for chromosome 21 was calculated as the ratio of the number of tags in the test sample that mapped to chromosome 21 to the number of tags in the test sample that mapped to chromosome 9; the chromosome dose for chromosome 18 was calculated as the ratio of the number of tags that mapped to chromosome 18 to the number of tags that mapped to chromosome 8; the chromosome dose for chromosome 13 was calculated as the ratio of the number of tags that mapped to chromosome 13 to the number of tags that mapped to chromosomes 2 through 6; the chromosome dose for chromosome X was calculated as the ratio of the number of tags that mapped to chromosome X to the number of tags that mapped to chromosome 6; and the chromosome dose for chromosome Y was calculated as the ratio of the number of tags that mapped to chromosome Y to the number of tags that mapped to chromosomes 2 through 6.
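The five ratios listed above can be written compactly as a lookup from each chromosome of interest to its denominator. The following is a minimal sketch; the dictionary layout, the function name, and the example counts are illustrative assumptions, and the group denominator is taken to be the sum of the tag counts of chromosomes 2 through 6.

```python
# Denominators as stated in the text for chromosomes 21, 18, 13, X and Y.
NORMALIZERS = {
    "21": ("9",),
    "18": ("8",),
    "13": ("2", "3", "4", "5", "6"),
    "X": ("6",),
    "Y": ("2", "3", "4", "5", "6"),
}


def chromosome_doses(tag_counts):
    """Return {chromosome of interest: dose} for one sample.

    `tag_counts` maps chromosome name -> number of mapped tags in the sample.
    """
    return {
        chrom: tag_counts[chrom] / sum(tag_counts[d] for d in denoms)
        for chrom, denoms in NORMALIZERS.items()
    }


# Made-up counts for illustration only (not study data).
example_counts = {
    "2": 2000, "3": 1600, "4": 1500, "5": 1400, "6": 1000,
    "8": 1000, "9": 1000, "13": 750, "18": 528, "21": 314,
    "X": 900, "Y": 15,
}
example_doses = chromosome_doses(example_counts)
```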
Normalized chromosome value
Using the chromosome dose for each chromosome of interest in each test sample and the corresponding chromosome dose determined in the qualified samples of the training set, a Normalized Chromosome Value (NCV) was calculated using the following equation:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for test sample i. When the chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity was observed in a quantile-quantile plot of the NCVs from unaffected samples. In addition, standard tests of normality for the NCVs failed to reject the null hypothesis of normality.
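As a minimal sketch of the NCV calculation, the function below subtracts the training set mean from the observed dose and divides by the training set standard deviation. The function name and the toy doses are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, training_doses):
    """NCV for one chromosome of interest in one test sample.

    `training_doses` are the doses of the same chromosome in the qualified
    (unaffected) training samples.
    """
    mu_hat = mean(training_doses)
    sigma_hat = stdev(training_doses)  # sample SD; an assumption
    return (test_dose - mu_hat) / sigma_hat


# Toy doses (not study data): mean 0.314, sample SD 0.002.
toy_training = [0.312, 0.314, 0.316]
toy_ncv = normalized_chromosome_value(0.324, toy_training)
```

With these toy numbers the test dose lies five training standard deviations above the training mean.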
For the test set, one NCV was calculated for each chromosome of interest 21, 18, 13, X and Y for each sample. To ensure a safe and efficient classification scheme, conservative boundaries are chosen for the aneuploidy classification. To classify the aneuploidy status of an autosome, NCV > 4.0 is required to classify a chromosome as affected (i.e., aneuploidy for that chromosome); and NCV < 2.5 to classify the chromosome as unaffected. Samples with autosomes having NCV between 2.5 and 4.0 were classified as "undetermined".
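The autosome decision rule stated above maps directly onto two thresholds. The sketch below uses a hypothetical function name and label strings; values exactly equal to 2.5 or 4.0 are treated as "undetermined", which is an assumption because the text does not address the boundary cases.

```python
AFFECTED_THRESHOLD = 4.0    # NCV above this: affected
UNAFFECTED_THRESHOLD = 2.5  # NCV below this: unaffected


def classify_autosome(ncv):
    """Classify one autosome of one sample from its NCV."""
    if ncv > AFFECTED_THRESHOLD:
        return "affected"
    if ncv < UNAFFECTED_THRESHOLD:
        return "unaffected"
    return "undetermined"
```

For example, `classify_autosome(3.0)` returns `"undetermined"`, while strongly negative NCVs fall under "unaffected" because the rule only tests for an excess of the chromosome of interest.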
For the test set, classification of the sex chromosomes was performed by sequential application of the NCVs for both X and Y as follows:
1. If NCV Y > -2.0 standard deviations from the mean of male samples, the sample is classified as male (XY).
2. If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X > -2.0 standard deviations from the mean of female samples, the sample is classified as female (XX).
3. If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X < -3.0 standard deviations from the mean of female samples, the sample is classified as monosomy X, i.e., Turner syndrome.
4. If the NCVs do not meet any of the above criteria, the sample is classified as "undetermined" for sex.
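The four sequential rules can be sketched as a small decision function. Here the chromosome Y NCV is assumed to be expressed relative to the mean and standard deviation of male training samples and the chromosome X NCV relative to those of female training samples; the function name, argument names, and label strings are hypothetical.

```python
def classify_sex(ncv_y_vs_male, ncv_x_vs_female):
    """Apply the sequential sex chromosome rules to one sample."""
    # Rule 1: chromosome Y dose is not clearly below the male mean.
    if ncv_y_vs_male > -2.0:
        return "XY"
    if ncv_y_vs_male < -2.0:
        # Rule 2: little or no Y, and X dose comparable to females.
        if ncv_x_vs_female > -2.0:
            return "XX"
        # Rule 3: little or no Y, and X dose clearly below females.
        if ncv_x_vs_female < -3.0:
            return "monosomy X"
    # Rule 4: anything else, including values between -3.0 and -2.0 for X.
    return "undetermined"
```

Note that a sample with a low Y value and an X value between -3.0 and -2.0 falls through to "undetermined", which matches the gap between rules 2 and 3.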
Results
Study demographics
A total of 1,014 patients were enrolled between April 2009 and July 2010. Patient demographics, invasive procedure type, and karyotype results are summarized in Table 11. The mean age of study participants was 35.6 years (range 17 to 47 years) and gestational age ranged from 6 weeks 1 day to 38 weeks 1 day (mean 15 weeks 4 days). The overall incidence of abnormal fetal karyotype was 6.8%, with a T21 incidence of 2.5%. Of the 946 subjects with singleton pregnancies and a karyotype, 906 (96%) presented with at least one clinically recognized risk factor for fetal aneuploidy prior to the prenatal procedure. Even when subjects with advanced maternal age as their sole indication are excluded, the data demonstrate a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma, or other structural congenital abnormality were the most predictive of an abnormal karyotype in this cohort.
TABLE 11
Patient demographics
Figure BDA00002366985700921
Figure BDA00002366985700931
Results including multiple gestation fetuses, assessed and reported by clinicians
Abbreviations: AMA = advanced maternal age; NT = nuchal translucency
The distribution of the diverse ethnic backgrounds represented in the study population is also shown in Table 11. Overall, 63% of the patients in this study were Caucasian, 17% were Hispanic, 6% were Asian, 5% were multi-ethnic, and 4% were African American. Ethnic diversity varied significantly from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, whereas three clinical sites located in the same state enrolled no Hispanic subjects. As expected, no discernible differences were observed in our results across ethnicities.
Training data set 1
The training set study selected 71 samples from the initial sequential accumulation of 435 samples collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotypes) in this first series of subjects were included for sequencing, as well as a random selection and number of unaffected subjects with adequate sample and data. The clinical characteristics of the training set patients were consistent with the overall study demographics shown in Table 11. The gestational age of the samples in the training set ranged from 10 weeks 0 days to 23 weeks 1 day. Thirty-eight underwent CVS, 32 underwent amniocentesis, and 1 patient did not have the type of invasive procedure specified (unaffected karyotype 46,XY). Of the patients, 70% were Caucasian, 8.5% Hispanic, 8.5% Asian, and 8.5% multi-ethnic. Six sequenced samples were removed from this set for training purposes: 4 samples from subjects with twin gestations (discussed in detail below), 1 sample with T18 that was contaminated during preparation, and 1 sample with fetal karyotype 69,XXX, leaving 65 samples for the training set.
The number of unique sequence sites (i.e., tags identified with unique sites in the genome) varied from 2.2M in the early phases of the training set study to 13.7M in the later phases, owing to improvements in sequencing technology over time. To monitor for any potential shift in chromosome dose over this 6-fold range in unique sites, different unaffected samples were run at the beginning and at the end of the study. For the first 15 unaffected samples run, the average number of unique sites was 3.8M and the average chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the next 15 unaffected samples run, the average number of unique sites was 10.7M and the average chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistically significant difference in the chromosome 21 and chromosome 18 doses over the course of the training set study.
The training set NCVs for chromosomes 21, 18 and 13 are shown in Figure 12. The results shown in Fig. 12 are consistent with an assumption of normality, under which approximately 99% of the diploid NCVs would fall within ±2.5 standard deviations of the mean. Of the 65 samples in this set, 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. Four samples with clinical karyotypes indicating fetal T18 had NCVs ranging from 3.3 to 12, and the two samples with clinical karyotypes indicating fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples is due to their dependence on the percentage of fetal cfDNA in the individual samples.
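The "approximately 99% within ±2.5 standard deviations" figure can be checked against the standard normal distribution. The short sketch below, with hypothetical function names, computes the two-sided coverage for a given number of standard deviations and the one-sided upper tail beyond a threshold, assuming the NCVs of unaffected samples are exactly standard normal.

```python
import math


def fraction_within(k):
    """Probability that a standard normal value lies within +/- k SD."""
    return math.erf(k / math.sqrt(2.0))


def upper_tail(k):
    """Probability that a standard normal value exceeds k SD."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


coverage_2_5 = fraction_within(2.5)   # roughly 0.988
tail_4_0 = upper_tail(4.0)            # roughly 3e-5
```

Under this idealized assumption, about 98.8% of unaffected NCVs fall within ±2.5, and only about 3 in 100,000 would exceed the 4.0 threshold used for classifying a chromosome as affected.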
As for the autosomes, the means and standard deviations for the sex chromosomes were established in the training set. The sex chromosome thresholds allowed 100% discrimination between male and female fetuses in the training set.
Test data set 1
After the chromosome dose means and standard deviations had been established from the training set, a test set of 48 samples was selected from a total of 575 samples collected between January 2010 and June 2010. One sample from a twin gestation was removed from the final analysis, leaving 47 samples in the test set. Personnel preparing samples for sequencing and operating the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 11). CVS accounted for 58% of the invasive procedures, higher than in the overall study demographics but similar to the training set. Of the subjects, 50% were Caucasian, 27% Hispanic, 10.4% Asian, and 6.3% African American.
Within the test set, the number of unique sequence tags varied from about 13M to 26M. For unaffected samples, the chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosome 21, chromosome 18 and chromosome 13 are shown in Fig. 13 and the classifications are given in Table 12.
TABLE 12
Test set classification data
Figure BDA00002366985700951
Figure BDA00002366985700961
MX is monosomy of the X chromosome, with no evidence of a Y chromosome
Within the test set, 13/13 subjects with karyotypes indicating fetal T21 were correctly identified, with NCVs ranging from 5 to 14. Eight/eight subjects with karyotypes indicating fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample in this test set with a karyotype of T13 was classified as undetermined, with an NCV of about 3.
For the test data set, all male samples were correctly identified, including a sample with the complex karyotype 46,XY + marker chromosome (unidentifiable by cytogenetics) (Table 3). Of the three samples with karyotype 45,X in the test set, two were correctly identified as monosomy X and one was classified as undetermined (Table 12).
Twins
Four of the samples initially selected for the training set and one in the test set were from twin gestations. The thresholds used herein could be confounded by the different amounts of cfDNA expected in the setting of a twin gestation. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was fraternal, and amniocentesis was performed on each fetus individually. In this twin gestation, one fetus had a karyotype of 47,XY+21 and the other had a normal karyotype, 46,XX. In both cases, the cell-free classification based on the methods discussed above classified the sample as T21. The other two twin gestations in the training set were correctly classified as unaffected for T21 (all twins showed diploid karyotypes for chromosome 21). For the twin gestation in the test set, a karyotype was established only for twin B (46,XX), and the algorithm correctly classified the sample as unaffected for T21.
Conclusion
The data indicate that massively parallel sequencing can be used to determine multiple abnormal fetal karyotypes from the blood of pregnant women. These data show that samples with trisomy 21 and trisomy 18 were classified with 100% accuracy using independent test set data. Even in the case of fetuses with abnormal sex chromosome karyotypes, no sample was misclassified by the algorithm of this method. Importantly, the algorithm also performed well in determining the presence or absence of T21 in two sets of twin pregnancies. Furthermore, the present study examined a number of sequential samples from multiple centers, representing not only the range of abnormal karyotypes that one is likely to see in a commercial clinical setting, but also demonstrating the importance of accurately classifying pregnancies unaffected by the common trisomies, in order to address the unacceptably high false positive rates of current prenatal screening. These data provide valuable insight into the great potential of the method for future use. Analysis of subsets of the unique genomic sites showed an increase in variance consistent with Poisson counting statistics.
These data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by counting statistics (Fan and Quake, PLoS One 5, e10439 [2010]). Because sequencing information is collected across the entire genome, this method is capable of determining any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23, and when the sequencing data were analyzed in 500 kb bins, a decrease of about 10% in the relative number of tags was observed over a 25 Mb region starting at q21. In addition, within the training set, three of the samples had complex karyotypes due to mosaicism in the cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), both of which showed a mixture of XXX and X cells by cytogenetic analysis (consistent with mosaic Turner syndrome), were classified as undetermined and monosomy X, respectively.
When testing the algorithm, another interesting data point was observed for chromosome 21 in one sample of the test set (Fig. 13), which had an NCV between -5 and -6. Although this sample was diploid for chromosome 21 by cytogenetics, the karyotype showed mosaicism with partial trisomy for chromosome 9: 47,XX+9[9]/46,XX[6]. Since chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 10), this lowers the overall NCV value. The results provided in Example 7 below demonstrate the ability to determine fetal trisomy 9 in this sample using normalizing chromosomes.
The conclusion of Fan et al. regarding the sensitivity of these methods holds only if the algorithm used is able to account for any random or systematic biases introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will be inferior to counting statistics. Chiu et al. noted in their recent paper that their measurements of chromosomes 18 and 13 using the massively parallel sequencing method were imprecise, and concluded that more research was required to apply the method to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method used in the Chiu et al. paper simply uses the number of sequence tags on the chromosome of interest, in their case chromosome 21, normalized by the total number of tags in the sequencing run. The challenge with this approach is that the distribution of tags on each chromosome can vary from sequencing run to sequencing run, which increases the overall variation of the aneuploidy determination metric. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were reanalyzed using the method recommended by Chiu et al., as shown in Fig. 14. Overall, a compression in the range of the NCVs was observed for each of chromosomes 21 and 18, and a decrease in the determination rate was observed, with 10/13 of the T21 samples and 5/8 of the T18 samples correctly identified from our test set using the NCV threshold of 4.0 for aneuploidy classification.
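The difference between the two normalizations discussed above can be illustrated with a deliberately contrived example. In the synthetic runs below (not study data), a run-specific bias is assumed to scale the counts of chromosomes 21 and 9 by the same factor while leaving the remaining counts unchanged; under that assumption the ratio to chromosome 9 is stable, whereas the fraction of total tags varies from run to run. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation of a list of values."""
    return stdev(values) / mean(values)


def dose_by_chromosome(counts, interest, denominator):
    """Dose as a ratio to a single normalizing chromosome."""
    return counts[interest] / counts[denominator]


def fraction_of_total(counts, interest):
    """Count for the chromosome of interest divided by all tags in the run."""
    return counts[interest] / sum(counts.values())


# Synthetic runs: the same bias factor (0.9, 1.0, 1.1) is applied to
# chromosomes 21 and 9 only. This is an illustrative assumption.
SYNTHETIC_RUNS = [
    {"21": 270, "9": 900, "other": 10000},
    {"21": 300, "9": 1000, "other": 10000},
    {"21": 330, "9": 1100, "other": 10000},
]

cv_dose = cv([dose_by_chromosome(r, "21", "9") for r in SYNTHETIC_RUNS])
cv_fraction = cv([fraction_of_total(r, "21") for r in SYNTHETIC_RUNS])
```

The sketch only shows why a well-matched denominator can absorb run-to-run variation; whether a given chromosome actually co-varies with the chromosome of interest has to be established empirically from qualified samples.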
Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205e1-e11 [2011]). In addition, after observing a shift in their test set z-score metric relative to the external reference data (i.e., the training set), they retrained on the test set to establish the classification boundaries. While this approach is feasible in principle, in practice it would be challenging to decide how many samples are needed for training and how often one would need to retrain to ensure that the classification is correct. One way to mitigate this problem is to include in every sequencing run controls that measure the baseline and calibrate for quantitative behavior.
Data obtained using the present method show that massively parallel sequencing is capable of determining multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm used to normalize the chromosome counting data is optimized. The present method of quantification not only minimizes random and systematic variation between sequencing runs, but also allows for classification of aneuploidies across the entire genome, most notably T21 and T18. Larger sample collections are required to test the algorithm for the determination of T13. To this end, a prospective, blinded, multi-site clinical study is ongoing to further demonstrate the diagnostic accuracy of the present method.
Example 7
Determining the presence or absence of at least 5 different chromosomal aneuploidies in all chromosomes of a single test sample
To demonstrate the ability of the present method to determine the presence or absence of any chromosomal aneuploidy in each set of maternal test samples (test set 1; example 6), systematically determined normalized chromosome sequences were identified in the unaffected test set samples (training set 1; example 6) and used to calculate chromosome dosages for all chromosomes for each test sample. Determining the presence or absence of any one or more different intact fetal chromosomal aneuploidies in each test and training set sample is accomplished from sequencing information obtained from a single sequencing run performed on each individual sample.
Using chromosome density, i.e., the number of sequence tags identified for each chromosome in the samples of each test set as illustrated in example 6, a systematically determined normalized chromosome sequence consisting of a single chromosome or set of chromosomes was determined by calculating a single chromosome dose for each of chromosomes 1-22, X, and Y. By systematically calculating chromosome dosages for each chromosome using each possible chromosome combination as a molecule, a systematically determined normalized chromosome sequence for each of chromosomes 1-22, X, and Y is determined. For example, for chromosome 21 as the chromosome of interest, the chromosome dose is calculated as a ratio of (i) the number of sequence tags obtained for chromosome 21 (the chromosome of interest) and (ii) the number of sequence tags obtained for each remaining chromosome to the sum of the number of tags obtained for all possible combinations of remaining chromosomes (not including chromosome 21), i.e.: 1.2, 3, 4, 5, etc. up to 20, 21, 22, X and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+ X, and 1+ Y; 1+2+3, 1+2+4, 1+2+5, and so on up to 1+2+20, 1+2+22, 1+2+ X, and 1+2+ Y; 1+3+4, 1+3+5, 1+3+6, and so on up to 1+3+20, 1+3+22, 1+3+ X, and 1+3+ Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, and so on up to 1+2+3+20, 1+2+3+22, 1+2+3+ X, and 1+2+3+ Y; and so on, such that all possible combinations of all chromosomes 1-20, 22, X and Y are used as the normalized chromosome sequence (molecule) to determine all possible chromosome doses for each chromosome of interest in each of these qualified (aneuploidy) samples within the training set. Chromosome doses were determined in the same manner for chromosomes 21 in all training set samples, and these normalized chromosome sequences determined systematically for chromosomes 21 were determined as a single or set of chromosomes that resulted in a dose for 21 with minimal variability throughout all training samples. 
The same analysis is repeated to determine the single chromosome or combination of chromosomes that will be the systematically determined normalized chromosome sequence for each remaining chromosome (including chromosomes 13, 18, X and Y), i.e., all possible chromosome combinations are used to determine the normalized sequences (single chromosome or set of chromosomes) for all other chromosomes 1-12, 14-17, 19-20, 22, X and Y of interest in all training samples. Thus, all chromosomes are considered as chromosomes of interest, and a systematically determined normalization sequence is determined for each of all chromosomes in each unaffected sample within the training set. Table 13 provides the individual chromosomes or chromosome sets identified as systematically determined normalizing sequences for each chromosome 1-22, X, and Y of interest. As highlighted in table 13, for some chromosomes of interest, the systematically determined normalized chromosome sequences are determined to be a single chromosome (e.g., when chromosome 4 is the chromosome of interest), and for other chromosomes of interest, the systematically determined normalized chromosome sequences are determined to be a set of chromosomes (e.g., when chromosome 21 is the chromosome of interest).
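The search described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the `max_size` cap on the combination size, and the toy tag counts are introduced here for illustration and do not come from the examples, which describe using all possible combinations.

```python
from itertools import combinations
from statistics import mean, stdev


def chromosome_dose(tags, chrom, normalizers):
    """Dose = tags on the chromosome of interest / summed tags on the normalizing set."""
    return tags[chrom] / sum(tags[c] for c in normalizers)


def select_normalizing_sequence(samples, chrom, max_size=2):
    """Return (best_set, best_cv): the set giving the lowest CV of the dose.

    samples: list of dicts mapping chromosome name -> sequence tag count,
             one dict per qualified (unaffected) sample.
    max_size: hypothetical cap on combination size to keep the search small.
    """
    candidates = [c for c in samples[0] if c != chrom]
    best_set, best_cv = None, float("inf")
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            doses = [chromosome_dose(s, chrom, combo) for s in samples]
            cv = stdev(doses) / mean(doses)  # coefficient of variation
            if cv < best_cv:
                best_set, best_cv = combo, cv
    return best_set, best_cv


# Made-up tag counts for three qualified samples (not data from the patent).
qualified_samples = [
    {"chr4": 100, "chr14": 50, "chr9": 80, "chr21": 15},
    {"chr4": 120, "chr14": 40, "chr9": 70, "chr21": 16},
    {"chr4": 90, "chr14": 80, "chr9": 95, "chr21": 17},
]
best_set, best_cv = select_normalizing_sequence(qualified_samples, "chr21")
print(best_set, best_cv)
```

In the toy data the chromosome 21 counts track the sum of chromosomes 4 and 14 exactly, so that pair is selected. The number of combinations grows exponentially with the number of candidate chromosomes, which is why this sketch caps the combination size.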
TABLE 13
Systematically determined normalized chromosome sequences for all chromosomes
[Table 13 is reproduced as images in the original publication: Figure BDA00002366985701001 and Figure BDA00002366985701011.]
The mean, Standard Deviation (SD), and Coefficient of Variation (CV) of the systematically determined normalized chromosome sequences determined for each of all chromosomes are given in table 14.
TABLE 14
Mean, Standard Deviation (SD) and Coefficient of Variation (CV) for systematically determined normalized chromosomal sequences
Chromosome of interest    Mean       SD         CV
1                         0.36637    0.00266    0.72%
2                         0.31580    0.00068    0.22%
3                         0.21983    0.00055    0.18%
4                         0.98191    0.02509    2.56%
5                         0.30109    0.00076    0.25%
6                         0.21621    0.00059    0.27%
7                         0.21214    0.00044    0.21%
8                         0.25562    0.00068    0.27%
9                         0.12726    0.00034    0.27%
10                        0.24471    0.00098    0.40%
11                        0.26907    0.00098    0.36%
12                        0.12358    0.00029    0.23%
13 (a)                    0.26023    0.00122    0.47%
14                        0.09286    0.00028    0.30%
15                        0.21568    0.00147    0.68%
16                        0.25181    0.00134    0.53%
17                        0.46000    0.00248    0.54%
18 (a)                    0.10100    0.00038    0.38%
19                        1.43709    0.02899    2.02%
20                        0.19967    0.00123    0.62%
21 (a)                    0.07851    0.00053    0.67%
22                        0.69613    0.01391    2.00%
X (b)                     0.46865    0.00279    0.68%
Y (b)                     0.00028    0.00004    14.97%
(a) Not including trisomy
(b) Female fetus
The variation in chromosome dose across all training samples, as reflected by the CV values, supports the use of systematically determined normalized chromosome sequences to provide a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown below.
To demonstrate the sensitivity and specificity of this method, chromosome doses for all chromosomes of interest 1-22, X and Y were determined in each sample within the training set, and in each sample within the test set illustrated in example 5, using the corresponding systematically determined normalized chromosome sequences provided in table 13 above.
Using the systematically determined normalized chromosome sequence for each chromosome of interest, the presence or absence of any fetal aneuploidy was determined in each training set sample and in each test sample, i.e., it was determined whether each sample contained a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. Sequence information, i.e., the number of sequence tags, was obtained for all chromosomes in each training set sample and in each test sample, and a single chromosome dose was calculated for each chromosome in each training and test sample, as described above, using the number of sequence tags obtained for the systematically determined normalized chromosome sequences (table 13) corresponding to those determined within the training set. That is, the number of sequence tags obtained for the systematically determined normalized chromosome sequences in each training sample was used to determine the chromosome dose for each chromosome in that training sample, and the number of sequence tags obtained for the systematically determined normalized chromosome sequences in each test sample was used to determine the chromosome dose for each chromosome in that test sample. To ensure safe and efficient classification of aneuploidy, the same conservative boundaries were chosen as illustrated in example 6.
Training set results
A plot of chromosome dose for chromosomes 21, 18 and 13 in the samples of the training set using the systematically determined normalized chromosome sequences is given in fig. 15. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 4+14+16+20+22), the 8 samples in which clinical karyotype indicated T21 had NCVs between 5.4 and 21.5. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 2+3+5+7), the 4 samples in which clinical karyotype indicated T18 had NCVs between 3.3 and 15.3. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 4+5), the 2 samples in which clinical karyotype indicated T13 had NCVs between 8.0 and 12.4. The T21 samples of the training set are shown as the last 8 samples of chromosome 21 data (∘); the T18 samples of the training set are shown as the last 4 samples of chromosome 18 data (Δ); and the T13 samples of the training set are shown as the last 2 samples of chromosome 13 data (□).
These data indicate that different, intact fetal chromosomal aneuploidies can be determined and correctly classified with high confidence using the systematically determined normalized chromosome sequences. Since all samples with affected karyotypes had NCVs greater than 3, there is a probability of less than about 0.1% that these samples were part of the unaffected distribution.
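The probability statement above can be illustrated with a short Python calculation. This is a sketch that assumes the NCVs of unaffected samples follow a standard normal distribution; that distributional assumption and the function name are introduced here and are not stated in the example. The value 3.3 is the smallest NCV reported above for an affected training sample.

```python
import math


def upper_tail_probability(ncv):
    """One-sided P(Z > ncv) for a standard normal variable Z (assumed model)."""
    return 0.5 * math.erfc(ncv / math.sqrt(2.0))


lowest_affected_ncv = 3.3  # smallest NCV among the affected training samples (T18)
p = upper_tail_probability(lowest_affected_ncv)
print(f"P(Z > {lowest_affected_ncv}) = {p:.5f}")
```

Under this assumption the tail probability falls quickly as the NCV increases, so the larger NCVs reported for the other affected samples correspond to far smaller probabilities.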
Similar to autosomes, when the systematically determined normalized chromosome sequence for chromosome X (i.e., the set of chromosomes 4+8) and the systematically determined normalized chromosome sequence for chromosome Y (i.e., the set of chromosomes 4+6) were used, all female and male fetuses within the training set were correctly identified. In addition, all 5 monosomy X samples were identified. Figure 18A shows a plot of the NCV determined for the X chromosome (x-axis) and the NCV determined for the Y chromosome (y-axis) for each sample within the training set. All samples that were monosomy X by karyotype had NCV values less than -4.83. Those monosomy X samples with a karyotype consistent with the 45,X karyotype (complete or mosaic) had a Y NCV value close to zero, as expected. Female samples were clustered around NCV = 0 for both X and Y.
Test set results
A plot of chromosome dose for chromosomes 21, 18 and 13 in the test samples using the relevant systematically determined normalized chromosome sequences is given in fig. 16. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 4+14+16+20+22), all 13 samples in which clinical karyotype indicated T21 were correctly identified, with NCVs between 7.2 and 16.3. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 2+3+5+7), all 8 samples in which clinical karyotype indicated T18 were identified, with NCVs between 12.7 and 30.7. When using the systematically determined normalized chromosome sequence (i.e., the set of chromosomes 4+5), the one sample in which clinical karyotype indicated T13 was correctly identified, with an NCV of 8.6. The T21 samples of the test set are shown as the last 13 samples of chromosome 21 data (∘); the T18 samples of the test set are shown as the last 8 samples of chromosome 18 data (Δ); and the T13 sample of the test set is shown as the last sample of chromosome 13 data (□).
These data indicate that systematically determined normalized chromosome sequences can be used with high confidence to determine and correctly classify different intact fetal chromosomal aneuploidies. Similar to the training set, all samples with affected karyotypes had NCVs greater than 7, indicating a very small probability that these samples are part of the unaffected distribution (FIG. 16).
Similar to autosomes, when the systematically determined normalized chromosome sequence for chromosome X (i.e., the set of chromosomes 4+8) and the systematically determined normalized chromosome sequence for chromosome Y (i.e., the set of chromosomes 4+6) were used, all female and male fetuses within the test set were correctly identified. In addition, all 3 monosomy X samples were identified. Figure 18B shows a plot of the NCV determined for the X chromosome (x-axis) and the NCV determined for the Y chromosome (y-axis) for each sample within the test set.
As explained above, the method allows the determination of the presence or absence of a complete, or partial, chromosomal aneuploidy of each of chromosomes 1-22, X and Y in each sample. In addition to determining the intact chromosomal aneuploidies T13, T18, T21, and monosomy X, the method determined the presence of a trisomy of chromosome 9 in one of the test samples. When using the systematically determined normalized chromosome sequence for chromosome 9 (i.e., the set of chromosomes 3+4+8+10+17+19+20+22), one sample with an NCV of 14.4 was identified for chromosome 9 as the chromosome of interest (fig. 17). This sample corresponds to the test sample in example 6 that was suspected of being aneuploid for chromosome 9 on the basis of the aberrantly low dose calculated for chromosome 21 (for which chromosome 9 was used as the normalizing chromosome sequence in example 6).
This data indicates that 100% of samples with clinical karyotypes indicative of T21, T13, T18, T9, and monosomy X were correctly identified. Figure 19 shows a plot of NCV for each of chromosomes 1-22 in each of 47 test samples. The median of NCV was normalized to zero. This data shows that the method of the invention (including the use of systematically determined normalized chromosome sequences) determines the presence of all 5 types of chromosomal aneuploidy present in this test set with 100% sensitivity and 100% specificity, and clearly indicates that the method can identify any chromosomal aneuploidy for any of chromosomes 1-22, X and Y in any sample.
Example 8
Determining the presence or absence of a partial fetal chromosomal aneuploidy: determining cat eye syndrome
DiGeorge syndrome (22q11.2 deletion syndrome), a condition caused by a defect in chromosome 22, leads to poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include cardiac defects, poor immune system function, cleft palate, parathyroid gland problems, and behavioral disorders. The number and severity of the problems associated with DiGeorge syndrome vary greatly. Almost every person with DiGeorge syndrome requires treatment from experts in multiple fields.
To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample is obtained from the mother by venipuncture, and cfDNA is prepared as described in the examples above. The purified cfDNA is ligated to adaptors and subjected to cluster amplification using an Illumina cBot cluster station. Massively parallel sequencing is performed using reversible dye terminators to generate millions of 36 bp reads. These sequence reads are aligned to the human hg19 reference genome, and reads that map uniquely to the reference genome are counted as tags.
A set of qualified samples that are all known to be diploid for chromosome 22 (i.e., chromosome 22 or any portion thereof is known to exist only in the diploid state) is first sequenced and analyzed to obtain the number of sequence tags for each of 1000 segments of 3 megabases (Mb), excluding region 22q11.2. Given that the human genome comprises about 3 billion bases (3 Gb), the 1000 segments of 3 Mb each constitute approximately the remainder of the genome. Each of these 1000 segments can serve individually, or as part of a set of segments, as the normalized segment sequence for the segment of interest, i.e., the 3 Mb region of 22q11.2. The number of sequence tags mapped onto each of the 1000 segments is used individually to calculate the segment dose for the 3 Mb region of 22q11.2. Furthermore, all possible combinations of two or more segments are used to determine the segment dose for the segment of interest in all qualified samples. The single 3 Mb segment or the combination of two or more 3 Mb segments that results in the segment dose with the lowest variability across the samples is selected as the normalized segment sequence.
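Counting tags per 3 Mb segment and forming a segment dose can be sketched in Python as follows. This is a minimal sketch under stated assumptions: segments are taken on a fixed 3 Mb grid per chromosome, tags are represented as (chromosome, 0-based start position) pairs, and the positions shown are made up; none of these details, nor the function names, come from the example.

```python
from collections import Counter

SEGMENT_SIZE = 3_000_000  # 3 Mb segments, as in the text


def count_tags_per_segment(tags, segment_size=SEGMENT_SIZE):
    """Count uniquely mapped tags per (chromosome, segment index).

    tags: iterable of (chromosome, 0-based start position) pairs.
    """
    counts = Counter()
    for chrom, pos in tags:
        counts[(chrom, pos // segment_size)] += 1
    return counts


def segment_dose(counts, segment_of_interest, normalizing_segments):
    """Dose = tags in the segment of interest / summed tags in the normalizing segments."""
    return counts[segment_of_interest] / sum(counts[s] for s in normalizing_segments)


# Made-up tag positions for illustration only.
tags = [
    ("chr22", 19_000_000), ("chr22", 20_500_000),
    ("chr1", 100), ("chr1", 2_999_999), ("chr1", 3_000_000),
    ("chr2", 5),
]
counts = count_tags_per_segment(tags)
dose = segment_dose(counts, ("chr22", 6), [("chr1", 0), ("chr1", 1), ("chr2", 0)])
print(counts[("chr22", 6)], dose)
```

A fixed grid will generally not align with the boundaries of a region such as 22q11.2, so a real analysis would define the segment of interest by its actual coordinates. The normalizing segments would then be chosen by the lowest-variability criterion described above.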
The number of sequence tags that map onto the segment of interest in each qualifying sample is used to determine the segment dose in each qualifying sample. The mean and standard deviation of the segment doses in all qualifying samples are calculated and used to determine thresholds against which the segment doses determined in the test samples can be compared. Preferably, Normalized Segment Values (NSV) are calculated for all segments of interest in all qualifying samples, and these values are used to set the threshold.
The number of tags mapped to the normalized segment sequence in the corresponding test sample is then used to determine the dose of the segment of interest in the test sample. A Normalized Segment Value (NSV) is calculated for the segment in the test sample as previously described, and the NSV of the segment of interest in the test sample is compared to the threshold determined using the qualified samples to determine the presence or absence of the 22q11.2 deletion in the test sample.
A test NSV < -3 indicates a loss in the segment of interest, i.e., the presence of a partial deletion of chromosome 22 (22q11.2) in the test sample.
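The comparison against the threshold can be sketched in Python as follows. This is an illustrative sketch: the function names and the segment doses are made up, and the sample standard deviation (n - 1 denominator) is used as the estimate, which is an assumption rather than something the example specifies.

```python
from statistics import mean, stdev


def normalized_segment_value(test_dose, qualified_doses):
    """NSV = (test dose - mean of qualified doses) / SD of qualified doses."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample SD; choice of estimator is an assumption
    return (test_dose - mu) / sigma


def indicates_loss(nsv, threshold=-3.0):
    """A value below the threshold is read as a loss in the segment of interest."""
    return nsv < threshold


# Made-up segment doses for five qualified samples and two test samples.
qualified = [0.100, 0.102, 0.098, 0.101, 0.099]
nsv_low = normalized_segment_value(0.090, qualified)
nsv_normal = normalized_segment_value(0.099, qualified)
print(nsv_low, indicates_loss(nsv_low))
print(nsv_normal, indicates_loss(nsv_normal))
```

With only a handful of qualified samples the standard deviation estimate is itself noisy, so the reliability of a fixed threshold depends on the size of the qualified set.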
Example 9
Fecal DNA testing to obtain predictive outcomes for stage II colon cancer patients
Approximately 30% of all stage II colon cancer patients will relapse and die of their disease. Stage II colon cancer patients who have had a relapse of disease show significantly more losses on chromosomes 4, 5, 15q, 17q and 18q. In particular, losses on 4q22.1-4q35.2 in stage II colon cancer patients have been shown to be associated with worse outcomes. Determination of the presence or absence of these genomic alterations can aid in the selection of patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33:95-104 [2010]).
To determine the presence or absence of one or more chromosome deletions in the region 4q22.1 to 4q35.2 in a patient suffering from stage II colon cancer, stool and/or plasma samples are obtained from the patient or patients. Fecal DNA is prepared according to the method described in Chen et al., J Natl Cancer Inst 97:1124-1132 [2005], and plasma DNA is prepared according to the method described in the examples above. DNA is sequenced according to the NGS method described herein, and the sequence information of the patient sample(s) is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. The segment doses are determined using a normalized segment sequence previously determined within a set of qualified stool and/or plasma samples, respectively. Segment doses in the test samples (patient samples) are calculated, and the presence or absence of one or more partial chromosome deletions in the 4q22.1 to 4q35.2 region is determined by comparing the NSV of each segment of interest to a threshold set from the NSVs within the qualified sample set.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided herein by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It will be appreciated that a number of different alternatives to the embodiments of the invention described herein may be utilized in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (32)

1. A computer processing system for determining the presence or absence of any one or more different intact fetal chromosomal aneuploidies for one or more chromosomes of interest in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising:
(a) a means for identifying at least one normalized chromosome of any one or more chromosomes of interest in a set of qualifying samples, said qualifying samples comprising diploid samples that exist at known copy numbers and that have normal copy numbers for said chromosomes of interest; determining which individual or group of chromosomes is the normalizing chromosome that results in the least variability in chromosome dose for chromosomes of interest across a qualified set of samples by using any of chromosomes 1-22, X, and Y, and a combination of two or more of chromosomes 1-22, X, and Y, whereby the system calculates all possible chromosome doses and determines a systematically determined normalizing chromosome for each chromosome of interest, determining a systematically determined sequence of normalizing chromosomes for each of chromosomes 1-22, X, and Y by systematically calculating chromosome doses for each chromosome using each possible chromosome combination as the denominator; for the chromosome of interest, the chromosome dose is calculated by the ratio between (i) and (ii) as follows: (i) the number of sequence tags obtained for the chromosome of interest and (ii) the number of sequence tags obtained for each of the remaining chromosomes, and the sum of the number of tags obtained for all possible combinations of remaining chromosomes that do not include the chromosome of interest;
(b) a means for receiving sequence information for said fetal and maternal nucleic acids in said sample;
(c) means for using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for the normalizing chromosome for each of the any one or more chromosomes of interest;
(d) means for calculating a single chromosome dose for each of any one or more chromosomes of interest using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each of the normalization chromosomes; and
(e) means for comparing each said single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest and thereby determining the presence or absence of any one or more intact, distinct fetal chromosomal aneuploidies in said sample.
2. The system of claim 1, wherein the means (c) comprises: means for calculating a single chromosome dose for each of said chromosomes of interest as a ratio of the number of sequence tags identified for each of said chromosomes of interest to the number of sequence tags identified for each of said normalized chromosome sequences of said chromosomes of interest.
3. The system of claim 1, wherein the means (c) comprises:
(i) means for calculating a sequence tag density ratio for each said chromosome of interest by correlating the number of sequence tags identified in means (b) for each said chromosome of interest with the length of each said chromosome of interest;
(ii) means for calculating a sequence tag density ratio for each of said normalized chromosome sequences by correlating the number of sequence tags identified in means (b) for each of said normalized chromosome sequences with the length of each of said normalized chromosome sequences; and
(iii) means for calculating a single chromosome dose for each of said chromosomes of interest using the sequence tag density ratios calculated in means (i) and (ii), wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest to the sequence tag density ratio of said normalized chromosome sequence for each of said chromosomes of interest.
4. The system of claim 1, wherein any one or more of the chromosomes of interest selected from chromosomes 1-22, X, and Y comprises at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different, intact fetal chromosomal aneuploidies is determined by the system.
5. The system of claim 1, wherein any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all chromosomes 1-22, X, and Y, and wherein the presence or absence of a complete fetal chromosomal aneuploidy, all of which are chromosomes 1-22, X, and Y, is determined by the system.
6. The system of claim 1, wherein the normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y.
7. The system of claim 1, wherein the normalizing chromosome sequence is a set of chromosomes selected from chromosomes 1-22, X, and Y.
8. The system of claim 1, wherein the different intact chromosomal aneuploidies are selected from an intact chromosomal trisomy, an intact chromosomal monosomy, and an intact chromosomal polysomy.
9. The system of claim 1, wherein the different, intact fetal chromosomal aneuploidies are selected from the group consisting of: trisomy 2, trisomy 8, trisomy 9, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXY, 47,XXX, 47,XYY, and monosomy X.
10. The system of claim 1, wherein the presence or absence of any four or more different, intact fetal chromosomal aneuploidies in each of the samples is determined by the system.
11. The system of claim 1, further comprising a means for calculating a Normalized Chromosome Value (NCV), wherein said NCV correlates said chromosome dose to an average of corresponding chromosome doses in a set of qualifying samples as:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
wherein $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the jth chromosome dose in a set of qualifying samples, and $x_{ij}$ is the jth chromosome dose observed for test sample i.
12. A computer processing system for determining the presence or absence of any one or more different, partial fetal chromosomal aneuploidies for any one or more fragments of any one or more chromosomes of interest in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising:
(a) a means for identifying at least one normalized fragment of said one or more fragments of any one or more chromosomes of interest in a set of qualifying samples, said qualifying samples comprising diploid samples that exist at known copy numbers and that have normal copy numbers for said chromosomes of interest; determining which individual or group of chromosomes is the normalizing chromosome that results in the least variability in chromosome dose for chromosomes of interest across a qualified set of samples by using any of chromosomes 1-22, X, and Y, and a combination of two or more of chromosomes 1-22, X, and Y, whereby the system calculates all possible chromosome doses and determines a systematically determined normalizing chromosome for each chromosome of interest, determining a systematically determined sequence of normalizing chromosomes for each of chromosomes 1-22, X, and Y by systematically calculating chromosome doses for each chromosome using each possible chromosome combination as the denominator; for the chromosome of interest, the chromosome dose is calculated by the ratio between (i) and (ii) as follows: (i) the number of sequence tags obtained for the chromosome of interest and (ii) the number of sequence tags obtained for each of the remaining chromosomes, and the sum of the number of tags obtained for all possible combinations of remaining chromosomes that do not include the chromosome of interest;
(b) a means for receiving sequence information for said fetal and maternal nucleic acids in said sample;
(c) means for using said sequence information to identify a number of sequence tags for said any one or more segments of any one or more chromosomes of interest each selected from chromosomes 1-22, X, and Y and a number of sequence tags for said normalized segment sequence of any one or more segments of each said any one or more chromosomes of interest;
(d) means for calculating a single chromosome dose for each of any one or more segments of any of said one or more chromosomes of interest using the number of said sequence tags identified for any one or more segments of each of said any one or more chromosomes of interest and the number of said sequence tags identified for each of said normalized segment sequences; and
(e) means for comparing each said single chromosome dose in any one or more segments for each said any one or more chromosomes of interest to a threshold for each said any one or more segments for any said any one or more chromosomes of interest, and thereby determining the presence or absence of one or more different, partial fetal chromosomal aneuploidies in said sample.
13. The system of claim 12, wherein the means (c) comprises: means for calculating a single fragment dose for each of any one or more segments of any of said one or more chromosomes of interest as a ratio of the number of sequence tags identified for each of any one or more segments of any of said one or more chromosomes of interest to the number of sequence tags identified for said normalized fragment sequence for any one or more segments of each of said any one or more chromosomes of interest.
14. A system according to claim 12 or 13, further comprising a means for calculating a Normalized Segment Value (NSV), wherein said NSV correlates said segment dose to an average of corresponding segment doses in a set of qualifying samples as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
wherein $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the jth fragment dose in a set of qualifying samples, and $x_{ij}$ is the observed jth fragment dose for test sample i.
15. The system of claim 12, wherein the normalized fragment sequence is a single fragment of any one or more of chromosomes 1-22, X, and Y.
16. The system of claim 12, wherein the normalized fragment sequence is a set of fragments of any one or more of chromosomes 1-22, X, and Y.
17. The system of claim 12 or 13, wherein the different partial fetal chromosomal aneuploidies are selected from partial duplication, partial multiplication, partial insertion, and partial deletion.
18. The system of claim 12 or 13, wherein the partial fetal aneuploidy is selected from the group consisting of a partial monosomy of chromosome 1, a partial monosomy of chromosome 4, a partial monosomy of chromosome 5, a partial monosomy of chromosome 7, a partial monosomy of chromosome 11, a partial monosomy of chromosome 15, a partial monosomy of chromosome 17, a partial monosomy of chromosome 18, and a partial monosomy of chromosome 22.
19. The system of claim 12 or 13, wherein the presence or absence of a distinct, partial fetal chromosomal aneuploidy in each of the samples is determined by the system.
20. The system of claim 1, 12 or 13, wherein means (a) comprises a means for sequencing at least a portion of said nucleic acid molecules of said test sample so as to obtain said sequence information for said fetal and maternal nucleic acid molecules of said test sample.
21. The system of claim 1, 12 or 13, wherein said test sample is a maternal sample selected from the group consisting of blood, plasma, serum, urine, and saliva samples.
22. The system of claim 1, 12 or 13, wherein the nucleic acid molecule is a mixture of fetal and maternal cell-free DNA molecules.
23. The system of claim 20, wherein the sequencing is Next Generation Sequencing (NGS).
24. The system of claim 20, wherein the sequencing is massively parallel sequencing using sequencing by synthesis with reversible dye terminators.
25. The system of claim 20, wherein the sequencing is ligation sequencing.
26. The system of claim 20, wherein said sequencing comprises an amplification.
27. The system of claim 20, wherein the sequencing is single molecule sequencing.
28. The system of claim 1 or 12, wherein the one or more chromosomes of interest is chromosome 13, and wherein the normalized chromosome of chromosome 13 is selected from the group consisting of chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21, or fragments thereof.
29. The system of claim 1 or 12, wherein the one or more chromosomes of interest is chromosome 18, and wherein the normalized chromosome of chromosome 18 is selected from the group consisting of chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14, or fragments thereof.
30. The system of claim 1 or 12, wherein the one or more chromosomes of interest are chromosome 21, and wherein the normalized chromosome of chromosome 21 is selected from the group consisting of chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17, or fragments thereof.
31. The system of claim 1 or 12, wherein the one or more chromosomes of interest is chromosome X, and wherein the normalized chromosome for chromosome X is selected from the group consisting of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16, or fragments thereof.
32. The system of claim 1 or 12, wherein the one or more chromosomes of interest is chromosome Y, and wherein the normalized chromosome of chromosome Y is selected from the group consisting of chromosome 4 and chromosome 6, or fragments thereof.
CN201180022958.5A 2011-07-26 2011-07-26 Method for determining the presence or absence of different aneuploidies in a sample Active CN103003447B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/045412 WO2013015793A1 (en) 2011-07-26 2011-07-26 Method for determining the presence or absence of different aneuploidies in a sample

Publications (2)

Publication Number Publication Date
CN103003447A CN103003447A (en) 2013-03-27
CN103003447B true CN103003447B (en) 2020-08-25

Family

ID=44838718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180022958.5A Active CN103003447B (en) 2011-07-26 2011-07-26 Method for determining the presence or absence of different aneuploidies in a sample

Country Status (9)

Country Link
EP (1) EP2563937A1 (en)
JP (1) JP6161607B2 (en)
KR (1) KR101974492B1 (en)
CN (1) CN103003447B (en)
AU (1) AU2011373694A1 (en)
CA (1) CA2840418C (en)
GB (1) GB2485635B (en)
HK (1) HK1174063A1 (en)
WO (1) WO2013015793A1 (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100112590A1 (en) 2007-07-23 2010-05-06 The Chinese University Of Hong Kong Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment
US9323888B2 (en) 2010-01-19 2016-04-26 Verinata Health, Inc. Detecting and classifying copy number variation
US10388403B2 (en) 2010-01-19 2019-08-20 Verinata Health, Inc. Analyzing copy number variation in the detection of cancer
US9260745B2 (en) 2010-01-19 2016-02-16 Verinata Health, Inc. Detecting and classifying copy number variation
US20120010085A1 (en) 2010-01-19 2012-01-12 Rava Richard P Methods for determining fraction of fetal nucleic acids in maternal samples
EP2883965B8 (en) 2010-01-19 2018-06-27 Verinata Health, Inc Method for determining copy number variations
US20120100548A1 (en) 2010-10-26 2012-04-26 Verinata Health, Inc. Method for determining copy number variations
AU2011207544A1 (en) 2010-01-19 2012-09-06 Verinata Health, Inc. Identification of polymorphic sequences in mixtures of genomic DNA by whole genome sequencing
AU2011207561B2 (en) 2010-01-19 2014-02-20 Verinata Health, Inc. Partition defined detection methods
PL2697392T3 (en) 2011-04-12 2016-08-31 Verinata Health Inc Resolving genome fractions using polymorphism counts
US9411937B2 (en) * 2011-04-15 2016-08-09 Verinata Health, Inc. Detecting and classifying copy number variation
US11261494B2 (en) 2012-06-21 2022-03-01 The Chinese University Of Hong Kong Method of measuring a fractional concentration of tumor DNA
AU2013204536A1 (en) * 2012-07-20 2014-02-06 Verinata Health, Inc. Detecting and classifying copy number variation in a cancer genome
EP2877594B1 (en) * 2012-07-20 2019-12-04 Verinata Health, Inc. Detecting and classifying copy number variation in a fetal genome
AU2019200162B2 (en) * 2012-07-20 2021-10-07 Verinata Health, Inc. Detecting and classifying copy number variation
EP2882872B1 (en) 2012-08-13 2021-10-06 The Regents of The University of California Methods and systems for detecting biological components
GB201215449D0 (en) * 2012-08-30 2012-10-17 Zoragen Biotechnologies Llp Method of detecting chromosonal abnormalities
EP3008215B1 (en) * 2013-06-13 2020-01-01 Ariosa Diagnostics, Inc. Statistical analysis for non-invasive sex chromosome aneuploidy determination
EP3543354B1 (en) * 2013-06-17 2022-01-19 Verinata Health, Inc. Method for generating a masked reference sequence of the y chromosome
US20160154931A1 (en) * 2013-07-17 2016-06-02 Bgi Genomics Co., Limited Method and device for detecting chromosomal aneuploidy
CA2925528C (en) * 2013-10-04 2023-09-05 Sequenom, Inc. Methods and processes for non-invasive assessment of genetic variations
US10741269B2 (en) 2013-10-21 2020-08-11 Verinata Health, Inc. Method for improving the sensitivity of detection in determining copy number variations
WO2015089726A1 (en) * 2013-12-17 2015-06-25 深圳华大基因科技有限公司 Chromosome aneuploidy detection method and apparatus therefor
JP6659672B2 (en) * 2014-05-30 2020-03-04 ベリナタ ヘルス インコーポレイテッド Detection of fetal chromosome partial aneuploidy and copy number variation
EP3160654A4 (en) 2014-06-27 2017-11-15 The Regents of The University of California Pcr-activated sorting (pas)
CN106795551B (en) * 2014-09-26 2020-11-20 深圳华大基因股份有限公司 CNV analysis method and detection device for single cell chromosome
US10434507B2 (en) 2014-10-22 2019-10-08 The Regents Of The University Of California High definition microdroplet printer
CN114181997A (en) * 2014-12-12 2022-03-15 维里纳塔健康股份有限公司 Determination of copy number variation using cell-free DNA fragment size
CA2972433A1 (en) * 2014-12-31 2016-07-07 Guardant Health, Inc. Detection and treatment of disease exhibiting disease cell heterogeneity and systems and methods for communicating test results
US10319463B2 (en) * 2015-01-23 2019-06-11 The Chinese University Of Hong Kong Combined size- and count-based analysis of maternal plasma for detection of fetal subchromosomal aberrations
EP3253479B1 (en) 2015-02-04 2022-09-21 The Regents of The University of California Sequencing of nucleic acids via barcoding in discrete entities
EP3256605B1 (en) 2015-02-10 2022-02-09 The Chinese University Of Hong Kong Detecting mutations for cancer screening and fetal analysis
CN104745718B (en) * 2015-04-23 2018-02-16 北京中仪康卫医疗器械有限公司 A kind of method for detecting human embryos microdeletion and micro- repetition
US10844428B2 (en) * 2015-04-28 2020-11-24 Illumina, Inc. Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIS)
WO2017031125A1 (en) * 2015-08-17 2017-02-23 The Regents Of The University Of California Microdroplet-based multiple displacement amplification (mda) methods and related compositions
EP3347466B1 (en) 2015-09-08 2024-01-03 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
US20200095632A1 (en) * 2015-11-12 2020-03-26 Samuel Williams Rapid sequencing of short dna fragments using nanopore technology
SG11201804651XA (en) * 2015-12-04 2018-07-30 Green Cross Genome Corp Method for determining copy-number variation in sample comprising mixture of nucleic acids
US10095831B2 (en) 2016-02-03 2018-10-09 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
WO2018009723A1 (en) * 2016-07-06 2018-01-11 Guardant Health, Inc. Methods for fragmentome profiling of cell-free nucleic acids
EP3497228A4 (en) 2016-08-10 2020-05-27 The Regents of The University of California Combined multiple-displacement amplification and pcr in an emulsion microdroplet
TWI603082B (en) * 2016-09-30 2017-10-21 有勁生物科技股份有限公司 Non-invasive fetal sex abnormality detecting system and method thereof and non-invasive fetal sex determination system and method thereof
EP3571308A4 (en) 2016-12-21 2020-08-19 The Regents of The University of California Single cell genomic sequencing using hydrogel based droplets
MY197535A (en) 2017-01-25 2023-06-21 Univ Hong Kong Chinese Diagnostic applications using nucleic acid fragments
US11342047B2 (en) 2017-04-21 2022-05-24 Illumina, Inc. Using cell-free DNA fragment size to detect tumor-associated variant
JP2018183095A (en) * 2017-04-26 2018-11-22 株式会社エンプラス Method for isolating fetus-derived hemopoietic precursor cell, and method for testing possibility of chromosomal aberration of fetus
US10501739B2 (en) 2017-10-18 2019-12-10 Mission Bio, Inc. Method, systems and apparatus for single cell analysis
CA3135026A1 (en) * 2019-03-28 2020-10-01 Phase Genomics, Inc. Systems and methods for karyotyping by sequencing

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2010033578A2 (en) * 2008-09-20 2010-03-25 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
CN101849236A (en) * 2007-07-23 2010-09-29 香港中文大学 Diagnosing fetal chromosomal aneuploidy using genomic sequencing
US20110177517A1 (en) * 2010-01-19 2011-07-21 Artemis Health, Inc. Partition defined detection methods
WO2011091046A1 (en) * 2010-01-19 2011-07-28 Verinata Health, Inc. Identification of polymorphic sequences in mixtures of genomic dna by whole genome sequencing

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
ATE508209T1 (en) * 2006-02-28 2011-05-15 Univ Louisville Res Found DETECTION OF CHROMOSOME ABNORMALITIES IN THE FETUS USING TANDEM SINGLE NUCLEOTIDE POLYMORPHISMS
US20080050739A1 (en) * 2006-06-14 2008-02-28 Roland Stoughton Diagnosis of fetal abnormalities using polymorphisms including short tandem repeats
US8262900B2 (en) 2006-12-14 2012-09-11 Life Technologies Corporation Methods and apparatus for measuring analytes using large scale FET arrays
US20100112590A1 (en) 2007-07-23 2010-05-06 The Chinese University Of Hong Kong Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment
KR20100089060A (en) 2007-10-04 2010-08-11 할싸이언 몰레큘러 Sequencing nucleic acid polymers with electron microscopy
EP2824191A3 (en) 2009-10-26 2015-02-18 Lifecodexx AG Means and methods for non-invasive diagnosis of chromosomal aneuploidy
EP2883965B8 (en) * 2010-01-19 2018-06-27 Verinata Health, Inc Method for determining copy number variations

Non-Patent Citations (3)

Title
Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood; H. Christina Fan, et al.; PNAS; 20081231; Vol. 105, No. 42; 16266-16271 *
Noninvasive Prenatal Diagnosis of Fetal Chromosomal Aneuploidies by Maternal Plasma Nucleic Acid Analysis; Y. M. Dennis Lo, et al.; Clinical Chemistry; 20081231; Vol. 54, No. 3; 461-466 *
Statistical model for whole genome sequencing and its application to minimally invasive diagnosis of fetal genetic disease; Tianjiao Chu, et al.; Bioinformatics; 20091231; Vol. 25, No. 10; 1244-1250 *

Also Published As

Publication number Publication date
GB2485635A (en) 2012-05-23
GB201114713D0 (en) 2011-10-12
CA2840418C (en) 2019-10-29
JP2014521334A (en) 2014-08-28
JP6161607B2 (en) 2017-07-12
AU2011373694A1 (en) 2013-05-02
WO2013015793A1 (en) 2013-01-31
KR101974492B1 (en) 2019-05-02
HK1174063A1 (en) 2013-05-31
KR20140050032A (en) 2014-04-28
EP2563937A1 (en) 2013-03-06
CA2840418A1 (en) 2013-01-31
CN103003447A (en) 2013-03-27
GB2485635B (en) 2012-11-28

Similar Documents

Publication Publication Date Title
US20220228197A1 (en) Method for determining copy number variations
CN103003447B (en) Method for determining the presence or absence of different aneuploidies in a sample
US20220106639A1 (en) Method for determining copy number variations
US20210082538A1 (en) Normalizing chromosomes for the determination and verification of common and rare chromosomal aneuploidies
CN107750277B (en) Determination of copy number variation using cell-free DNA fragment size
CN108485940B (en) Detection and classification of copy number variation
CN105722994B (en) Method for determining copy number variation in chromosomes
US20120237928A1 (en) Method for determining copy number variations
CA2878246A1 (en) Detecting and classifying copy number variation in a cancer genome
AU2011365507A1 (en) Normalizing chromosomes for the determination and verification of common and rare chromosomal aneuploidies
AU2016262641A1 (en) Detecting and classifying copy number variation
AU2015204302B2 (en) Method for determining copy number variations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant