US20230368918A1 - Method of detecting fetal chromosomal aneuploidy - Google Patents
Method of detecting fetal chromosomal aneuploidy
- Publication number
- US20230368918A1 (application US18/225,618)
- Authority
- US
- United States
- Prior art keywords
- chromosome
- reads
- samples
- sample
- reference samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- the present disclosure relates to a method of detecting fetal chromosomal aneuploidy in a biological sample derived from a pregnant woman, and a medium related thereto.
- Prenatal diagnosis refers to diagnosis of the presence or absence of diseases in a fetus before the fetus is born. Prenatal diagnosis is largely classified into invasive diagnostic methods and non-invasive diagnostic methods. Invasive diagnostic methods include, for example, chorionic villus sampling, amniocentesis, and umbilical cord sampling. Because invasive diagnostic methods act on the fetus during the examination, they may cause abortion, diseases, or malformations, and therefore non-invasive diagnostic methods have been developed.
- a method of detecting chromosomal aneuploidy of a target fetal chromosome is provided.
- a computer-readable medium having recorded thereon a program for performing the method of detecting chromosomal aneuploidy of a target fetal chromosome is also provided.
- an aspect provides a method of detecting chromosomal aneuploidy of a target fetal chromosome, the method including the following steps.
- the method includes obtaining reads of a plurality of nucleic acid fragments obtained from a biological sample of a pregnant woman.
- the pregnant woman may be a pregnant woman carrying a single fetus or twin fetuses.
- the biological sample may be blood, plasma, serum, urine, saliva, mucus, sputum, feces, tears, or a combination thereof.
- the biological sample may be, for example, plasma of the peripheral blood.
- the biological sample may include fetal nucleic acids.
- the fetal nucleic acids may be cell-free DNA (cfDNA).
- the fetal nucleic acids may be isolated DNA.
- the obtaining of the reads of a plurality of nucleic acid fragments obtained from the biological sample of a pregnant woman may include isolating nucleic acids from the biological sample.
- a method of isolating the nucleic acids from the biological sample may be performed by a method known to those skilled in the art.
- the isolated nucleic acid fragments may be about 10 bp (base pairs) to about 2000 bp, about 15 bp to about 1500 bp, about 20 bp to about 1000 bp, about 20 bp to about 500 bp, about 20 bp to about 200 bp, or about 20 bp to about 100 bp in length.
- the obtaining of the sequence information of a plurality of nucleic acid fragments obtained from the biological sample of a pregnant woman may include performing massively parallel sequencing of the obtained nucleic acids.
- massively parallel sequencing may be used interchangeably with next-generation sequencing (NGS) or second-generation sequencing.
- the massively parallel sequencing refers to a technique in which millions of nucleic acid fragments are sequenced simultaneously.
- the massively parallel sequencing may be performed using, for example, the 454 platform (Roche), GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer, Solexa platform, SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences Heliscope, single molecule real-time (SMRT™) sequencing technology of Pacific Biosciences, or a combination thereof.
- the method may further include preparing DNA libraries in order to perform the massively parallel sequencing.
- the DNA libraries may be prepared according to the procedures of the massively parallel sequencing.
- the DNA libraries may be prepared according to the instructions provided by the manufacturer of the massively parallel sequencing platform.
- the obtained sequence information of the nucleic acid fragments may also be called reads.
- the method includes mapping the obtained reads to a reference human genome to assign the reads of the nucleic acid fragments to the chromosome.
- the reference human genome may be hg18 or hg19.
- the reads mapped to only one genome location in the reference human genome may be termed unique reads.
- the reads of the nucleic acid fragments may be assigned to a chromosomal location.
- the chromosomal location may be within a consecutive region of a chromosome which is about 5 kb, about 10 kb, about 20 kb, about 50 kb, about 100 kb, about 1000 kb, or about 2000 kb in length.
- the chromosomal location may be on a single chromosome.
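The assignment of unique reads to chromosomal locations can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the function name `count_unique_reads_per_bin`, the tuple layout of the alignments, and the rule that a repeated read ID marks a multi-mapped read are all assumptions made for the example. The 50 kb bin size is one of the region lengths listed above.

```python
from collections import Counter

def count_unique_reads_per_bin(alignments, bin_size=50_000):
    """Count uniquely mapped reads per (chromosome, bin index).

    `alignments` is an iterable of (read_id, chromosome, position) tuples.
    Assumption: a read_id that appears more than once is multi-mapped.
    """
    alignments = list(alignments)
    hits = Counter(read_id for read_id, _, _ in alignments)
    counts = Counter()
    for read_id, chrom, pos in alignments:
        if hits[read_id] == 1:  # unique read: exactly one genome location
            counts[(chrom, pos // bin_size)] += 1
    return counts

example = [
    ("r1", "chr21", 120_000),
    ("r2", "chr21", 130_000),
    ("r3", "chr1", 10_000),
    ("r3", "chr5", 99_000),   # r3 maps to two locations, so it is dropped
    ("r4", "chr13", 49_999),
]
bins = count_unique_reads_per_bin(example)
```

Here both chr21 reads fall into bin index 2 (positions 100,000 to 149,999), and the multi-mapped read contributes to no bin.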
- the method may further include excluding intervals with a low confidence level for reads from subjects of analysis by examining depth distribution of the reads of the nucleic acid fragments assigned to the chromosome at each interval, after assigning the reads of the nucleic acid fragments to the chromosome.
- the interval may be an interval which is set in units of about kb to about 50 kb.
- the interval may be an interval which is set in units of about 10 kb to about 40 kb, about 15 kb to about 30 kb, or about 20 kb to about 25 kb.
- GC content of the base sequence may be used to perform filtering.
- grouping of the read depth and GC content of the nucleic acid fragments assigned to the chromosome may be performed and statistical analysis may be possible.
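One possible way to examine the depth distribution per interval is sketched below. The text does not state which statistical rule is used, so the rule here (flag intervals with zero depth, or with depth more than `k` median absolute deviations from the median) is purely an assumption for illustration, as are the function name and the value of `k`.

```python
from statistics import median

def low_confidence_bins(depths, k=3.0):
    """Return indices of intervals with zero depth or outlying depth.

    Assumed rule: an interval is an outlier when its depth differs from
    the median depth by more than k median absolute deviations (MAD).
    """
    med = median(depths)
    mad = median(abs(d - med) for d in depths)
    flagged = []
    for i, d in enumerate(depths):
        if d == 0 or (mad > 0 and abs(d - med) > k * mad):
            flagged.append(i)
    return flagged

depths = [10, 11, 9, 10, 0, 12, 95, 10]   # read depth per interval
excluded = low_confidence_bins(depths)
```

In this toy input the interval with no reads and the interval with a depth of 95 are flagged, and the remaining intervals stay in the analysis.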
- the excluding of intervals with a low confidence level for reads from subjects of analysis may include removing mismatches, removing multi-mapped reads, removing duplicated reads, or a combination thereof.
- quality filtering may be a process of extracting high-quality reads with respect to quality of each base sequence obtained during a sequencing process.
- the trimming is a process of removing a poor-quality region because quality of the end of a base sequence is poor due to the nature of a sequencing device.
- a nucleic acid fragment may be trimmed to a size of about 50 bp or more, more than about 50 bp, or more than about 100 bp.
- a quality value of a nucleic acid fragment may be 20 or more, 30 or more, 40 or more, or 50 or more.
- the perfect match is a process of selecting only reads that match perfectly when mapped to a reference human genome. Since multi-mapped reads are likely to originate from repeated sequence regions, the multi-mapped reads may be removed from the obtained reads. PCR-duplicated reads are removed to eliminate regions that were over-amplified due to errors during sequencing. Further, for statistical analysis, statistically significant results may be obtained by choosing groups with small deviation. Further, a region with no read depth is usually an N-region of the chromosome, which may be removed from the subjects of analysis.
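The filtering steps described above (quality filtering, length threshold after trimming, perfect match, removal of multi-mapped reads, removal of duplicated reads) can be sketched as a single pass over the reads. The `Read` record and its field names are hypothetical, and treating reads with the same chromosome, start, and length as duplicates is an assumption. The default thresholds use a quality value of 20 and a length of 50 bp, which are among the values mentioned in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Read:
    read_id: str
    chrom: str
    start: int
    length: int          # length after trimming, in bp
    mean_quality: float  # quality value of the read
    mismatches: int      # mismatches against the reference genome
    n_locations: int     # number of genome locations the read maps to

def filter_reads(reads, min_quality=20, min_length=50):
    seen = set()
    kept = []
    for r in reads:
        if r.mean_quality < min_quality or r.length < min_length:
            continue  # quality filtering and length threshold
        if r.mismatches > 0:
            continue  # keep perfect matches only
        if r.n_locations != 1:
            continue  # remove multi-mapped reads
        key = (r.chrom, r.start, r.length)
        if key in seen:
            continue  # remove duplicated reads (assumed duplicate rule)
        seen.add(key)
        kept.append(r)
    return kept

reads = [
    Read("r1", "chr21", 100, 80, 30.0, 0, 1),  # kept
    Read("r2", "chr21", 100, 80, 31.0, 0, 1),  # duplicate of r1
    Read("r3", "chr21", 500, 40, 35.0, 0, 1),  # too short
    Read("r4", "chr18", 900, 90, 15.0, 0, 1),  # low quality
    Read("r5", "chr13", 300, 75, 28.0, 1, 1),  # has a mismatch
    Read("r6", "chr1", 700, 75, 28.0, 0, 3),   # multi-mapped
    Read("r7", "chr13", 300, 75, 28.0, 0, 1),  # kept
]
kept = filter_reads(reads)
```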
- the method may further include performing locally weighted scatterplot smoothing (LOWESS or LOESS) regression analysis of the reads of the nucleic acid fragments according to the following Equation 1 to reduce GC content bias, after assigning the reads of the nucleic acid fragments to the chromosome:
- the GC content bias is also called GC bias.
- the GC bias refers to a difference between actual GC content of the sequenced reads and predicted GC content based on the reference sequence.
- the GC bias refers to the difference of the number of sequenced reads according to the change of GC content.
- in Equation 1, Rf_ij′ represents a corrected fraction of reads on chromosome j in sample i, and RC_ij represents a corrected number of unique reads on chromosome j in sample i.
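Equation 1 itself is not reproduced in this text, so the following is only a sketch of how a LOESS-based GC correction of read counts might look. Everything in it is an assumption: the function names, the tricube-weighted local linear fit, the rescaling of each count by the median over the fitted value, and the form of the corrected fraction (RC_ij divided by the sum of RC_ij over all chromosomes of sample i), which was inferred from the symbol definitions.

```python
from statistics import median

def loess_fit(x, y, frac=0.75):
    """Basic LOESS: local linear regression with tricube weights."""
    n = len(x)
    k = min(n, max(2, round(frac * n)))  # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12        # window half-width
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def gc_correct(counts, gc, frac=0.75):
    """Rescale each count so that the fitted GC trend becomes flat."""
    fitted = loess_fit(gc, counts, frac)
    centre = median(counts)
    return [c * centre / f if f > 0 else 0.0 for c, f in zip(counts, fitted)]

def corrected_fraction(rc_by_chrom, target):
    """Assumed form of Equation 1: RC_ij over the sum of RC_ij."""
    return rc_by_chrom[target] / sum(rc_by_chrom.values())

gc = [0.35, 0.40, 0.45, 0.50, 0.55]
counts = [350.0, 400.0, 450.0, 500.0, 550.0]  # read count rises with GC
corrected = gc_correct(counts, gc)
rf = corrected_fraction({"chr1": 900.0, "chr2": 87.0, "chr21": 13.0}, "chr21")
```

In this toy input the read count depends only on GC content, so after correction every interval ends up at the median count of 450 and the GC trend is removed.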
- the method may further include performing normalization of the reads of the nucleic acid fragments according to the following Equation 2, after assigning the reads of the nucleic acid fragments to the chromosome:
- in Equation 2, Rf_i′j′ represents a normalized fraction of reads on chromosome j in sample i, and N represents the total number of samples.
- the method may include calculating the GC content and the fraction of reads (Rf) of the nucleic acid fragments on the target chromosome, relative to the total number of the nucleic acid fragments, based on the reads of the nucleic acid fragments assigned to the chromosome.
- the fraction of reads (Rf) of the nucleic acid fragments is also called read ratio.
- the Rf represents a ratio of the nucleic acid fragments on the target chromosome to the total number of the nucleic acid fragments on every chromosome of a test sample.
- the GC content represents a proportion (%) of guanine (G) and cytosine (C) in bases constituting DNA.
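The two quantities defined above can be computed directly from their definitions, as in the following sketch. The function names and the input layout are hypothetical, and excluding ambiguous bases such as N from the GC denominator is an assumption.

```python
def gc_content(sequence):
    """Proportion of G and C among the called bases (A, C, G, T)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called

def fraction_of_reads(counts_by_chrom, target):
    """Rf: reads on the target chromosome over reads on every chromosome."""
    total = sum(counts_by_chrom.values())
    return counts_by_chrom[target] / total

gc = gc_content("GGCCAATT")
rf = fraction_of_reads({"chr1": 800, "chr2": 187, "chr21": 13}, "chr21")
```

With these toy inputs the GC content is 0.5 and the Rf of chromosome 21 is 13 / 1000 = 0.013.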
- the method may include selecting adaptive reference samples belonging to a shared range of unit values of Rf and unit values of GC content from reference samples, based on the calculated Rf and GC content on the target chromosome.
- the reference sample may be obtained from a biological sample of a pregnant woman carrying one or more euploid fetuses.
- the method may further include establishing a linear regression model from all of the reference samples.
- the adaptive reference sample may be a reference sample adapted to a test sample, which is selected from all reference samples.
- the adaptive reference sample may be a reference sample, selected from the reference samples, that falls within the Rf of the target chromosome ± the unit value, the GC content of the target chromosome ± the unit value, or a shared range thereof.
- the unit value may be a value arbitrarily set, and the unit value of Rf and the unit value of GC content may be the same as or different from each other.
- the unit value (%) of Rf may be about 0.000001 to about 0.002, about 0.000005 to about 0.001, about 0.00001 to about 0.0005, or about 0.00005 to about 0.0001.
- the unit value (%) of GC content may be about 0.0001 to about 0.1, about 0.0005 to about 0.05, about 0.002 to about 0.02, or about 0.001 to about 0.01.
- the method may further include extending the unit values of Rf of the reference samples according to Rf values of the test samples, extending the unit values of GC of the reference samples according to GC contents of the test samples, or a combination thereof.
- the extending of the unit values of Rf of the reference samples, the extending of the unit values of GC of the reference samples, or a combination thereof may be implemented by computer algorithms.
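The selection of adaptive reference samples within a shared range of Rf ± unit value and GC content ± unit value can be sketched as follows. The text does not specify how the unit values are extended, so widening both windows in whole multiples of the unit value until enough references are found is an assumption, as are the function name, the dictionary keys, and the parameters `min_refs` and `max_steps`. The numeric values are illustrative only.

```python
def select_adaptive_references(test, references, unit_rf, unit_gc,
                               min_refs=3, max_steps=10):
    """Pick references whose Rf and GC content both lie near the test sample.

    Assumed extension rule: the window is widened to step * unit value
    until at least `min_refs` reference samples fall in the shared range.
    """
    chosen = []
    for step in range(1, max_steps + 1):
        chosen = [
            ref for ref in references
            if abs(ref["rf"] - test["rf"]) <= step * unit_rf
            and abs(ref["gc"] - test["gc"]) <= step * unit_gc
        ]
        if len(chosen) >= min_refs:
            return chosen, step
    return chosen, max_steps

test_sample = {"rf": 0.01300, "gc": 0.420}
reference_samples = [
    {"rf": 0.01302, "gc": 0.421},
    {"rf": 0.01297, "gc": 0.418},
    {"rf": 0.01308, "gc": 0.423},  # only inside the extended window
    {"rf": 0.01400, "gc": 0.420},  # Rf too far from the test sample
    {"rf": 0.01300, "gc": 0.460},  # GC content too far from the test sample
]
adaptive, steps = select_adaptive_references(
    test_sample, reference_samples, unit_rf=0.00005, unit_gc=0.005)
```

In this example two references fall in the initial window, so the window is extended once and three adaptive reference samples are returned.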
- the method may include calculating z scores of verification reference samples and z scores of the test samples using the selected adaptive reference samples.
- the z score, one of the standard scores, refers to a score calculated by dividing the deviation of a score from the group mean by the standard deviation of the group.
- the z score allows relative comparison between scores with different means and units by converting the scores into a unit distribution with a mean of 0 and a standard deviation of 1.
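As a small worked example of the standard score, the sketch below converts a list of values to z scores. Using the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption; the text does not say which one is meant.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert values to standard scores with mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    return [(v - m) / s for v in values]

scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```

The input has a mean of 5 and a standard deviation of 2, so the value 9 becomes a z score of 2.0 and the value 2 becomes -1.5.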
- the calculating of z scores of the verification reference samples and z scores of the test samples may include performing a linear regression analysis according to the following Equation 3 and calculating a linear predicted value of Rf according to the following Equation 4:
- in Equation 3, Rf_i′j′ represents a normalized fraction of reads on chromosome j in sample i
- α represents a constant
- β represents a coefficient factor between GC content and Rf
- e represents a residual (R).
- in Equation 4, Rf′_i′j′ represents a fitted predicted value of fraction of reads on chromosome j in sample i
- α represents a constant
- β represents a coefficient factor between GC content and Rf.
- the calculating of z scores of the verification reference samples and z scores of the test samples may include calculating a residual (R) from a calculated value from the linear regression analysis and the calculated linear predicted value according to the following Equation 5, and calculating a Z score from the calculated residual according to the following Equation 6:
- in Equation 6, R′ represents a mean value of the residuals of the adaptive reference samples, R represents a residual value of a test sample, and σ′ represents a standard deviation of the residuals of the adaptive reference samples.
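Equations 3 to 6 are not reproduced in this text. Based only on the symbol definitions given above, one plausible reading is the following, where the symbol GC_ij for the GC content of chromosome j in sample i is a hypothetical name introduced here and the exact form of each equation is an assumption:

```latex
\begin{align}
Rf_{i'j'} &= \alpha + \beta \cdot GC_{ij} + e \tag{3} \\
Rf'_{i'j'} &= \alpha + \beta \cdot GC_{ij} \tag{4} \\
R &= Rf_{i'j'} - Rf'_{i'j'} \tag{5} \\
Z &= \frac{R - R'}{\sigma'} \tag{6}
\end{align}
```

Under this reading, the residual R is the part of the normalized Rf that the linear dependence on GC content does not explain, and the Z score expresses the residual of the test sample in units of the residual spread of the adaptive reference samples.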
- the method may include determining that the target chromosome has chromosomal aneuploidy, when, by comparing the calculated z score of the adaptive reference sample with the z score of the test sample, the z score of the test sample is larger than the z score of the adaptive reference sample.
- the method may further include selecting reference samples belonging to the GC content of the target chromosome ± the unit value as verification samples; calculating z scores of the verification samples; and verifying that the target chromosome has chromosomal aneuploidy by comparing the calculated z scores of the verification samples with the z scores of the test samples.
- the method may further include determining that the target chromosome has chromosomal aneuploidy, when the calculated z score of the target chromosome of a test sample is more than 3.
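The regression, residual, and z score steps, followed by the z score threshold of 3, can be sketched as below. This is an illustration under assumptions and not the implementation of the disclosure: the function names, the dictionary keys, the use of ordinary least squares on the adaptive reference samples, and the use of the sample standard deviation are all choices made for the example, and the data are invented.

```python
from statistics import mean, stdev

def fit_line(gc, rf):
    """Ordinary least squares for rf = alpha + beta * gc."""
    mg, mr = mean(gc), mean(rf)
    sxx = sum((g - mg) ** 2 for g in gc)
    beta = sum((g - mg) * (r - mr) for g, r in zip(gc, rf)) / sxx
    return mr - beta * mg, beta

def z_score(test, refs):
    """Z score of the test residual against the reference residuals."""
    gc = [ref["gc"] for ref in refs]
    rf = [ref["rf"] for ref in refs]
    alpha, beta = fit_line(gc, rf)
    residuals = [r - (alpha + beta * g) for g, r in zip(gc, rf)]
    test_residual = test["rf"] - (alpha + beta * test["gc"])
    return (test_residual - mean(residuals)) / stdev(residuals)

def has_aneuploidy(z, threshold=3.0):
    """Decision rule from the text: z score of more than 3."""
    return z > threshold

adaptive_refs = [
    {"gc": 0.410, "rf": 0.013010},
    {"gc": 0.412, "rf": 0.013010},
    {"gc": 0.414, "rf": 0.013050},
    {"gc": 0.416, "rf": 0.013050},
    {"gc": 0.418, "rf": 0.013090},
    {"gc": 0.420, "rf": 0.013090},
]
z_elevated = z_score({"gc": 0.415, "rf": 0.013550}, adaptive_refs)
z_typical = z_score({"gc": 0.415, "rf": 0.013055}, adaptive_refs)
```

With these invented values, the sample with an elevated Rf yields a z score far above 3 and would be called aneuploid, while the sample close to the regression line stays below the threshold.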
- the method detects chromosomal aneuploidy of a target fetal chromosome.
- the target chromosome may be chromosome 13, chromosome 18, chromosome 21, an X chromosome, a Y chromosome, or a combination thereof.
- chromosomal aneuploidy refers to a state in which the number of chromosomes per cell in cells, individuals, or lineages is not a multiple of the haploid number, a state in which one to several chromosomes are gained or lost with respect to euploidy, in other words, a state in which one chromosome set is incomplete.
- nullisomy refers to loss of both members of a homologous pair of chromosomes.
- monosomy refers to the absence of one chromosome and the presence of the other chromosome.
- trisomy (T) refers to the presence of a single extra chromosome in addition to a homologous pair of chromosomes.
- the chromosomal aneuploidy may be trisomy 13, trisomy 18, trisomy 21, XO, XXX, XXY, XYY, or a combination thereof.
- Trisomy 13 is associated with Patau syndrome.
- Trisomy 18 is associated with Edwards syndrome.
- Trisomy 21 is associated with Down syndrome.
- Monosomy X (XO, e.g., loss of one X chromosome) is associated with Turner syndrome.
- XXY is the presence of an additional X chromosome in human males, and is associated with Klinefelter syndrome.
- a computer-readable medium having recorded thereon a program to be applied to performing the method according to one aspect.
- the computer-readable medium encompasses systems containing the computer-readable medium.
- fetal chromosomal aneuploidy may be non-invasively and prenatally diagnosed with excellent sensitivity and specificity.
- FIG. 1 is a graph showing, among all samples, z scores of euploid samples and trisomy 21 samples;
- FIG. 2 is a graph showing selection of adaptive reference samples belonging to a shared range of fraction of reads (Rf) and GC content from among reference samples, based on Rf and GC content of a targeted chromosome, according to an embodiment
- FIG. 3 A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.416
- FIG. 3 B shows results of calculating Z scores of T21 verification samples with a GC content of 0.41 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 3 A as reference samples;
- FIG. 4 A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.424
- FIG. 4 B shows results of calculating Z scores of T21 verification samples with a GC content of 0.42 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 4 A as reference samples;
- FIG. 5 A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.437
- FIG. 5 B shows results of calculating Z scores of T21 verification samples with a GC content of 0.43 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 5 A as reference samples;
- FIG. 6 A shows coefficient of variation for four sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.446
- FIG. 6 B shows results of calculating Z scores of T21 verification samples with a GC content of 0.44 and euploid verification samples using the euploid samples of sets A to D adaptively selected in FIG. 6 A as reference samples;
- FIG. 7 A shows coefficient of variation for sets of reference samples which were selected according to a shared region of GC and Rf based on a sample of chromosome T18 with a GC content of 0.45
- FIG. 7 B shows z scores of all reference samples and a T18 sample with a GC content of 0.45
- FIG. 7 C shows results of calculating Z scores of T18 test samples with a GC content of 0.45 and euploid test samples using the euploid samples adaptively selected as reference samples;
- FIG. 8 A shows coefficient of variation for sets of reference samples which were selected according to a shared region of GC and Rf based on a sample of chromosome T13 with a GC content of 0.421
- FIG. 8 B shows z scores of all reference samples and a T13 sample with a GC content of 0.421
- FIG. 8 C shows results of calculating Z scores of T13 test samples with a GC content of 0.421 and euploid test samples using the euploid samples adaptively selected as reference samples;
- First-trimester screening includes measurement of serum pregnancy-associated plasma protein A (PAPP-A), total or free beta subunit of human chorionic gonadotropin (hCG), and nuchal translucency.
- Second-trimester screening includes measurement of maternal serum alpha-fetoprotein (MSAFP), hCG, unconjugated estriol, and inhibin A.
- the end-repair of the obtained cfDNA was carried out using T4 DNA polymerase, Klenow DNA polymerase, and T4 polynucleotide kinase, and then cfDNA fragments were obtained again by using Agencourt AMPure XP.
- DNA libraries for Ion Proton sequencing systems were constructed from the prepared cfDNA according to the protocol provided by the manufacturer (Life Technologies, SD, USA). Proton PI Chip Kit version 2.0 was used to yield an average 0.3× sequencing coverage depth per nucleotide.
- the DNA libraries prepared as in 2. were subjected to massively parallel sequencing by using ION PROTON™ system (Thermo Fisher Scientific).
- the reads were trimmed from the 3′ end by sequencing, and low-quality reads were excluded from the subjects of analysis. Further, the reads were filtered by a quality threshold value of 20 and a read length threshold of 50 bp.
- the filtered reads were aligned to the human genomic reference sequences hg19 using Burrows-Wheeler transform (BWT). Sequence reads mapped to only one genome location in hg19 were termed unique reads. About 44.6% (about 3.3×10^6) of the total reads were unique reads. The GC contents of the total 447 samples ranged from about 30% to about 60%.
- BWT Burrows-Wheeler transform
- LOESS locally weighted scatterplot smoothing
- Rf ij′ represents a corrected fraction of reads on chromosome j in sample i
- RC ij represents a corrected number of unique reads on chromosome j in sample i.
- In Equation 2, Rf i′j′ represents a normalized fraction of reads on chromosome j in sample i, and N represents the total number of samples.
- Fetal aneuploidy was detected in all samples according to a previous method of calculating z score.
- Rf i′j′ represents a normalized fraction of reads on chromosome j in sample i
- Rf′ i′j′ represents a fitted predicted value of fraction of reads on chromosome j in sample i
- GC i′j′ represents a GC content on chromosome j in sample i
- β represents a coefficient factor between a GC content and Rf
- α represents a constant
- e represents a residual (R).
- the residual (R) was calculated according to
- Z scores of the euploid samples and trisomy 21 (T21) samples among all samples are shown in FIG. 1.
- a z score range from about 1 to about 3 for chromosome 21 overlapped in the euploid samples and T21 samples, and positive and negative results were not clearly distinguished, and a threshold was ambiguous, indicating that the method of detecting fetal aneuploidy using z scores of the whole reference samples shows low accuracy and specificity.
- GC contents of 13 positive samples were examined.
- the positive samples were categorized into four groups according to GC content regions (ranging from −0.005 to +0.005).
- the two positive samples in the GC content region of 0.41, the five positive samples in the GC content region of 0.42, the two positive samples in the GC content region of 0.43, and the four positive samples in the GC content region of 0.44 were clustered according to the GC regions, respectively.
- Representative positive sample was selected from each group, and the selected positive sample was used to generate a set of adaptive reference samples by increasing the GC content by 0.001 and the reads fraction by 0.00005.
- reference samples belonging to a shared range of the GC content and Rf were extracted from all reference samples.
- the GC content range was set from −0.001 to +0.001 as a unit value when setting the GC content of a test sample as the median.
- the Rf was set from −0.00005 to +0.00005 as a unit value when setting the Rf of a test sample as the median, which was determined by the fitting predicted fraction of Rf calculated as
- a coefficient of variation (CV) was used to evaluate performance between the previous method of using whole reference samples and the method of using adaptive reference samples.
- the coefficient of variation for chromosome 21 was calculated with and without adaptive sample selection using reference samples selected from a shared region of GC content 0.416±X and Rf linear predicted value±Y.
- FIG. 3A shows coefficient of variation with and without adaptive reference selection.
- Of the two test samples in the GC content region of 0.41 of chromosome 21, one sample in the GC content region of 0.416 of chromosome T21 was selected as a representative test sample, and the other sample was used to verify results using the adaptive reference samples.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.416±0.009 and Rf linear predicted value±1e-05.
- test sample not selected as the representative test sample was used as a verification sample.
- reference samples were selected in the same manner as in (A).
- z scores of the T21 test samples were calculated from the selected reference samples and test samples.
- euploid samples arbitrarily selected in the GC content range of 0.416±0.001 were used as test samples, and the 6 sets of the reference samples were used to calculate z scores of the euploid test samples.
- samples in the GC content region of 0.42 of chromosome 21 were selected as a representative test sample, and other samples were used to demonstrate results using the adaptive reference samples.
- CVs for the selected six sets of reference samples are shown.
- FIG. 4B shows results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to F adaptively selected in FIG. 4A as reference samples.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.424±0.004 and Rf linear predicted value±1e-05. In this regard, the remaining test samples not selected as the representative test sample were used as verification samples.
- reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples.
- euploid samples arbitrarily selected in the GC content range of 0.424±0.001 were used as test samples, and the 6 sets of the reference samples were used to calculate z scores of the euploid test samples.
- As shown in FIG. 4B, when the adaptively selected sets A to F were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and a threshold which is the z score for T21 fetus was unambiguous.
- one sample in the GC content region of 0.43 of chromosome 21 was selected as a representative test sample, and the other sample was used to demonstrate results using the adaptive reference samples.
- CVs for the selected six sets of reference samples are shown.
- FIG. 5B shows results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to F adaptively selected in FIG. 5A as reference samples.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.437±0.009 and Rf linear predicted value±1e-05. In this regard, the remaining test samples not selected as the representative test sample were used as verification samples.
- reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples.
- euploid samples arbitrarily selected in the GC content range of 0.437±0.001 were used as test samples, and the 6 sets of the reference samples were used to calculate z scores of the euploid test samples.
- As shown in FIG. 5B, when the adaptively selected sets A to F were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and a threshold which is the z score for T21 fetus was unambiguous.
- samples in the GC content region of 0.44 of chromosome 21 were selected as representative test samples, and other samples were used to demonstrate results using the adaptive reference samples.
- In FIG. 6A, CVs for the selected four sets of reference samples are shown.
- FIG. 6B shows results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to D adaptively selected in FIG. 6A as reference samples.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.446±0.011 and Rf linear predicted value±2e-05. In this regard, the remaining test samples not selected as the representative test sample were used as verification samples.
- reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples.
- euploid samples arbitrarily selected in the GC content range of 0.446±0.001 were used as test samples, and the 6 sets of the reference samples were used to calculate z scores of the euploid test samples.
- As shown in FIG. 6B, when the adaptively selected sets A to D were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and a threshold which is the z score for T21 fetus was unambiguous.
- a representative test sample was also used as a test sample.
- CVs for the set A and reference value (non-selected reference sample) are shown.
- FIG. 7 B shows Z scores of non-selected reference sample and T18 sample.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.45±0.014 and Rf linear predicted value±2e-05. Further, euploid samples arbitrarily selected in the GC content range of 0.45±0.001 were used as test samples, and the reference sample set was used to calculate z scores of the euploid test samples.
- Trisomy 13 (T13) sample was detected by the adaptive selection method as described in 5.(2).
- a representative test sample was also used as a test sample.
- CVs for the set A and reference value (non-selected reference sample) are shown.
- FIG. 8 B shows Z scores of non-selected reference sample and T13 sample.
- the reference samples corresponding to set A were selected from a shared range of GC content 0.421±0.017 and Rf linear predicted value±0.0001. Further, euploid samples arbitrarily selected in the GC content range of 0.421±0.001 were used as test samples, and the reference sample set was used to calculate z scores of the euploid test samples.
- reference samples belonging to a range set based on Rf of test chromosome and GC content may be selected from a target chromosome.
- whether a test sample is from a trisomy fetus may be detected with excellent sensitivity and specificity by comparing Z scores calculated from selected reference samples and the Z score calculated from the test sample.
Abstract
Provided are a method of detecting chromosomal aneuploidy of a targeted fetal chromosome, and a computer-readable medium having recorded thereon a program to be applied to performing the method. According to the present disclosure, fetal chromosomal aneuploidy may be non-invasively and prenatally diagnosed with excellent sensitivity and specificity.
Description
- This application is a continuation of U.S. application Ser. No. 16/071,883, filed Jul. 20, 2018, which is a national phase of PCT Application PCT/KR2017/000266, filed Jan. 9, 2017, which claims benefit of priority from Korean patent application No. 10-2016-0008903, filed Jan. 25, 2016, the contents of which are incorporated by reference.
- The present disclosure relates to a method of detecting fetal chromosomal aneuploidy in a biological sample derived from a pregnant woman, and a medium related thereto.
- Prenatal diagnosis refers to diagnosis of the presence or absence of diseases in a fetus before the fetus is born. Prenatal diagnosis is largely classified into invasive diagnostic methods and non-invasive diagnostic methods. Invasive diagnostic methods include, for example, chorionic villus sampling, amniocentesis, and umbilical cord blood sampling. Invasive diagnostic methods may cause miscarriage, diseases, or malformations by impacting a fetus during the examination process, and therefore, non-invasive diagnostic methods have been developed.
- Recently, it has been demonstrated that non-invasive diagnosis of fetal chromosomal aneuploidy is feasible by massively parallel sequencing of DNA molecules in the plasma of pregnant women. Fetal DNA is detectable in maternal plasma and serum from the seventh week of gestation, and the amount of fetal DNA in maternal blood increases as pregnancy progresses. When massively parallel sequencing of fetal DNA is carried out, a threshold for discriminating between euploid fetuses and chromosomal aneuploid fetuses is ambiguous, and thus there is a problem in that sensitivity and specificity for chromosomal aneuploidy detection are low.
- Accordingly, it is necessary to develop a method capable of increasing sensitivity and specificity for chromosomal aneuploidy detection by clearly discriminating between euploid fetuses and chromosomal aneuploid fetuses.
- Provided is a method of detecting chromosomal aneuploidy of a target fetal chromosome.
- Provided is a computer-readable medium having recorded thereon a program to be applied to performing the method of detecting chromosomal aneuploidy of a target fetal chromosome.
- According to an aspect of the present disclosure, provided is a method of detecting chromosomal aneuploidy of a target fetal chromosome, the method including:
- obtaining sequence information (reads) of a plurality of nucleic acid fragments obtained from a biological sample of a pregnant woman;
- mapping the obtained reads to a reference human genome to assign the reads of the nucleic acid fragments to the chromosome;
- calculating a GC content and a fraction of reads (Rf) of the nucleic acid fragments on the target chromosome to the number of the nucleic acid fragments, based on the reads of the nucleic acid fragments assigned to the chromosome;
- selecting adaptive reference samples belonging to a shared range of unit values of Rf and unit values of GC content from reference samples, based on the calculated Rf and GC content on the target chromosome;
- calculating z scores of verification reference samples and z scores of the test samples using the selected adaptive reference samples; and
- determining that the target chromosome has chromosomal aneuploidy when, by comparing the calculated z scores of the verification reference samples with the z scores of the test samples, the z scores of the test samples are larger than the z scores of the verification reference samples.
- The method includes obtaining reads of a plurality of nucleic acid fragments obtained from a biological sample of a pregnant woman.
- The pregnant woman may be a pregnant woman carrying a single fetus or twin fetuses.
- The biological sample may be blood, plasma, serum, urine, saliva, mucus, sputum, feces, tears, or a combination thereof. The biological sample may be, for example, plasma of the peripheral blood. The biological sample may include fetal nucleic acids. The fetal nucleic acids may be cell-free DNA (cfDNA). The fetal nucleic acids may be isolated DNA.
- The obtaining of the reads of a plurality of nucleic acid fragments obtained from the biological sample of a pregnant woman may include isolating nucleic acids from the biological sample.
- A method of isolating the nucleic acids from the biological sample may be performed by a method known to those skilled in the art. The isolated nucleic acid fragments may be about 10 bp (base pairs) to about 2000 bp, about 15 bp to about 1500 bp, about 20 bp to about 1000 bp, about 20 bp to about 500 bp, about 20 bp to about 200 bp, or about 20 bp to about 100 bp in length.
- The obtaining of the sequence information of a plurality of nucleic acid fragments obtained from the biological sample of a pregnant woman may include performing massively parallel sequencing of the obtained nucleic acids.
- The term “massively parallel sequencing” may be used interchangeably with next-generation sequencing (NGS) or second-generation sequencing. The massively parallel sequencing refers to a technique in which millions of nucleic acid fragments are sequenced simultaneously. The massively parallel sequencing may be performed in parallel by, for example, 454 platform (Roche), GS FLX titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer, Solexa platform, SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences Heliscope, single molecule real-time sequencing technology (SMRT™) of Pacific Biosciences, or a combination thereof.
- The method may further include preparing DNA libraries in order to perform the massively parallel sequencing.
- The DNA libraries may be prepared according to procedures of the massively parallel sequencing. The DNA libraries may be prepared according to the instructions provided by the manufacturer of the massively parallel sequencing platform.
- The obtained sequence information of the nucleic acid fragments may be also called reads.
- The method includes mapping the obtained reads to a reference human genome to assign the reads of the nucleic acid fragments to the chromosome.
- The reference human genome may be hg18 or hg19. The reads mapped to only one genome location in the reference human genome may be termed unique reads. Based on a unique sequence number, the reads of the nucleic acid fragments may be assigned to a chromosomal location. The chromosomal location may be within a consecutive region of a chromosome which is about 5 kb, about 10 kb, about 20 kb, about 50 kb, about 100 kb, about 1000 kb, or about 2000 kb in length. The chromosomal location may be on a single chromosome.
- The method may further include excluding intervals with a low confidence level for reads from subjects of analysis by examining depth distribution of the reads of the nucleic acid fragments assigned to the chromosome at each interval, after assigning the reads of the nucleic acid fragments to the chromosome. The interval may be an interval which is set in units of about kb to about 50 kb. For example, the interval may be an interval which is set in units of about 10 kb to about 40 kb, about 15 kb to about 30 kb, or about 20 kb to about 25 kb. By setting the intervals, GC content of the base sequence may be used to perform filtering. Further, by setting the intervals, grouping of the read depth and GC content of the nucleic acid fragments assigned to the chromosome may be performed and statistical analysis may be possible.
- The excluding of intervals with a low confidence level for reads from subjects of analysis may include removing mismatches, removing multi-mapped reads, removing duplicated reads, or a combination thereof. To exclude intervals with a low confidence level for reads from subjects of analysis, quality filtering, trimming, perfect match selection, removal of multi-mapped reads, removal of PCR duplicated reads, or a combination thereof may be performed. The quality filtering may be a process of extracting high-quality reads based on the quality of each base obtained during the sequencing process. The trimming is a process of removing the poor-quality region at the end of a base sequence, where quality tends to be low due to the nature of a sequencing device. For example, a nucleic acid fragment may be trimmed to a size of about 50 bp or more, more than about 50 bp, or more than about 100 bp. For example, a quality value of a nucleic acid fragment may be 20 or more, 30 or more, 40 or more, or 50 or more. The perfect match selection is a process of selecting only reads that match perfectly when mapped to a reference human genome. Since multi-mapped reads are likely to originate from repeated sequence regions, the multi-mapped reads may be removed from the obtained reads. The removal of PCR duplicated reads removes regions that are over-amplified due to errors during sequencing. Further, for statistical analysis, statistically significant results may be obtained by choosing groups with small deviation. Further, a region with no read depth is usually an N-region of the chromosome, which may be removed from the subjects of analysis.
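The read-level filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Read` record, its field names, and the function `filter_reads` are hypothetical, the default thresholds (quality of 20, length of 50 bp) are taken from the example values in the text, and treating reads with the same chromosome, position, and length as PCR duplicates is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned-read record; all field names are illustrative."""
    name: str
    chrom: str
    pos: int             # leftmost mapped position
    length: int          # read length after trimming, in bp
    mean_quality: float  # mean base quality of the trimmed read
    n_hits: int          # number of genome locations the read maps to
    mismatches: int      # mismatches against the reference genome


def filter_reads(reads: List[Read],
                 min_quality: float = 20.0,
                 min_length: int = 50,
                 require_perfect_match: bool = True) -> List[Read]:
    """Keep high-quality, uniquely mapped, non-duplicated reads."""
    seen = set()
    kept = []
    for r in reads:
        # Quality filtering and minimum length after trimming.
        if r.mean_quality < min_quality or r.length < min_length:
            continue
        # Remove multi-mapped reads; only unique reads remain.
        if r.n_hits != 1:
            continue
        # Optionally keep only perfect matches to the reference.
        if require_perfect_match and r.mismatches != 0:
            continue
        # Assumed duplicate criterion: same chromosome, position, and length.
        key = (r.chrom, r.pos, r.length)
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept
```

Each filter is optional in the text ("or a combination thereof"), so a real pipeline might enable only some of these checks.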
- The method may further include performing locally weighted scatterplot smoothing (LOWESS or LOESS) regression analysis of the reads of the nucleic acid fragments according to the following Equation 1 to reduce GC content bias, after assigning the reads of the nucleic acid fragments to the chromosome:
$$Rf_{ij'} = \frac{RC_{ij}}{\sum_{j=1}^{22} RC_{ij}} \quad \text{(Equation 1)}$$
- The GC content bias is also called GC bias. The GC bias refers to a difference between actual GC content of the sequenced reads and predicted GC content based on the reference sequence. The GC bias refers to the difference of the number of sequenced reads according to the change of GC content.
- In Equation 1, $Rf_{ij'}$ represents a corrected fraction of reads on chromosome j in sample i, and $RC_{ij}$ represents a corrected number of unique reads on chromosome j in sample i.
- The method may further include performing normalization of the reads of the nucleic acid fragments according to the following Equation 2, after assigning the reads of the nucleic acid fragments to the chromosome:
$$Rf_{i'j'} = \frac{Rf_{ij'}}{\sum_{i=1}^{N} Rf_{ij'}} \quad \text{(Equation 2)}$$
- In Equation 2, $Rf_{i'j'}$ represents a normalized fraction of reads on chromosome j in sample i, and N represents the total number of samples.
- The method may include calculating the GC content and fraction of reads (Rf) of the nucleic acid fragments on the target chromosome to the number of the nucleic acid fragments, based on the reads of the nucleic acid fragments assigned to the chromosome.
- The fraction of reads (Rf) of the nucleic acid fragments is also called read ratio. The Rf represents a ratio of the nucleic acid fragments on the target chromosome to the total number of the nucleic acid fragments on every chromosome of a test sample.
- The GC content represents a proportion (%) of guanine (G) and cytosine (C) in bases constituting DNA. The GC content may be calculated from Equation of GC content=(G+C)/(A+T+G+C).
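The two quantities defined above can be computed directly from read sequences and per-chromosome read counts. The sketch below is illustrative only: the function names `gc_content` and `read_fraction` and the dictionary layout of the counts are assumptions, and bases other than A, T, G, and C (for example N) are ignored, which follows from the stated formula.

```python
from typing import Dict


def gc_content(sequence: str) -> float:
    """GC content = (G + C) / (A + T + G + C); other symbols are ignored."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G, or C bases")
    return (counts["G"] + counts["C"]) / total


def read_fraction(reads_per_chromosome: Dict[str, int], target: str) -> float:
    """Rf: reads on the target chromosome divided by reads on all chromosomes."""
    total = sum(reads_per_chromosome.values())
    if total == 0:
        raise ValueError("no reads were assigned to any chromosome")
    return reads_per_chromosome[target] / total
```

For instance, with 80 reads on one chromosome and 20 on the target, `read_fraction` gives 20 / 100 for the target chromosome.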
- The method may include selecting adaptive reference samples belonging to a shared range of unit values of Rf and unit values of GC content from reference samples, based on the calculated Rf and GC content on the target chromosome.
- The reference sample may be obtained from a biological sample of a pregnant woman carrying one or more euploid fetuses. The method may further include establishing a linear regression model from all of the reference samples.
- The term “adaptive” may be used interchangeably with the term “selective” or the term “personalized”. The adaptive reference sample may be a reference sample adapted to a test sample, which is selected from all reference samples. The adaptive reference sample may be a reference sample belonging to Rf of the target chromosome±unit value, GC content of the target chromosome±unit value, or a shared range thereof, which is selected from reference samples. The unit value may be a value arbitrarily set, and the unit value of Rf and the unit value of GC content may be the same as or different from each other. For example, the unit value (%) of Rf may be about 0.000001 to about 0.002, about 0.000005 to about 0.001, about 0.00001 to about 0.0005, or about 0.00005 to about 0.0001. The unit value (%) of GC content may be about 0.0001 to about 0.1, about 0.0005 to about 0.05, about 0.002 to about 0.02, or about 0.001 to about 0.01.
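Selection of adaptive reference samples from the shared range can be sketched as a simple window filter. This is an illustration under stated assumptions: the `Sample` record and the function `select_adaptive_references` are hypothetical names, and the default unit values are example values only, since the text states that the unit value may be arbitrarily set.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    """Hypothetical per-sample summary for one target chromosome."""
    sample_id: str
    gc: float  # GC content on the target chromosome
    rf: float  # fraction of reads on the target chromosome


def select_adaptive_references(references: List[Sample],
                               test_gc: float,
                               test_rf: float,
                               gc_unit: float = 0.001,
                               rf_unit: float = 0.00005) -> List[Sample]:
    """Return reference samples inside both GC +/- unit and Rf +/- unit."""
    return [
        s for s in references
        if abs(s.gc - test_gc) <= gc_unit and abs(s.rf - test_rf) <= rf_unit
    ]
```

Only samples that fall inside both windows are returned, which corresponds to the shared range; selecting on GC content alone or Rf alone would drop one of the two conditions.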
- The method may further include extending the unit values of Rf of the reference samples according to Rf values of the test samples, extending the unit values of GC of the reference samples according to GC contents of the test samples, or a combination thereof. The extending of the unit values of Rf of the reference samples, the extending of the unit values of GC of the reference samples, or a combination thereof may be implemented by computer algorithms.
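One way to read the extension of unit values is as a series of progressively wider windows around the test sample, each yielding one candidate set of reference samples. The sketch below assumes that interpretation; the function name `expanding_reference_sets`, the tuple layout, the step sizes, and the number of sets are all illustrative assumptions rather than the disclosed algorithm.

```python
from typing import List, Tuple

# Each reference is (sample_id, gc_content, rf); this layout is an assumption.
Reference = Tuple[str, float, float]


def expanding_reference_sets(references: List[Reference],
                             test_gc: float,
                             test_rf: float,
                             gc_step: float = 0.001,
                             rf_step: float = 0.00005,
                             n_sets: int = 6) -> List[List[str]]:
    """Build n_sets reference sets, widening both windows by one step each time."""
    sets = []
    for k in range(1, n_sets + 1):
        gc_window = k * gc_step  # GC window half-width for set k
        rf_window = k * rf_step  # Rf window half-width for set k
        selected = [
            sample_id
            for sample_id, gc, rf in references
            if abs(gc - test_gc) <= gc_window and abs(rf - test_rf) <= rf_window
        ]
        sets.append(selected)
    return sets
```

Each successive set contains the previous one, so the sets can be compared, for example by coefficient of variation, to choose a window size.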
- The method may include calculating z scores of verification reference samples and z scores of the test samples using the selected adaptive reference samples.
- The term “z score”, one of standard scores, refers to a score calculated by dividing a score deviation by a standard deviation of the group. The z score allows relative comparison between scores with different means and units by converting the scores into a unit distribution with a mean of 0 and a standard deviation of 1.
- The calculating of z scores of the verification reference samples and z scores of the test samples may include performing a linear regression analysis according to the following Equation 3 and calculating a linear predicted value of Rf according to the following Equation 4:
$$Rf_{i'j'} = \alpha + \beta \times GC_{i'j'} + e \quad \text{(Equation 3)}$$
$$Rf'_{i'j'} = \alpha + \beta \times GC_{i'j'} \quad \text{(Equation 4)}$$
- In Equation 3, $Rf_{i'j'}$ represents a normalized fraction of reads on chromosome j in sample i, α represents a constant, β represents a coefficient factor between GC content and Rf, and e represents a residual (R).
- In Equation 4, $Rf'_{i'j'}$ represents a fitted predicted value of fraction of reads on chromosome j in sample i, α represents a constant, and β represents a coefficient factor between GC content and Rf.
- The calculating of z scores of the verification reference samples and z scores of the test samples may include calculating a residual (R) from a calculated value from the linear regression analysis and the calculated linear predicted value according to the following Equation 5, and calculating a Z score from the calculated residual according to the following Equation 6:
$$R = Rf_{i'j'} - Rf'_{i'j'} \quad \text{(Equation 5)}$$
$$z\ \text{score} = \frac{R - R'}{\sigma'} \quad \text{(Equation 6)}$$
- In Equation 6, R′ represents a mean value of a residual of an adaptive reference sample, R represents a residual value of a test sample, and σ′ represents a standard deviation of the residual of the adaptive reference sample.
- The method may include determining that the target chromosome has chromosomal aneuploidy, when, by comparing the calculated z score of the adaptive reference sample with the z score of the test sample, the z score of the test sample is larger than the z score of the adaptive reference sample.
- The method may further include selecting reference samples belonging to GC content±unit value of the target chromosome as verification samples; calculating z scores of the verification samples; and verifying that the target chromosome has chromosomal aneuploidy by comparing the calculated z scores of the verification samples with the z scores of the test samples.
- The method may further include determining that the target chromosome has chromosomal aneuploidy, when the calculated z score of the target chromosome of a test sample is more than 3.
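The calculation in Equations 3 to 6 and the z score threshold above can be sketched in plain Python. This is a minimal illustration, not the disclosed program: the function names are hypothetical, ordinary least squares is assumed for the linear regression, and the sample standard deviation (n − 1 denominator) is assumed for σ′ because the text does not specify which estimator is used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fit_line(gc: Sequence[float], rf: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Rf = alpha + beta * GC (Equation 3)."""
    gc_mean, rf_mean = mean(gc), mean(rf)
    sxx = sum((g - gc_mean) ** 2 for g in gc)
    sxy = sum((g - gc_mean) * (r - rf_mean) for g, r in zip(gc, rf))
    beta = sxy / sxx
    alpha = rf_mean - beta * gc_mean
    return alpha, beta


def residual(alpha: float, beta: float, gc: float, rf: float) -> float:
    """R = Rf - Rf', with Rf' = alpha + beta * GC (Equations 4 and 5)."""
    return rf - (alpha + beta * gc)


def z_score(test_residual: float, reference_residuals: Sequence[float]) -> float:
    """z = (R - R') / sigma' over the adaptive reference residuals (Equation 6)."""
    return (test_residual - mean(reference_residuals)) / stdev(reference_residuals)


def exceeds_threshold(z: float, threshold: float = 3.0) -> bool:
    """True when the z score of the target chromosome is more than the threshold."""
    return z > threshold
```

In this sketch the regression would be fitted on all reference samples, while `z_score` would receive only the residuals of the adaptively selected reference samples for the test sample in question.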
- The method provides a method of detecting chromosomal aneuploidy of a target fetal chromosome.
- The target chromosome may be chromosome 13, chromosome 18, chromosome 21, an X chromosome, a Y chromosome, or a combination thereof.
- The term “chromosomal aneuploidy” refers to a state in which the number of chromosomes per cell in cells, individuals, or lineages is not a multiple of the haploid number, a state in which one to several chromosomes are gained or lost with respect to euploidy, in other words, a state in which one chromosome set is incomplete. In the case of diploid, nullisomy refers to loss of both members of a homologous pair of chromosomes, monosomy refers to the absence of one chromosome and the presence of the other chromosome, and trisomy (T) refers to the presence of extra single chromosome in addition to a homologous pair of chromosomes.
- The chromosomal aneuploidy may be trisomy 13, trisomy 18, trisomy 21, XO, XXX, XXY, XYY, or a combination thereof. Trisomy 13 is associated with Patau syndrome. Trisomy 18 is associated with Edwards syndrome. Trisomy 21 is associated with Down syndrome. Monosomy X (XO, e.g., loss of one X chromosome) is associated with Turner syndrome. XXY is the presence of an additional X chromosome in human males, and associated with Klinefelter syndrome.
- According to another aspect, provided is a computer-readable medium having recorded thereon a program to be applied to performing the method according to one aspect. The computer-readable medium encompasses systems containing the computer-readable medium.
- According to a method of detecting chromosomal aneuploidy in a targeted fetal chromosome, and a computer-readable medium having recorded thereon a program to be applied to performing the method according to specific embodiments, fetal chromosomal aneuploidy may be non-invasively and prenatally diagnosed with excellent sensitivity and specificity.
- FIG. 1 is a graph showing, among all samples, z scores of euploid samples and trisomy 21 samples;
- FIG. 2 is a graph showing selection of adaptive reference samples belonging to a shared range of fraction of reads (Rf) and GC content from among reference samples, based on Rf and GC content of a targeted chromosome, according to an embodiment;
- FIG. 3A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.416, and FIG. 3B shows results of calculating Z scores of T21 verification samples with a GC content of 0.41 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 3A as reference samples;
- FIG. 4A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.424, and FIG. 4B shows results of calculating Z scores of T21 verification samples with a GC content of 0.42 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 4A as reference samples;
- FIG. 5A shows coefficient of variation for six sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.437, and FIG. 5B shows results of calculating Z scores of T21 verification samples with a GC content of 0.43 and euploid verification samples using the euploid samples of sets A to F adaptively selected in FIG. 5A as reference samples;
- FIG. 6A shows coefficient of variation for four sets of reference samples which were selected according to a shared region of GC and Rf based on a representative sample of chromosome T21 with a GC content of 0.446, and FIG. 6B shows results of calculating Z scores of T21 verification samples with a GC content of 0.44 and euploid verification samples using the euploid samples of sets A to D adaptively selected in FIG. 6A as reference samples;
- FIG. 7A shows coefficient of variation for sets of reference samples which were selected according to a shared region of GC and Rf based on a sample of chromosome T18 with a GC content of 0.45, FIG. 7B shows z scores of all reference samples and a T18 sample with a GC content of 0.45, and FIG. 7C shows results of calculating Z scores of T18 test samples with a GC content of 0.45 and euploid test samples using the euploid samples adaptively selected as reference samples;
- FIG. 8A shows coefficient of variation for sets of reference samples which were selected according to a shared region of GC and Rf based on a sample of chromosome T13 with a GC content of 0.421, FIG. 8B shows z scores of all reference samples and a T13 sample with a GC content of 0.421, and FIG. 8C shows results of calculating Z scores of T13 test samples with a GC content of 0.421 and euploid test samples using the euploid samples adaptively selected as reference samples; and
- FIGS. 9A to 9F show the relationship between Rfs of chromosomes 1 to 22 and GC contents of euploid reference samples (n=396) as confirmed by karyotyping.
- Hereinafter, the present disclosure will be described in more detail with reference to Examples. However, these Examples are for illustrative purposes only, and the scope of the present disclosure is not intended to be limited by these Examples.
- 1. Preparation of Sample
- A total of 447 pregnant women were enrolled at 12 hospitals in Korea. Information of the test subjects is shown in Table 1 below.
-
TABLE 1
Characteristic | | Value |
---|---|---|
No. of pregnant women | | 447 |
Maternal age (year) | Mean | 35 |
Maternal age (year) | Range | 20 to 46 |
Gestational age (week) | Mean | 15 |
Gestational age (week) | Median | 16 |
Gestational age (week) | Range | 11 to 22 |
Pregnancy trimester (%) | First: 1-13 week gestation | 137 (30.6) |
Pregnancy trimester (%) | Second: 14-26 week gestation | 310 (69.4) |
Pregnancy trimester (%) | Third: 27-40 week gestation | 0 |
Fetal sex (%) | Male fetus | 249 (52.5) |
Fetal sex (%) | Female fetus | 225 (47.5) |
- Of the test subjects, 29 were carrying twins, and information thereof is shown in Table 2 below.
-
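TABLE 2
Characteristic | | Value |
---|---|---|
No. of pregnant women carrying twins | | 29 |
Maternal age (year) | Mean | 35 |
Maternal age (year) | Range | 22 to 43 |
Gestational age (week) | Mean | 14 |
Gestational age (week) | Median | 13 |
Gestational age (week) | Range | 11 to 21 |
Pregnancy trimester (%) | First: 1-13 week gestation | 16 (55.2) |
Pregnancy trimester (%) | Second: 14-26 week gestation | 13 (44.8) |
Pregnancy trimester (%) | Third: 27-40 week gestation | 0 |
Fetal sex (%) | Male fetus | 26 (48.1) |
Fetal sex (%) | Female fetus | 28 (51.9) |
- Two pregnant women with unknown fetal sex were excluded.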
- All 447 test subjects had amniocentesis for fetal karyotyping, the results of which were obtained by blind analysis. The institutional review board at each participating hospital approved this study. Written informed consent was obtained from all participants.
- All test subjects underwent standard prenatal aneuploidy screening in accredited clinical laboratories. First-trimester screening includes measurement of serum pregnancy-associated plasma protein A (PAPP-A), total or free beta subunit of human chorionic gonadotropin (hCG), and nuchal translucency. Second-trimester screening includes measurement of maternal serum alpha-fetoprotein (MSAFP), hCG, unconjugated estriol, and inhibin A.
- From the results of karyotyping, there were 13 fetuses with trisomy 21 (including three twin samples), one fetus with trisomy 18 in a twin pregnancy, one fetus with trisomy 13, and two fetuses with XXY. The 17 samples with aneuploidy, 29 samples with twins, and 5 samples with higher GC contents were excluded from the total of 447 samples, and the remaining 396 samples were used as reference samples.
- 2. Preparation of Cell-Free DNA and DNA Libraries for DNA Sequencing
- About 10 mL of peripheral blood was collected from each test subject described in 1. in a BCT™ tube (Streck, Omaha, NE, USA). Each of the collected blood samples was centrifuged at 1,200×g at 4° C. for 15 min. The plasma portion of the blood was collected and centrifuged again at 16,000×g at 4° C. for 10 min. Cell-free DNA (cfDNA) was extracted from the centrifuged plasma by using a QIAamp circulating nucleic acid kit (Qiagen, Netherlands).
- The end-repair of the obtained cfDNA was carried out using T4 DNA polymerase, Klenow DNA polymerase, and T4 polynucleotide kinase, and then cfDNA fragments were obtained again by using Agencourt AMPure XP.
- DNA libraries for the Ion Proton sequencing system were constructed from the prepared cfDNA according to the protocol provided by the manufacturer (Life Technologies, SD, USA). Proton PI Chip Kit version 2.0 was used to yield an average sequencing coverage depth of 0.3× per nucleotide.
- 3. Massively Parallel Sequencing
- The DNA libraries prepared as in 2. were subjected to massively parallel sequencing by using ION PROTON™ system (Thermo Fisher Scientific).
- Different raw reads were obtained using ION TORRENT SUITE™ software (Thermo Fisher Scientific). The number of the obtained raw reads was about (7.4±2.1)×10^6 per sample on average.
- The reads were trimmed from the 3′ end by sequencing, and low-quality reads were excluded from the subjects of analysis. Further, the reads were filtered by a quality threshold value of 20 and a read length threshold of 50 bp.
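The two thresholds above can be illustrated with a minimal sketch. The source does not state how the quality threshold was applied, so the use of the mean per-base Phred score is an assumption, and the function names and the `(sequence, qualities)` read representation are hypothetical.

```python
from statistics import mean

MIN_QUALITY = 20  # quality threshold stated in the text
MIN_LENGTH = 50   # read length threshold in bp stated in the text


def passes_filters(sequence, qualities, min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Return True if the read is long enough and its mean Phred quality is high enough.

    Assumption: the threshold is compared with the mean per-base quality;
    the source does not say whether a mean or a per-base cutoff was used.
    """
    if len(sequence) < min_length or not qualities:
        return False
    return mean(qualities) >= min_quality


def filter_reads(reads):
    """Keep only the (sequence, qualities) pairs that pass both thresholds."""
    return [read for read in reads if passes_filters(*read)]
```

In practice this step would normally be done by the sequencing platform's own software or a dedicated read-trimming tool; the sketch only makes the two cutoffs concrete.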
- The filtered reads were aligned to the human genomic reference sequences hg19 using Burrows-Wheeler transform (BWT). Sequence reads mapped to only one genome location in hg19 were termed unique reads. About 44.6% (about 3.3×10^6) of the total reads were unique reads. The GC contents of the total 447 samples ranged from about 30% to about 60%.
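The definition of a unique read can be sketched as follows. The `(read_id, chromosome, position)` tuples are a hypothetical, simplified stand-in for aligner output and are not the format used in the source.

```python
from collections import defaultdict


def unique_reads(alignments):
    """Return the IDs of reads that map to exactly one genomic location.

    `alignments` is an iterable of (read_id, chromosome, position) tuples,
    a simplified and hypothetical representation of aligner output.
    """
    locations = defaultdict(set)
    for read_id, chrom, pos in alignments:
        locations[read_id].add((chrom, pos))
    return {read_id for read_id, locs in locations.items() if len(locs) == 1}
```

As a check on the reported figures, 44.6% of about 7.4×10^6 raw reads is about 3.3×10^6 unique reads.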
- Meanwhile, duplicate DNA reads were removed from the subjects of analysis by Picard (http://picard.sourceforge.net/).
- 4. Correction and Normalization of DNA Reads
- To reduce the effect of GC bias in the DNA reads obtained in 3. and of differences between samples, correction and normalization of the DNA reads were performed.
- First, all chromosomes were divided into segments with a bin size of 20 kb. The number of unique reads and GC content (rounded to 0.1%) in each bin were determined. Bins including reference sequences with undeterminable bases and bins without any reads were filtered.
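A minimal sketch of the binning step, assuming reads are given as hypothetical `(chromosome, start)` pairs and the reference sequence of each bin is available as a string. GC content is kept as a fraction rounded to three decimals, which corresponds to the 0.1% rounding in the text.

```python
from collections import Counter

BIN_SIZE = 20_000  # 20 kb bins, as in the text


def gc_content(sequence):
    """GC fraction of a reference bin rounded to 0.1%; None if any base is undeterminable."""
    seq = sequence.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        return None
    return round((seq.count("G") + seq.count("C")) / len(seq), 3)


def bin_read_counts(read_positions, bin_size=BIN_SIZE):
    """Count unique reads per (chromosome, bin index) from (chromosome, start) pairs."""
    return Counter((chrom, start // bin_size) for chrom, start in read_positions)


def usable_bins(counts, bin_gc):
    """Keep bins that contain at least one read and have a determinable GC content."""
    return {
        key: (n, bin_gc[key])
        for key, n in counts.items()
        if n > 0 and bin_gc.get(key) is not None
    }
```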
- Then, a locally weighted scatterplot smoothing (LOESS) regression analysis was used. In detail, the fitted predicted value ($UR_{loess}$) of each bin was obtained by regressing the number of unique reads in each bin against the GC content ($GC_{bin}$) of the corresponding bin according to the following equation: $UR_{loess} = f(GC_{bin})$. The LOESS-corrected number of reads ($UR_{corrected}$) was calculated using the following equation: $UR_{corrected} = UR - [UR_{loess} - e(UR)]$, wherein $e(UR)$ was the expected value for unique reads of each bin, which was set to the overall average number of unique reads per bin (Liao C. et al., Proc. Natl. Acad. Sci., 2014, 111(20):7415-7420).
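The correction formula can be sketched with a simplified, dependency-free LOESS. This is an illustration only: it uses local linear fits with tricube weights and no robustness iterations, and the smoothing span `frac` is an assumed value that the source does not specify.

```python
def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x.

    Simplified LOESS without robustness iterations; `frac` (the share of
    points used in each local fit) is an illustrative assumption.
    """
    n = len(x)
    k = min(n, max(2, round(frac * n)))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_correct(unique_reads, gc_bin, frac=0.5):
    """Apply UR_corrected = UR - (UR_loess - e(UR)), with e(UR) the mean count over bins."""
    ur_loess = loess_fit(gc_bin, unique_reads, frac)
    expected = sum(unique_reads) / len(unique_reads)
    return [ur - (fit - expected) for ur, fit in zip(unique_reads, ur_loess)]
```

When the read count depends only on GC content, the corrected counts collapse to the overall mean, which is the intended effect of removing GC bias.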
- After LOESS correction, a fraction of reads (Rf) of sample i on the chromosome j was calculated by the following equation:
-
$Rf_{ij'} = RC_{ij} / \sum_{j=1}^{22} RC_{ij}$ (Equation 1).
- In Equation 1, $Rf_{ij'}$ represents a corrected fraction of reads on chromosome j in sample i, and $RC_{ij}$ represents a corrected number of unique reads on chromosome j in sample i.
- The normalized fraction of reads was calculated using the calculated $Rf_{ij'}$ according to the following equation:
-
$Rf_{i'j'} = Rf_{ij'} / \sum_{i=1}^{N} Rf_{ij'}$ (Equation 2).
- In Equation 2, $Rf_{i'j'}$ represents a normalized fraction of reads on chromosome j in sample i, and N represents the total number of samples.
- 5. Selection of Adaptive Reference Sample and Detection of Fetal Aneuploidy
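The analyses in this section all start from the normalized fraction of reads defined by Equations 1 and 2. A minimal sketch of those two equations follows; the nested-list layout `corrected_counts[i][j]` (sample i, autosome j) and the function names are assumptions made for illustration.

```python
def fraction_of_reads(corrected_counts):
    """Equation 1: share of each sample's corrected reads that falls on each autosome.

    `corrected_counts[i][j]` is the corrected unique-read count of sample i
    on chromosome j (hypothetical layout).
    """
    return [[rc / sum(row) for rc in row] for row in corrected_counts]


def normalize_across_samples(rf):
    """Equation 2: divide each chromosome's fraction by its sum over all N samples."""
    column_sums = [sum(col) for col in zip(*rf)]
    return [[value / total for value, total in zip(row, column_sums)] for row in rf]
```

After Equation 1 each sample's fractions sum to 1 over the chromosomes; after Equation 2 each chromosome's values sum to 1 over the samples.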
- (1) Calculation of Z Scores for all Samples and Detection of Fetal Aneuploidy
- Fetal aneuploidy was detected in all samples according to a previous method of calculating z score.
- In detail, a full linear regression model for all samples was established, based on
-
$Rf_{i'j'} = \alpha + \beta \times GC_{i'j'} + e$ (Equation 3).
- A fitted predicted value of the fraction of reads was calculated by the following equation:
-
$Rf'_{i'j'} = \alpha + \beta \times GC_{i'j'}$ (Equation 4).
- In the above Equations, $Rf_{i'j'}$ represents a normalized fraction of reads on chromosome j in sample i, $Rf'_{i'j'}$ represents a fitted predicted value of the fraction of reads on chromosome j in sample i, $GC_{i'j'}$ represents a GC content on chromosome j in sample i, β represents a coefficient factor between GC content and Rf, α represents a constant, and e represents a residual (R). The residual (R) was calculated according to
-
$R = Rf_{i'j'} - Rf'_{i'j'}$ (Equation 5),
and was fitted to a normal distribution. The z score for fetal aneuploidy was calculated by the following equation: $z = (R - R') / \sigma'$, wherein $R$ represents the residual on the chromosome in the sample, $R'$ represents the average value of the residuals in the reference samples or test samples, and $\sigma'$ represents the standard deviation of the residuals in the reference samples or test samples. A z score > 3 represents a fraction of reads greater than that of the 99.9th percentile of the reference sample set.
- Z scores of the euploid samples and the trisomy 21 (T21) samples among all samples are shown in FIG. 1. As shown in FIG. 1, the z scores for chromosome 21 of the euploid samples and the T21 samples overlapped in the range from about 1 to about 3, so positive and negative results were not clearly distinguished and the threshold was ambiguous. This indicates that the method of detecting fetal aneuploidy using z scores of the whole reference samples shows low accuracy and specificity.
- (2) Detection of T21 Sample Using Adaptive Reference Sample
- It was considered that the ambiguous threshold in the previous method of detecting fetal aneuploidy as described in 5.(1) could result from a suboptimal reference sample collection. Therefore, reference samples adapted to a test sample were selected from the whole reference samples, followed by statistical analysis.
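For comparison, the whole-reference z score of 5.(1) (Equations 3 to 5) can be sketched as below. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def fit_line(gc, rf):
    """Ordinary least squares fit of Rf = alpha + beta * GC (Equation 3)."""
    gc_mean, rf_mean = mean(gc), mean(rf)
    sxy = sum((g - gc_mean) * (r - rf_mean) for g, r in zip(gc, rf))
    sxx = sum((g - gc_mean) ** 2 for g in gc)
    beta = sxy / sxx
    alpha = rf_mean - beta * gc_mean
    return alpha, beta


def residual(alpha, beta, gc, rf):
    """Equation 5: observed Rf minus the fitted value of Equation 4."""
    return rf - (alpha + beta * gc)


def z_score(test_residual, reference_residuals):
    """z = (R - R') / sigma', with R' and sigma' taken from the reference residuals.

    Assumption: sigma' is the sample standard deviation.
    """
    return (test_residual - mean(reference_residuals)) / stdev(reference_residuals)
```

With this baseline the reference residuals come from all reference samples; the adaptive method described next changes only which reference samples supply R' and σ'.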
- First, GC contents of the 13 positive samples (i.e., T21 samples) were examined. The positive samples were categorized into four groups according to GC content regions (ranging from −0.005 to +0.005). The two positive samples in the GC content region of 0.41, the five positive samples in the GC content region of 0.42, the two positive samples in the GC content region of 0.43, and the four positive samples in the GC content region of 0.44 were clustered according to the GC regions, respectively. A representative positive sample was selected from each group, and the selected positive sample was used to generate a set of adaptive reference samples by extending the GC content range in increments of 0.001 and the reads fraction range in increments of 0.00005.
- As adaptive reference samples, reference samples belonging to a shared range of GC content and Rf were extracted from all reference samples. The unit value of the GC content range was set from −0.001 to +0.001, with the GC content of the test sample as the median. The unit value of the Rf range was set from −0.00005 to +0.00005, with the Rf of the test sample as the median; this Rf was determined as the fitted predicted value of Rf calculated as
-
$Rf'_{i'j'} = \alpha + \beta \times GC_{i'j'}$ (Equation 4)
from all reference samples.
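The selection of adaptive reference samples can be sketched as a two-dimensional window filter. The function names, the `(gc, rf)` pair layout, the joint widening of both windows, and the `min_size` stopping rule are all assumptions for illustration; the source reports sets with separately chosen GC and Rf widths and does not state a stopping rule.

```python
def select_adaptive_references(references, test_gc, alpha, beta, gc_halfwidth, rf_halfwidth):
    """Keep references whose GC content and Rf both lie in windows centred on the test sample.

    `references` is a list of (gc, rf) pairs for euploid reference samples.
    The Rf window is centred on the linear prediction alpha + beta * test_gc (Equation 4).
    """
    rf_centre = alpha + beta * test_gc
    return [
        (gc, rf)
        for gc, rf in references
        if abs(gc - test_gc) <= gc_halfwidth and abs(rf - rf_centre) <= rf_halfwidth
    ]


def widen_until(references, test_gc, alpha, beta, min_size,
                gc_step=0.001, rf_step=0.00005, max_steps=100):
    """Grow both windows by one unit value per step until `min_size` references are selected.

    Assumption: both windows grow together and stop at a target set size.
    """
    selected = []
    for k in range(1, max_steps + 1):
        selected = select_adaptive_references(
            references, test_gc, alpha, beta, k * gc_step, k * rf_step
        )
        if len(selected) >= min_size:
            return selected, k
    return selected, max_steps
```

The z score of the test sample is then computed against the residuals of the selected references only, rather than against all reference samples.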
- A coefficient of variation (CV) was used to compare the performance of the previous method of using whole reference samples with that of the method of using adaptive reference samples.
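The CV comparison can be sketched in a few lines. The values in the usage note are invented purely to show the call and are not data from the study; the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)
```

For example, `coefficient_of_variation([0.0130, 0.0131, 0.0132])` is smaller than `coefficient_of_variation([0.0128, 0.0130, 0.0131, 0.0132, 0.0134, 0.0136])`, which mirrors the pattern of a tighter selected set having a lower CV than the baseline.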
- (i) Application of Adaptive Selection Method to T21 Test Samples in GC Content Region of 0.41
- The coefficient of variation for chromosome 21 was calculated with and without adaptive sample selection, using reference samples selected from a shared region of GC content 0.416±X and Rf linear predicted value±Y. FIG. 3A shows the coefficient of variation with and without adaptive reference selection.
- In FIG. 3A, the baseline represents the coefficient of variation used to measure the genomic representation of chromosome 21 among the reference samples (n=396) without adaptive reference selection. Of the two test samples in the GC content region of 0.41 for chromosome 21, the T21 sample with a GC content of 0.416 was selected as the representative test sample, and the other sample was used to verify results using the adaptive reference samples. Six sets of reference samples (A, B, C, D, E, and F) were selected according to shared ranges of GC content and Rf, based on the representative test sample (A: n=27, B: n=110, C: n=157, D: n=195, E: n=246, F: n=276). In FIG. 3A, the CVs for the six selected sets of reference samples are shown. The CVs for the selected sets A to F were lower than the CV of the baseline. Therefore, it was confirmed that the reference samples selected by adaptive selection may show a uniform sample distribution and higher sensitivity and specificity for T21.
- (A) to (F) of FIG. 3B show results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to F adaptively selected in FIG. 3A as reference samples, and specifically, z scores of the euploid samples of the adaptively selected sets A to F and of the T21 test sample (n=1) in the GC content region of 0.41 (T21 (absolute value of GC content; absolute value of Rf), euploid (absolute value of GC content)). In (A), the reference samples corresponding to set A were selected from a shared range of GC content 0.416±0.009 and Rf linear predicted value±1e-05. In this regard, the one remaining test sample not selected as the representative test sample was used as a verification sample. In (B) to (F), reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples. Further, euploid samples arbitrarily selected in the GC content range of 0.416±0.001 were used as test samples, and the six sets of reference samples were used to calculate z scores of the euploid test samples.
- As shown in FIG. 3B, when the adaptively selected sets A to F were used, the euploid samples (normal fetus) and the T21 samples were clearly distinguished, and the z score threshold for T21 was unambiguous.
- (ii) Application of Adaptive Selection Method to T21 Test Samples in GC Content Region of 0.42
- Similarly, of the five test samples in the GC content region of 0.42 for chromosome 21, the T21 sample with a GC content of 0.424 was selected as the representative test sample, and the other samples were used to demonstrate results using the adaptive reference samples. Six sets of reference samples (A, B, C, D, E, and F) were selected according to shared ranges of GC content and Rf based on the representative test sample (A: n=37, B: n=210, C: n=120, D: n=166, E: n=226, F: n=278). In FIG. 4A, the CVs for the six selected sets of reference samples are shown.
- (A) to (F) of FIG. 4B show results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to F adaptively selected in FIG. 4A as reference samples. In (A), the reference samples corresponding to set A were selected from a shared range of GC content 0.424±0.004 and Rf linear predicted value±1e-05. In this regard, the remaining test samples not selected as the representative test sample were used as verification samples. In (B) to (F), reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples. Further, euploid samples arbitrarily selected in the GC content range of 0.424±0.001 were used as test samples, and the six sets of reference samples were used to calculate z scores of the euploid test samples. As shown in FIG. 4B, when the adaptively selected sets A to F were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and the z score threshold for T21 fetus was unambiguous.
- (iii) Application of Adaptive Selection Method to T21 Test Samples in GC Content Region of 0.43
- Similarly, of the two test samples in the GC content region of 0.43 for chromosome 21, the T21 sample with a GC content of 0.437 was selected as the representative test sample, and the other sample was used to demonstrate results using the adaptive reference samples. Six sets of reference samples (A, B, C, D, E, and F) were selected according to shared ranges of GC content and Rf based on the representative test sample (A: n=31, B: n=90, C: n=138, D: n=189, E: n=227, F: n=292). In FIG. 5A, the CVs for the six selected sets of reference samples are shown.
- (A) to (F) of FIG. 5B show results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to F adaptively selected in FIG. 5A as reference samples. In (A), the reference samples corresponding to set A were selected from a shared range of GC content 0.437±0.009 and Rf linear predicted value±1e-05. In this regard, the remaining test sample not selected as the representative test sample was used as a verification sample. In (B) to (F), reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples. Further, euploid samples arbitrarily selected in the GC content range of 0.437±0.001 were used as test samples, and the six sets of reference samples were used to calculate z scores of the euploid test samples. As shown in FIG. 5B, when the adaptively selected sets A to F were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and the z score threshold for T21 fetus was unambiguous.
- (iv) Application of Adaptive Selection Method to T21 Samples in GC Content Region of 0.44
- Similarly, of the four test samples in the GC content region of 0.44 for chromosome 21, the T21 sample with a GC content of 0.446 was selected as the representative test sample, and the other samples were used to demonstrate results using the adaptive reference samples. Four sets of reference samples (A, B, C, and D) were selected according to shared ranges of GC content and Rf based on the representative test sample (A: n=38, B: n=127, C: n=93, D: n=181). In FIG. 6A, the CVs for the four selected sets of reference samples are shown.
- (A) to (D) of FIG. 6B show results of calculating Z scores of T21 test samples and euploid test samples using the euploid samples of sets A to D adaptively selected in FIG. 6A as reference samples. In (A), the reference samples corresponding to set A were selected from a shared range of GC content 0.446±0.011 and Rf linear predicted value±2e-05. In this regard, the remaining test samples not selected as the representative test sample were used as verification samples. In (B) to (D), reference samples were selected in the same manner as in (A). z scores of the T21 test samples were calculated from the selected reference samples and test samples. Further, euploid samples arbitrarily selected in the GC content range of 0.446±0.001 were used as test samples, and the four sets of reference samples were used to calculate z scores of the euploid test samples. As shown in FIG. 6B, when the adaptively selected sets A to D were used, normal fetus (euploid) and T21 fetus were clearly distinguished, and the z score threshold for T21 fetus was unambiguous.
- (3) Detection of T18 Sample Using Adaptive Reference Sample
- The trisomy 18 (T18) sample was detected by the adaptive selection method as described in 5.(2).
- Because there was only one T18 sample, the representative test sample was also used as the test sample. One set (A) of reference samples was selected according to a shared range of GC content and Rf based on the representative test sample (A: n=8). In FIG. 7A, the CVs for set A and the reference value (non-selected reference samples) are shown. FIG. 7B shows Z scores of the non-selected reference samples and the T18 sample.
- In FIG. 7C, the reference samples corresponding to set A were selected from a shared range of GC content 0.45±0.014 and Rf linear predicted value±2e-05. Further, euploid samples arbitrarily selected in the GC content range of 0.45±0.001 were used as test samples, and the reference sample set was used to calculate z scores of the euploid test samples.
- As shown in FIGS. 7B and 7C, when the non-selected reference samples were used, normal fetus (euploid) and T18 fetus were not distinguished. In contrast, when the adaptively selected set A was used, normal fetus (euploid) and T18 fetus were clearly distinguished.
- (4) Detection of T13 Sample Using Adaptive Reference Sample
- The trisomy 13 (T13) sample was detected by the adaptive selection method as described in 5.(2).
- Because there was only one T13 sample, the representative test sample was also used as the test sample. One set (A) of reference samples was selected according to a shared range of GC content and Rf based on the representative test sample (A: n=177). In FIG. 8A, the CVs for set A and the reference value (non-selected reference samples) are shown. FIG. 8B shows Z scores of the non-selected reference samples and the T13 sample.
- In FIG. 8C, the reference samples corresponding to set A were selected from a shared range of GC content 0.421±0.017 and Rf linear predicted value±0.0001. Further, euploid samples arbitrarily selected in the GC content range of 0.421±0.001 were used as test samples, and the reference sample set was used to calculate z scores of the euploid test samples.
- As shown in FIGS. 8B and 8C, when the non-selected reference samples were used, normal fetus (euploid) and T13 fetus were distinguished (z score difference of about 1.5), but when the adaptively selected set A was used, normal fetus (euploid) and T13 fetus were more clearly distinguished (z score difference of about 4).
- (5) Relationship of Fraction of Reads and GC Content in Chromosomes
- The relationship between the fraction of reads and the GC content in the respective chromosomes was calculated by fitting to a linear model, and results are shown in FIGS. 9A to 9F. FIGS. 9A to 9F show the relationship between Rfs of chromosomes 1 to 22 and GC contents of the euploid controls (n=396) as confirmed by karyotyping.
- As shown in FIGS. 9A to 9F, there was a linear relationship between the Rf of each chromosome and GC content. Therefore, in order to detect trisomy samples, reference samples belonging to a range set based on Rf of test chromosome and GC content may be selected from a target chromosome.
- Accordingly, whether a test sample is a trisomy fetus or not may be detected with excellent sensitivity and specificity by comparing Z scores calculated from selected reference samples and Z score calculated from the test sample.
Claims (20)
1. A method of detecting chromosomal aneuploidy of a target fetal chromosome of a fetus using at least one processor, the method comprising:
isolating a plurality of nucleic acid fragments from a biological sample of a pregnant woman, wherein the biological sample was collected without performing an invasive method selected from chorionic villus sampling, amniocentesis, and sampling from an umbilical cord, and wherein the plurality of nucleic acid fragments includes cell-free fetal nucleic acid fragments;
obtaining sequence information (reads) indicating the plurality of nucleic acid fragments using the at least one processor;
mapping, using the at least one processor, the obtained reads to a reference human genome, to assign corresponding fragments among the plurality of the nucleic acid fragments to the target fetal chromosome as a test sample;
calculating, using the at least one processor, a guanine (G) and cytosine (C) (GC) content of the corresponding fragments on the target fetal chromosome and a fraction-of-reads (Rf) of the number of the corresponding fragments on the target fetal chromosome to the number of the plurality of the nucleic acid fragments, based on the reads indicating the plurality of the nucleic acid fragments mapped to the reference human genome including the corresponding fragments assigned to the target fetal chromosome as the test sample;
selecting, using the at least one processor, from reference samples obtained from biological samples of pregnant women carrying euploid fetuses, adaptive reference samples belonging to a shared range of Rf unit values and GC content unit values shared between the adaptive reference samples and the test sample, based on the calculated Rf content and the calculated GC content;
calculating, using the at least one processor, z scores of the selected adaptive reference samples and a z score of the sequences of the plurality of the nucleic acid fragments assigned to the target fetal chromosome as the test sample;
detecting, using the at least one processor, that the target chromosome has chromosomal aneuploidy without performing the invasive method, by comparing the calculated z scores of the selected adaptive reference samples with the z score of the test sample, when the z score of the test sample is larger than the z scores of the selected adaptive reference samples.
2. The method of claim 1 , further comprising performing an invasive method selected from chorionic villus sampling, amniocentesis, and sampling from an umbilical cord, based on the determining that the target chromosome has chromosomal aneuploidy.
3. The method of claim 1 , wherein the sequence information (the reads) is obtained by a massively parallel sequencing system.
4. The method of claim 1 , wherein the selected adaptive reference samples have a lower coefficient of variation than reference samples without adaptive selection.
5. The method of claim 1 , wherein the biological sample is blood, plasma, serum, urine, saliva, mucus, sputum, feces, tears, or a combination thereof.
6. The method of claim 1 , further comprising excluding, by the at least one processor, intervals with a low confidence level for the reads by examining depth distribution of the reads assigned to the reference human genome at each interval, after mapping the reads and assigning the corresponding fragments to the target fetal chromosome.
7. The method of claim 6 , wherein the interval is an interval set in units of about 5 kb to about 50 kb.
8. The method of claim 6 , wherein the excluding of intervals with a low confidence level for the reads comprises removing mismatches, removing multi-mapped reads, removing duplicated reads, or a combination thereof.
9. The method of claim 1 , further comprising performing locally weighted scatterplot smoothing (LOWESS or LOESS) regression analysis of the reads according to the following Equation 1 to reduce GC content bias, after assigning the corresponding fragments to the target fetal chromosome:
$Rf_{ij'} = RC_{ij} / \sum_{j=1}^{22} RC_{ij}$ (Equation 1)
wherein Rfij′ represents a corrected fraction of reads on chromosome j in sample i, and RCij represents a corrected number of unique reads on chromosome j in sample i.
10. The method of claim 1 , further comprising performing normalization of the reads of the nucleic acid fragments according to the following Equation 2, after assigning the reads of the nucleic acid fragments to the chromosome:
$Rf_{i'j'} = Rf_{ij'} / \sum_{i=1}^{N} Rf_{ij'}$ (Equation 2)
wherein Rfi′j′ represents a normalized fraction of reads on chromosome j in sample i, and N represents the total number of samples.
11. The method of claim 1 , further comprising establishing a linear regression model from all of the reference samples.
12. The method of claim 1 , further comprising extending the unit values of Rf of the reference samples according to Rf values of the test sample, extending the unit values of GC of the reference samples according to GC contents of the test sample, or a combination thereof.
13. The method of claim 1 , wherein the calculating of z scores of the selected adaptive reference samples and z score of the test sample comprises performing a linear regression analysis according to the following Equation 3 and calculating a linear predicted value of Rf according to the following Equation 4:
$Rf_{i'j'} = \alpha + \beta \times GC_{i'j'} + e$ (Equation 3)
wherein, in Equation 3, $Rf_{i'j'}$ represents a normalized fraction of reads on chromosome j in sample i, α represents a constant, β represents a coefficient factor between GC content and Rf, and e represents a residual (R); and
$Rf'_{i'j'} = \alpha + \beta \times GC_{i'j'}$ (Equation 4)
in Equation 4, $Rf'_{i'j'}$ represents a fitted predicted value of a fraction of reads on chromosome j in sample i, α represents a constant, and β represents a coefficient factor between GC content and Rf.
14. The method of claim 13 , wherein the calculating of z scores of the selected adaptive reference samples and z score of the test sample comprises calculating a residual (R) from a calculated value from the linear regression analysis and the calculated linear predicted value according to the following Equation 5, and calculating a Z score from the calculated residual according to the following Equation 6:
$R = Rf_{i'j'} - Rf'_{i'j'}$ (Equation 5); and
$z\ \text{score} = (R - R') / \sigma'$ (Equation 6)
wherein, in Equation 6, R′ represents a mean value of a residual of an adaptive reference sample, R represents a residual value of the test sample, and σ′ represents a standard deviation of the residual of the adaptive reference sample.
15. The method of claim 1 , further comprising
selecting adaptive reference samples belonging to GC content±unit value of the target chromosome or GC content±unit value of the adaptive reference samples, as verification samples;
calculating z scores of the verification samples; and
verifying that the target chromosome has chromosomal aneuploidy by comparing the calculated z scores of the selected adaptive samples with the z score of the test sample.
16. The method of claim 1 , wherein the target chromosome is chromosome 13, chromosome 18, chromosome 21, an X chromosome, a Y chromosome, or a combination thereof.
17. The method of claim 1 , wherein the chromosomal aneuploidy is trisomy 13, trisomy 18, trisomy 21, XO, XXX, XXY, XYY, or a combination thereof.
18. A system containing a computer-readable medium having recorded thereon a program that performs the steps of:
obtaining sequence information (reads) indicating a plurality of nucleic acid fragments using at least one processor, wherein the plurality of nucleic acid fragments is isolated from a biological sample of a pregnant woman, wherein the biological sample was collected without performing an invasive method selected from chorionic villus sampling, amniocentesis, and sampling from an umbilical cord, and wherein the plurality of nucleic acid fragments includes cell-free fetal nucleic acid fragments;
mapping, using the at least one processor, the obtained reads to a reference human genome, to assign corresponding fragments among the plurality of the nucleic acid fragments to a target fetal chromosome as a test sample;
calculating, using the at least one processor, a guanine (G) and cytosine (C) (GC) content of the corresponding fragments on the target fetal chromosome and a fraction-of-reads (Rf) of the number of the corresponding fragments on the target fetal chromosome to the number of the plurality of the nucleic acid fragments, based on the reads indicating the plurality of the nucleic acid fragments mapped to the reference human genome including the corresponding fragments assigned to the target fetal chromosome as the test sample;
selecting, using the at least one processor, from reference samples obtained from biological samples of pregnant women carrying euploid fetuses, adaptive reference samples belonging to a shared range of Rf unit values and GC content unit values shared between the adaptive reference samples and the test sample, based on the calculated Rf content and the calculated GC content;
calculating, using the at least one processor, z scores of the selected adaptive reference samples and a z score of the sequences of the plurality of the nucleic acid fragments assigned to the target fetal chromosome as the test sample;
detecting, using the at least one processor, that the target chromosome has chromosomal aneuploidy without performing the invasive method, by comparing the calculated z scores of the selected adaptive reference samples with the z score of the test sample, when the z score of the test sample is larger than the z scores of the selected adaptive reference samples.
19. The system of claim 18 , further comprising performing an invasive method selected from chorionic villus sampling, amniocentesis, and sampling from an umbilical cord, based on the determining that the target chromosome has chromosomal aneuploidy.
20. The system of claim 18 , wherein the sequence information (the reads) is obtained by a massively parallel sequencing system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/225,618 US20230368918A1 (en) | 2016-01-25 | 2023-07-24 | Method of detecting fetal chromosomal aneuploidy |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2016-0008903 | 2016-01-25 | ||
KR1020160008903A KR101739535B1 (en) | 2016-01-25 | 2016-01-25 | Method for detecting aneuploidy of fetus |
PCT/KR2017/000266 WO2017131359A1 (en) | 2016-01-25 | 2017-01-09 | Method for detecting fetal chromosomal aneuploidy |
US201816071883A | 2018-07-20 | 2018-07-20 | |
US18/225,618 US20230368918A1 (en) | 2016-01-25 | 2023-07-24 | Method of detecting fetal chromosomal aneuploidy |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/KR2017/000266 Continuation WO2017131359A1 (en) | 2016-01-25 | 2017-01-09 | Method for detecting fetal chromosomal aneuploidy |
US16/071,883 Continuation US11710565B2 (en) | 2016-01-25 | 2017-01-09 | Method of detecting fetal chromosomal aneuploidy |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230368918A1 (en) | 2023-11-16 |
Family
ID=59051233
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/071,883 Active 2040-09-04 US11710565B2 (en) | 2016-01-25 | 2017-01-09 | Method of detecting fetal chromosomal aneuploidy |
US18/225,618 Pending US20230368918A1 (en) | 2016-01-25 | 2023-07-24 | Method of detecting fetal chromosomal aneuploidy |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/071,883 Active 2040-09-04 US11710565B2 (en) | 2016-01-25 | 2017-01-09 | Method of detecting fetal chromosomal aneuploidy |
Country Status (3)
Country | Link |
---|---|
US (2) | US11710565B2 (en) |
KR (1) | KR101739535B1 (en) |
WO (1) | WO2017131359A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733979A (en) * | 2017-10-30 | 2018-11-02 | 成都凡迪医疗器械有限公司 | G/C content calibration method, device and the computer readable storage medium of NIPT |
KR102142914B1 (en) * | 2018-09-06 | 2020-08-11 | 이원다이애그노믹스(주) | Non-invasive prenatal testing method using cell free dna fragment derived maternal blood |
CN110993029B (en) * | 2019-12-26 | 2023-09-05 | 北京优迅医学检验实验室有限公司 | Method and system for detecting chromosome abnormality |
KR20230076686A (en) | 2021-11-24 | 2023-05-31 | 테라젠지놈케어 주식회사 | Method for detecting aneuploidy of fetus based on synthetic data |
KR20230157204A (en) | 2022-05-09 | 2023-11-16 | 테라젠지놈케어 주식회사 | Method for detecting aneuploidy of fetus based on synthetic positive data and synthetic negative data |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140235474A1 (en) * | 2011-06-24 | 2014-08-21 | Sequenom, Inc. | Methods and processes for non invasive assessment of a genetic variation |
WO2014133369A1 (en) * | 2013-02-28 | 2014-09-04 | 주식회사 테라젠이텍스 | Method and apparatus for diagnosing fetal aneuploidy using genomic sequencing |
2016
- 2016-01-25 KR KR1020160008903A patent/KR101739535B1/en active IP Right Grant
2017
- 2017-01-09 US US16/071,883 patent/US11710565B2/en active Active
- 2017-01-09 WO PCT/KR2017/000266 patent/WO2017131359A1/en active Application Filing
2023
- 2023-07-24 US US18/225,618 patent/US20230368918A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20190103187A1 (en) | 2019-04-04 |
KR101739535B1 (en) | 2017-05-24 |
US11710565B2 (en) | 2023-07-25 |
WO2017131359A1 (en) | 2017-08-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20230368918A1 (en) | Method of detecting fetal chromosomal aneuploidy | |
US20170363628A1 (en) | Means and methods for non-invasive diagnosis of chromosomal aneuploidy | |
US11339426B2 (en) | Method capable of differentiating fetal sex and fetal sex chromosome abnormality on various platforms | |
US10053729B2 (en) | Rapid aneuploidy detection | |
JP5659319B2 (en) | Non-invasive detection of genetic abnormalities in the fetus | |
DK2334812T3 (en) | Non-invasive diagnosis of fetal aneuploidy by sequencing | |
JP5938484B2 (en) | Method, system, and computer-readable storage medium for determining presence / absence of genome copy number variation | |
US20200255896A1 (en) | Method for non-invasive prenatal screening for aneuploidy | |
US20190032125A1 (en) | Method of detecting chromosomal abnormalities | |
KR101881098B1 (en) | Method for detecting aneuploidy of fetus | |
KR20230076686A (en) | Method for detecting aneuploidy of fetus based on synthetic data | |
WO2017051996A1 (en) | Non-invasive type fetal chromosomal aneuploidy determination method | |
KR101907650B1 (en) | Method of non-invasive trisomy detection of fetal aneuploidy | |
KR20230157204A (en) | Method for detecting aneuploidy of fetus based on synthetic positive data and synthetic negative data | |
KR102519739B1 (en) | Non-invasive prenatal testing method and devices based on double Z-score | |
KR20170036649A (en) | Method of non-invasive trisomy detection of fetal aneuploidy | |
US20220101947A1 (en) | Method for determining fetal fraction in maternal sample | |
WO2020119626A1 (en) | Method for non-invasive prenatal testing of fetus for genetic disease | |
TWI489305B (en) | Non-invasive detection of fetus genetic abnormality | |
정희정 | Non-invasive prenatal testing by using next-generation sequencing of cell-free DNA from maternal plasma in multifetal pregnancies | |
GB2564846A (en) | Prenatal screening and diagnostic system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: GENOMECARE CO., LTD., KOREA, REPUBLIC OF; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KIM, SUN SHIN; JEONG, MYUNG JUN; MIN, KYUNG TAE; AND OTHERS; REEL/FRAME: 064364/0094; Effective date: 20180716 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| AS | Assignment | Owner name: THERAGEN GENOMECARE CO., LTD., KOREA, REPUBLIC OF; Free format text: CHANGE OF NAME; ASSIGNOR: GENOMECARE CO., LTD.; REEL/FRAME: 064868/0548; Effective date: 20180525 |