WO2021137770A1 - Method for fetal fraction estimation based on detection and interpretation of single nucleotide variants - Google Patents

Info

Publication number
WO2021137770A1
Authority
WO
WIPO (PCT)
Prior art keywords
fetal
reads
sequencing
dna
sample
Prior art date
Application number
PCT/SK2019/050016
Other languages
English (en)
Inventor
Werner KRAMPL
Marcel KUCHARÍK
Dávid SMOĽAK
Rastislav HEKEL
Jaroslav BUDIŠ
Tomáš SZEMES
Original Assignee
Geneton S.R.O.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geneton S.R.O.
Priority to PCT/SK2019/050016
Publication of WO2021137770A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6879: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for sex determination
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • The invention generally relates to DNA sequencing of families and to fetal fraction estimation from parental genomic data, which can be used in non-invasive prenatal testing (NIPT), and falls into the field of bioinformatics.
  • The invention also relates to methods for estimating genetic diseases in children of pregnant women. It further relates to the fields of molecular biology and biotechnology.
  • The genome is physically stored in a double-helix DNA molecule, which consists of two strands, each carrying a sequence of nucleotides (A, T, G, C) called bases.
  • A whole human genome is a sequence of roughly 3.2 billion DNA bases.
  • The reference genome, an artificial genome composed by scientists, is the most common sequence of bases in human DNA. The genome of every individual differs from the reference genome by around 0.5% of bases, owing to genetic variations. These variations make each genotype unique, and some of them can have a significant impact on human health.
  • Genome analysis starts with the collection of a biological sample (e.g. blood or saliva), from which the DNA molecule is extracted and prepared for a process called sequencing.
  • DNA sequencing is a biochemical process for determining the precise order of nucleotide bases within the DNA molecule.
  • When using massively parallel sequencing technology (Mayer et al. 1998), the molecule must be fragmented and placed on a sequencing platform. There the fragments are read in parallel, creating digital sequences of DNA bases called reads. For example, to detect genomic variants in the human genome, over a billion reads are needed (Kim et al. 2015). These reads are randomly ordered, with unknown direction and unknown DNA strand of origin. If genomic reads belong to an organism with a known reference genome, such as human, they can be sorted in a process called mapping.
  • The aim of mapping is to reconstruct the original genomic sequence.
  • Each read is aligned and thus mapped to the most probable region of origin on the reference genome.
  • An aligned and mapped read is often simply called an alignment.
  • The set of aligned reads is de facto a digital copy of the DNA contained within the biological sample.
  • Aligned reads reveal differences between a sequenced genome and the reference genome, called genomic variants. The whole set, or even a specific selection, of genomic variants is unique to each individual; hence a genome is the ultimate personal identifier. Although the majority of these variants have no apparent effect on an individual, genome-wide association studies have linked some of the variants to diseases, appearance or even behavior of an individual.
  • SNV: single nucleotide variation
  • SNP: single nucleotide polymorphism
  • The human genome is organized into 22 pairs of homologous chromosomes and one pair of sex chromosomes. In each pair, one chromosome is derived from the mother and the other from the father.
  • The maternal and paternal chromosomes in a homologous pair have the same gene at the same locus; nevertheless, the alleles of this gene can differ between the chromosomes. If both alleles are identical, the organism is said to be homozygous for that locus. If they differ, the organism is said to be heterozygous for that locus.
  • Prenatal testing for genetic diseases in the human fetus has become a standard part of medical care during pregnancy.
  • The typical test screens for the presence of an abnormal number of chromosomes in the fetus. It focuses on the most common cases, which are trisomies of chromosomes 13, 18, and 21.
  • Amniocentesis, a traditionally used method of prenatal testing, requires amniotic fluid, which needs to be extracted directly from the uterus of the mother. Amniocentesis is therefore an invasive procedure that puts the mother and her fetus at risk of miscarriage caused by needle injury, infection or stress. For this reason, it is performed only in cases of high-risk pregnancy, where the risk is evaluated by several safe but imprecise screening procedures. To eliminate the risks and inconvenience associated with amniocentesis, alternative non-invasive prenatal testing methods are being developed.
  • Non-invasive prenatal testing is possible owing to a recent discovery that plasma extracted from maternal blood contains fragments of cell-free fetal DNA (cffDNA) (Lo et al. 1997). This fetal DNA can be detected from the fifth week of gestation and is collected with standard blood sampling from the mother.
  • cffDNA: cell-free fetal DNA
  • Invasive methods examine fetal chromosomes directly within the cells of a fetus, whereas cffDNA contains only fragments from these chromosomes.
  • cffDNA contributes only around 10% of the cell-free DNA (cfDNA) isolated from maternal plasma. According to current knowledge, fetal fragments cannot be unambiguously distinguished from maternal ones. For these reasons, NIPT still does not achieve the accuracy of invasive methods, and further improvements are necessary for their full replacement.
  • NIPT methods are based on the identification of cffDNA fragments by massively parallel sequencing technology, which produces a vast number of DNA reads (the exact number varies, depending on the specific sequencing technology and the number of reads needed to obtain the desired information) at an affordable cost for the patient. A higher number of sequenced reads is correlated with better prediction accuracy but leads to higher operating costs. As the previous paragraph implies, further improvements are necessary to make NIPT more accurate and affordable at the same time.
  • The proportion of cffDNA within the isolated cfDNA is called the fetal fraction and is an important aspect of NIPT analysis.
  • A sample with a very low fetal fraction can be incorrectly diagnosed as healthy (false negative), since a genetic defect causes only a weak deviation from normal values, which can be interpreted as measurement error. For this reason, it is important to interpret the analysis results with respect to the fetal fraction.
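To illustrate why a low fetal fraction weakens the signal, the sketch below uses a common simplifying assumption that is not stated in the source: maternal DNA carries two copies of a chromosome and a trisomic fetus carries three, so the relative excess of reads from the affected chromosome is about half the fetal fraction. The function name is hypothetical, and sequencing bias is ignored.

```python
def expected_trisomy_excess(fetal_fraction: float) -> float:
    """Relative increase in reads from a trisomic chromosome.

    Assumption: the mixture holds 2*(1 - ff) + 3*ff = 2 + ff copies of the
    chromosome, versus 2 copies in a euploid pregnancy, so the relative
    excess is ff / 2.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    return fetal_fraction / 2.0


# The lower the fetal fraction, the closer the excess gets to measurement noise.
for ff in (0.04, 0.10, 0.20):
    excess = expected_trisomy_excess(ff)
    print(f"fetal fraction {ff:.0%}: expected excess {excess:.1%}")
```

Under this assumption, halving the fetal fraction also halves the deviation that the test has to detect.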
  • The relative amount of fetal DNA in the blood of a pregnant woman, called the fetal fraction, is a crucial parameter for accurate interpretation of non-invasive prenatal tests.
  • Several methods have been proposed for prediction of the fetal fraction (Peng and Jiang 2017).
  • The FL model is based on size analysis of short and long DNA fragments in the maternal plasma, with the assumption that the ratio of short to long DNA fragments is correlated with the proportion of fetal DNA fragments in maternal plasma. The dataset of measured values (the read length distribution of each mother) is then divided into two groups, a training group and a validation group, and together they are used to create a linear regression that predicts the fetal fraction from the read length distribution (Yu et al. 2014).
  • SeqFF
  • The method takes as input a vector of Loess-corrected fragment counts, partitioned into bins 50,000 bases long.
  • The fetal fraction is then determined using standard multivariate regression models. Estimating the regression weights requires a huge number of training samples, which are scarcely available to small laboratories, so the method is applicable only to established tests with a large cohort of pre-analyzed samples (Kim et al. 2015).
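The binning step described above can be sketched as follows. This is only an illustration of partitioning read start positions into 50,000-base bins; the Loess correction and the regression weights are not shown, and the function name and the input layout are assumptions made for this example.

```python
from collections import Counter

BIN_SIZE = 50_000  # bin length in bases, as stated in the text


def count_reads_per_bin(read_starts, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index).

    read_starts: iterable of (chromosome, 0-based start position) tuples.
    This input layout is hypothetical, chosen to keep the sketch small.
    """
    counts = Counter()
    for chrom, start in read_starts:
        counts[(chrom, start // bin_size)] += 1
    return counts


reads = [("chr1", 10), ("chr1", 49_999), ("chr1", 50_000), ("chr2", 120_000)]
bin_counts = count_reads_per_bin(reads)
```

The resulting per-bin counts would form the raw vector that a method of this kind corrects and feeds into its regression model.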
  • The SANEFALCON method uses more detailed information on fragment origin, based on molecular mechanisms of fragment degradation that differ between maternal and fetal fragments.
  • Fragments from all training samples are aggregated to localize local coverage peaks.
  • Fragment length profiles differ slightly between fetal and maternal DNA due to differences in DNA degradation. Published results supported this hypothesis; however, the accuracy of the method is not sufficient for reliable fetal fraction prediction (Straver et al. 2016).
  • The prior art further comprises the FetalQuantSD method (Jiang et al. 2016), a variation of the earlier FetalQuant method by the same authors (Jiang et al. 2012).
  • The FetalQuant method estimates the fetal fraction; as input it needs deeply sequenced maternal cells (purely maternal DNA) and deeply sequenced cfDNA of the pregnant woman.
  • Deep sequencing is required to identify variants both for the mother and for the mixture of fetal and maternal DNA.
  • These variants are then stored in the compact VCF format (defined in section "Definitions") and further analyzed.
  • The variants that are present in the mother-fetus mixture and not present in the mother are categorized as fetal variants inherited from the father, and the fetal fraction is estimated from their frequency. This estimate is then corrected for the effects of sequencing errors and other systematic bias.
  • The FetalQuantSD method improves on the previous method, since it no longer relies on deeply sequenced cfDNA of the pregnant woman; only shallow sequencing is needed. However, deep sequencing of purely maternal variants is still needed, and the accuracy of the fetal fraction estimate is highly dependent on the number of SNP loci identified in the mother and on the number of reads in the shallow cfDNA sequencing.
  • FetalQuantSD computes the fetal fraction as follows: first, it compiles a list of homozygous SNPs in the mother from the first data set. Then it searches for non-maternal alleles in reads from the second data set. These non-maternal alleles potentially represent paternally inherited fetal alleles. However, a small proportion of these non-maternal alleles could be caused by sequencing errors in maternal plasma and/or genotyping errors in maternal genomic DNA. The authors assume linear error rates across different cases and correct for them by a linear regression on a training set of 23 samples.
  • The actual fetal DNA fraction was deduced by comparing the aligned sequence reads at the sites where the maternal genotypes were homozygous (AA) and the fetal genotypes were heterozygous (AB), using the formula $f = 2p / (p + q)$, where p is the number of sequenced reads carrying fetal-specific alleles (i.e., allele B) and q is the number of sequenced reads carrying alleles shared by the mother and the fetus (i.e., allele A).
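For sites where the mother is homozygous (AA) and the fetus is heterozygous (AB), the relation between p, q and the fetal fraction is commonly written as f = 2p / (p + q), because only half of the fetal fragments at such a site carry allele B. A minimal sketch of this arithmetic follows; the function name is hypothetical and no error correction is applied.

```python
def fetal_fraction_from_counts(p: int, q: int) -> float:
    """Fetal fraction from read counts at AA (mother) / AB (fetus) sites.

    p: reads carrying the fetal-specific allele B
    q: reads carrying the shared allele A
    """
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("p and q must be non-negative and not both zero")
    return 2.0 * p / (p + q)


# 5 fetal-specific reads out of 100 reads in total
example_fraction = fetal_fraction_from_counts(5, 95)
```

Sequencing and genotyping errors inflate p, which is why the prior art methods described here apply an additional correction.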
  • The proposed method can be used broadly, because it does not rely only on alleles that are homozygous in the pregnant woman and heterozygous in the fetus, and it incorporates processing of the father's sample. The number of required reads from the pregnant mother is therefore smaller; e.g. only shallow sequencing of cell-free DNA from the pregnant mother's blood is necessary.
  • If the sequenced DNA of the mother and father is available upfront, which is slowly becoming standard, the whole method is faster and requires fewer computing resources than prior art methods.
  • The effective blood sample size can be smaller, and the method requires fewer reagents during sample preparation.
  • The proposed method computes the fetal fraction differently, with the likelihood of each hypothetical fetal fraction in mind. This allows the user to quantify the probability of each fetal fraction rather than rely only on the best one.
  • The proposed invention can also supply confidence intervals for the computation, which helps quantify the inaccuracy of the method: it can easily be verified whether the fetal fraction estimate for an individual sample set is reliable or whether the test must be repeated to provide a reliable estimate.
  • The term variant refers to a difference between a genome and the reference genome. It is defined by a start position (on the reference), a reference allele and an alternative allele. In other words, it is a replacement of the reference allele by the alternative allele at a specific genomic position.
  • Allele - Depending on the presence of one or more variants, the same gene can have different forms, called alleles.
  • The term allele refers to the particular sequence at the position of a variant. The reference allele is the sequence present in the reference genome. An alternative allele is a different sequence in place of the reference allele. A variant is always described by one reference allele and at least one alternative allele.
  • Genome - The complete set of DNA sequences within an organism.
  • Exome - Protein-coding subset of a genome, which constitutes about 1% of the human genome.
  • VCF file - File that contains the variants of an individual in a concise format.
  • The VCF format is known to those skilled in the use of bioinformatic tools as the standard format for storing the variants of an individual.
  • Massively parallel sequencing or next generation sequencing (NGS) - Techniques, well known to those skilled in the art, for sequencing huge numbers of DNA fragments.
  • The most widely known sequencing technologies in the field are Illumina and IonTorrent. Protocols for whole genome sequencing are known to the skilled person and can be found in the examples below.
  • A read is an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment. In other words, reads are small continuous parts of an individual's DNA.
  • The read should be long enough (at least 30-35 bp) to serve as a sequence tag, so that it can be unambiguously mapped or assigned to a precise location on a reference genome. Usually a small degree of mismatch (1 bp) is allowed, which can encompass natural variation in individual genomes.
  • Deep vs shallow sequencing - Deep sequencing means sequencing with a very high coverage of the sequenced part (usually 30x and more for variant analysis). With whole genome sequencing (WGS), this means sequencing roughly hundreds of millions to billions of reads. Shallow sequencing, commonly used in clinical practice and especially in modern non-invasive prenatal testing, on the other hand uses WGS of only 0.2x - 0.5x coverage, which is on the order of millions to a few tens of millions of reads.
  • SAM/BAM - File that contains aligned sequencing reads in a text format (SAM) or a compressed binary format (BAM). For every read it contains its mapped position on the reference genome (if mapping of that read was successful), the mapping quality, the sequencing quality (if provided), the location of the paired read (in the case of paired-end sequencing), and various other information. It is a standard for storing aligned reads. Each SAM/BAM file depends on the reference genome used; this information is stored in the header of the SAM/BAM file.
  • SAM: text format
  • BAM: compressed binary format
  • WES - Whole exome sequencing is sequencing targeted at the exonic regions of a genome. It is a cost-effective alternative to WGS.
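The read counts quoted in the definition of deep and shallow sequencing follow from simple arithmetic: coverage multiplied by genome size, divided by read length. The sketch below assumes the 3.2 billion base genome size given earlier; the function name and the example read lengths are illustrative choices, not values prescribed by the source.

```python
GENOME_SIZE = 3_200_000_000  # approximate human genome size in bases


def reads_needed(coverage: float, read_length: int,
                 genome_size: int = GENOME_SIZE) -> int:
    """Approximate number of reads needed to reach a given mean coverage."""
    if coverage <= 0 or read_length <= 0:
        raise ValueError("coverage and read_length must be positive")
    return round(coverage * genome_size / read_length)


deep_reads = reads_needed(30, 150)      # deep WGS with 150-base reads
shallow_reads = reads_needed(0.2, 35)   # shallow WGS with 35-base reads
print(f"deep: {deep_reads:,} reads, shallow: {shallow_reads:,} reads")
```

With these example read lengths, deep sequencing lands in the hundreds of millions of reads and shallow sequencing in the tens of millions, which matches the orders of magnitude given in the definition.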
  • The goal of the present invention is to provide a novel, alternative method for determining the fetal fraction of cfDNA for non-invasive prenatal testing of pregnant women. Moreover, the goal is to provide user-defined confidence intervals and increased accuracy of the prediction while minimizing the required number of reads from the pregnant mother. Since it is based on an SNP counting approach, it can be combined with methods that estimate the fetal fraction on other principles to get even better results.
  • The method relies on three inputs.
  • All the input reads should cover the same regions, so that SNP variants overlap across all three inputs. The accuracy of the predicted fetal fraction depends on the number of reads from the third input that cover an SNP variant identified from the first two inputs.
  • The reads from the mother-fetus mixture are extracted. For each such read, the number of mother and father alleles that correspond to the SNP is counted. From this, the likelihood of observing the read is calculated. The likelihood of observing all of the reads is then a simple product of the likelihoods of observing the individual reads, since reads are considered independent. This overall likelihood depends on the assumed fetal fraction.
  • The invention then computes the overall likelihoods for fetal fractions from 0.01% to 99.99% with a step of 0.01% and picks the one with the highest likelihood as the resulting fetal fraction. The confidence interval is computed from the precomputed likelihoods as the interval around the resulting fetal fraction that covers 95% of the sum of all likelihoods.
  • The invention needs three input files, as described before.
  • All three inputs (A1, A2, A3) are mapped (101, 102, 103) to the human reference genome, yielding intermediate BAM files BAM_F, BAM_M, and BAM_P (B1, B2, B3).
  • The choice of mapping software should not affect the result much. However, we recommend using modern mappers such as bowtie2 or BWA, as their performance yields more accurate results (Li and Durbin 2009).
  • BAM_F and BAM_M are further processed (201, 202) to obtain SNP variants stored in VCF files VCF_F and VCF_M (C1, C2). This process is called variant calling and is well known in the field of bioinformatics. Again, the choice of variant calling tool should not affect the results much, but we recommend using state-of-the-art variant callers such as VarDict, VarScan (Koboldt et al. 2012), GATK (McKenna et al. 2010) or FreeBayes (Garrison and Marth 2012).
  • SNP positions that are in either VCF_F or VCF_M are extracted (301), and for every such position we search for reads in BAM_P that cover it (D1).
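The extraction step can be sketched with plain Python data structures. This is an illustration only: the parents' variants are assumed to be already loaded into dictionaries keyed by (chromosome, position), reads are assumed to be ungapped (chromosome, start, sequence) tuples, and the function names are hypothetical. A real pipeline would query an indexed BAM file rather than scan a list.

```python
def snp_positions(variants_father, variants_mother):
    """Union of SNP positions found in either parent's variant map."""
    return set(variants_father) | set(variants_mother)


def covering_reads(reads, positions):
    """Yield (read, position) for every read that overlaps an SNP position.

    reads: iterable of (chromosome, 0-based start, sequence) tuples,
    assumed to align without gaps (a simplification for this sketch).
    """
    by_chrom = {}
    for chrom, pos in positions:
        by_chrom.setdefault(chrom, []).append(pos)
    for read in reads:
        chrom, start, seq = read
        for pos in sorted(by_chrom.get(chrom, [])):
            if start <= pos < start + len(seq):
                yield read, (chrom, pos)


variants_father = {("chr1", 105): ("C", "C")}
variants_mother = {("chr1", 105): ("C", "T"), ("chr1", 300): ("A", "G")}
plasma_reads = [("chr1", 100, "ACGTACGTAC"), ("chr1", 200, "ACGTACGTAC")]

hits = list(covering_reads(plasma_reads,
                           snp_positions(variants_father, variants_mother)))
```

Only the first plasma read overlaps a parental SNP position here, so it is the only one passed on to the counting step.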
  • Each of these reads can indicate whether it comes from the mother or from the fetus, owing to SNP differences between the father and the mother.
  • For every read from BAM_P and for every SNP position on that read, we count the number of matching alleles in the mother (mc) and the father (fc) (extracted from the corresponding VCF_M and VCF_F). For example, suppose the mother is heterozygous at some position with alleles C, T and the father is homozygous with alleles C, C. In a mapped read from the mother-fetus mixture (BAM_P), C is observed at this position. Therefore, the mother count is 1 and the father count is 2.
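The counting rule in the example above can be written as a small helper. The function name and the tuple representation of a genotype are assumptions made for illustration.

```python
def allele_counts(read_base, mother_alleles, father_alleles):
    """Return (mc, fc): how many maternal and paternal alleles match the read.

    read_base: base observed in the plasma read at the SNP position
    mother_alleles, father_alleles: two-element genotypes such as ("C", "T")
    """
    mc = sum(1 for allele in mother_alleles if allele == read_base)
    fc = sum(1 for allele in father_alleles if allele == read_base)
    return mc, fc


# Mother heterozygous C/T, father homozygous C/C, read shows C
mc, fc = allele_counts("C", ("C", "T"), ("C", "C"))
```

Each count is 0, 1 or 2, so every read falls into one of nine (mc, fc) classes.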
  • The fetal fraction is determined from the mc and fc counts explained in the previous section.
  • The probability of observing a read, depending on the mother count (mc), the father count (fc) and the fetal fraction (ff), is computed by adding three distinct probabilities: the probability of sequencing the read from the mother; the probability of sequencing the read from the fetus with the SNP inherited from the mother; and the probability of sequencing the read from the fetus with the SNP inherited from the father. The whole probability is the sum of these three terms. Since sequencing multiple reads is considered a set of independent events, the overall likelihood of observing all the reads simultaneously is computed as the product of these individual probabilities.
  • Let N_{mc,fc} denote the number of reads with mother count equal to mc and father count equal to fc. The likelihood can then be shortened by grouping reads with identical counts, so that each distinct read probability is raised to the power N_{mc,fc}.
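The source formulas are not reproduced in this text, so the sketch below is one plausible reading of the three terms and should be treated as an assumption: a maternal fragment (weight 1 - ff) shows the observed allele with probability mc/2, and a fetal fragment (weight ff) carries either the maternally inherited allele (probability mc/2) or the paternally inherited allele (probability fc/2), each half of the time. Sequencing errors are not modeled, and the names `read_probability` and `log_likelihood` are hypothetical.

```python
import math


def read_probability(mc: int, fc: int, ff: float) -> float:
    """Probability of observing a read with counts (mc, fc) at fetal fraction ff.

    Assumed model, reconstructed from the verbal description only.
    """
    from_mother = (1.0 - ff) * mc / 2.0       # read sequenced from the mother
    fetal_maternal = (ff / 2.0) * mc / 2.0    # fetal read, allele from mother
    fetal_paternal = (ff / 2.0) * fc / 2.0    # fetal read, allele from father
    return from_mother + fetal_maternal + fetal_paternal


def log_likelihood(counts: dict, ff: float) -> float:
    """Log of the product of read probabilities, grouped by (mc, fc).

    counts maps (mc, fc) to the number of reads in that class, N_{mc,fc}.
    Working in log space avoids numerical underflow for many reads.
    """
    total = 0.0
    for (mc, fc), n in counts.items():
        p = read_probability(mc, fc, ff)
        if p <= 0.0:
            return float("-inf")  # impossible under this error-free model
        total += n * math.log(p)
    return total


# Example from the text: mc = 1, fc = 2, evaluated at a 20% fetal fraction
p_example = read_probability(1, 2, 0.2)
```

Grouping by (mc, fc) means the likelihood needs at most nine distinct probabilities per candidate fetal fraction, regardless of the number of reads.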
  • The invention then computes (401) these likelihoods (E1) for fetal fractions from 0.01% to 99.99% with a step of 0.01% and picks the one with the highest likelihood (501) as the resulting fetal fraction (F1).
  • The precomputed likelihoods are used to compute confidence intervals. Since the likelihoods are not symmetric around the highest point (the most probable fetal fraction), the intervals are computed from the selected fetal fraction outward. At the beginning, the confidence interval is a single point, the selected fetal fraction. In each following step, the larger of the two neighboring likelihoods (to the left or to the right of the interval) is added to the interval and the confidence of the interval is evaluated.
  • The confidence of an interval is simply the fraction of the whole likelihood that it covers: for an interval spanning a range of fetal fractions, the confidence is the sum of the likelihoods within that range divided by the sum of all likelihoods.
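The grid search and the outward-growing confidence interval can be sketched as below. The log-likelihood is passed in as a function; the toy curve used in the demonstration (20 "fetal" and 80 "non-fetal" observations) is a stand-in chosen so the peak is easy to check by hand, not the model of the invention. All names are hypothetical.

```python
import math

STEPS = 10_000                                # 0.01% resolution
GRID = [i / STEPS for i in range(1, STEPS)]   # 0.01% ... 99.99%


def estimate_with_interval(log_lik, level=0.95):
    """Return (best, low, high) fetal fractions for a log-likelihood function."""
    log_liks = [log_lik(ff) for ff in GRID]
    peak = max(log_liks)
    if peak == float("-inf"):
        raise ValueError("likelihood is zero on the whole grid")
    # Rescale by the peak so exp() does not underflow; ratios are unchanged.
    liks = [math.exp(value - peak) for value in log_liks]
    total = sum(liks)
    best = max(range(len(liks)), key=liks.__getitem__)

    low = high = best
    covered = liks[best]
    # Grow the interval toward the larger neighboring likelihood.
    while covered / total < level and (low > 0 or high < len(liks) - 1):
        left = liks[low - 1] if low > 0 else -1.0
        right = liks[high + 1] if high < len(liks) - 1 else -1.0
        if left >= right:
            low -= 1
            covered += left
        else:
            high += 1
            covered += right
    return GRID[best], GRID[low], GRID[high]


def toy_log_lik(ff):
    # Stand-in curve with its maximum at ff = 0.2 (assumption for the demo).
    return 20 * math.log(ff) + 80 * math.log(1.0 - ff)


best_ff, ci_low, ci_high = estimate_with_interval(toy_log_lik)
print(f"fetal fraction {best_ff:.2%}, interval {ci_low:.2%} to {ci_high:.2%}")
```

Because the interval always extends toward the larger neighbor, it may end up asymmetric around the best estimate, as the text describes.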
  • The method of the present invention can be largely automated.
  • At least the bioinformatics part of the method, i.e. processing of sequencing data and all subsequent determinations and calculations, may be performed using any suitable computing system, such as a PC equipped with a processor, peripheral input/output devices (e.g. ports, interfaces), memories (e.g. system memory, hard disk), keyboard, monitor, mouse etc.
  • The computing system may be in data communication with the sequencing system providing the sequence data, preferably in the form of a plurality of sequence reads (by wired or wireless networking, Bluetooth, internet, cloud etc.). This means that the computing system is configured to receive sequence data from the sequencing system.
  • Suitable computing systems, as well as means for connection with the sequencing system, are well known to persons skilled in the art.
  • At least part of the method can be implemented as software code, i.e. a plurality of instructions (computer program) to be executed by a processor of a computing system.
  • The code may be stored on or transmitted via a computer readable medium, such as RAM, ROM, hard drive, SSD, CD, DVD, flash memory etc.
  • The code may be transmitted via any suitable wired, optical or wireless network, for example via the internet.
  • The whole computer program can be downloaded by the user (customer) via the internet.
  • The present invention also relates to a computer program product comprising a computer readable medium comprising a plurality of instructions for controlling a computing system to perform at least a portion of the method according to the invention, preferably the portion starting with the step of receiving sequence information from the random sequencing step performed with an automated sequencing system.
  • The computer program mentioned above can preferably be installed on the computer of the sequencing system.
  • Figure 1 shows the log-likelihood of observing all reads for a single sample for different fetal fractions. In this example, the best fetal fraction was 20.46%, depicted by a black line. The 95% confidence interval is from 16.25% to 25.22%, as depicted by black dashed lines.
  • Figure 2 shows a flowchart describing the preparation of files as well as the determination of the fetal fraction and confidence intervals.
  • DNA isolation is performed according to the DNA Purification from Blood or Body Fluids protocol (Spin protocol).
  • Biological input is prepared from a 200 µl blood sample. All centrifugation steps are carried out at room temperature. Pipette proteinase K into the bottom of a microcentrifuge tube and add the blood sample. Add AL Buffer and vortex for 15 seconds. Incubate at 56°C for 10 min. Briefly centrifuge to remove drops from the inside of the lid. Add 100% EtOH and vortex again for 15 seconds. After mixing, briefly centrifuge to remove drops from the inside of the lid. Carefully pour the mixture into a QIAamp Mini spin column, close the cap and centrifuge at 6000 x g for 1 minute. Then place the Mini spin column in a new clean collection tube and discard the collection tube containing the filtrate.
  • Add 500 µl AW1 Buffer to the Mini spin column. Close the cap and centrifuge at 6000 x g for 1 minute. Place the Mini spin column in another collection tube and discard the collection tube containing the filtrate. Add 500 µl AW2 Buffer to the Mini spin column and centrifuge at full speed for 1 minute. Discard the filtrate and reuse the same collection tube. Place the Mini spin column in the same collection tube and add 500 µl EtOH. Centrifuge at full speed for 1 minute. Place the Mini spin column in a new collection tube and centrifuge at 14 000 rpm for 3 minutes. Place the Mini spin column in a new clean collection tube and discard the collection tube containing the filtrate. Add 220 µl AE Buffer and incubate the open column at 56°C for 3 minutes.
  • DNA isolation is performed according to the TruSeq Nano Protocol (Illumina, Claim 1 c).
  • 68 µl of proteinase K is pipetted into the microcentrifuge tube and 680 µl of sample is added.
  • 680 µl Buffer AL is added to the sample. Mixed by pulse-vortexing for 30 seconds. Incubated at 56°C for 18-20 minutes.
  • 680 µl EtOH (96-100%) is added, then vortexed for 1 minute and briefly centrifuged. 3x 685 µl of mixture is carefully applied to the Mini spin column. Centrifuged at 6000g for 1 minute. The Mini spin column is placed in a clean collection tube. 600 µl Buffer AW1 is added and centrifuged at 6000g for 1 minute. The Mini spin column is placed in a clean collection tube.
  • 710 µl Buffer AW2 is added and centrifuged at 6000g for 1 minute.
  • The Mini spin column is placed in a clean microcentrifuge tube.
  • 710 µl EtOH (96-100%) is added and centrifuged at 6000g for 1 minute. Centrifuged at 20 000g for 3 minutes.
  • The Mini spin column is placed in a clean microcentrifuge tube. Incubated at 56°C for 3 minutes (opened column). 37 µl Millipore water is added. Incubated at room temperature for 30 minutes. Centrifuged at 6 000g for 1 minute.
  • End repair and size selection: 20 µl End repair mix 2 is added to 30 µl of sample from the previous step. Shaken and centrifuged at 280g for 1 minute. Placed on the thermal cycler and the ERP program is run. Centrifuged at 280g for 1 minute. 55 µl magnetic beads are added, shaken and spun at 300g for 2-3 seconds. Incubated at room temperature for 5 minutes. Incubated on the magnetic stand for 5 minutes. 105 µl of supernatant is stored and the rest of the supernatant is discarded. 189 µl magnetic beads are added to the stored supernatant, shaken at 2000 rpm for 1 minute and spun. Incubated at room temperature for 5 minutes and on the magnetic stand for the same time. The supernatant is removed.
  • A-tailing: 6.25 µl A-tail mix is added, shaken and spun. Placed on the thermal cycler and the ATAIL70 program is run. Centrifuged at 280g for 1 minute. Adapter ligation: 1.25 µl of RSB, Ligation mix 2 and DNA adapters are added. Centrifuged at 280g for 1 minute. Placed on the thermal cycler and the LIG program is run. 2.5 µl Stop ligation buffer is added, mixed, shaken and spun. 21.25 µl magnetic beads are added in Round 1, then vortexed and shaken at 2000 rpm for 2 minutes. Incubated at room temperature for 5 minutes. Placed on the magnetic stand for 5 minutes. All the supernatant is removed.
  • PCR enrichment: Supernatant is placed on ice and 2.5 µl PPC and 10 µl EPM are added, mixed, shaken and spun, then placed on the thermal cycler and the PCRNano program is run. Centrifuged at 280g for 1 minute. 25 µl magnetic beads are added and shaken at 2000 rpm for 1 minute. Incubated at room temperature for 5 minutes and subsequently on the magnetic stand for 5 minutes. All supernatant is removed. Washed 2 times with 200 µl 80% EtOH. Air-dried on the magnetic stand for 5 minutes. 200 µl RSB is added and mixed at 2000 rpm for 1 minute. Incubated at room temperature for 2 minutes and subsequently on the magnetic stand for 5 minutes. 19 µl of supernatant is transferred to the TSP1 plate.
  • Samples are sequenced on a NextSeq 500 with a read length of 2x35 bp; the output size is approximately 300-700 MB per sample in FASTQ format.
  • the Device also maps input FQP to human reference genome and subsequently creates file for FQp (BAMp). As a following step, the device loads variants from both BAMivi and BAMF (hereinafter also referred to as parents) into hashmaps, where key is a chromosome and a position of a given variant.
  • BAMP file for FQP
  • The inputs of this function are the variant hash maps from BAMM and BAMF, followed by BAMM, BAMF, BAMP and the information on which of BAMM and BAMF was inserted first.
  • The device's function Find starts by initializing a global empty array (filtered_reads) that stores the BAMP reads matching a parent's variant position, so that the same read is not processed twice.
  • The second step of the function iterates through the variant records of one parent. For every variant record, the function checks whether the variant is heterozygous; if not, it continues with the next variant record. Next, the function checks whether the variant is an insertion or deletion; if so, it continues with the next variant record.
  • The function subsequently collects all BAMP reads matching this variant position. For every read in this collection, the function sets the allele bases of the second parent from the second parent's variant records and sets which parent is the source of the variant in the BAMP read. Next, the function computes the coverage of the variant position for each parent separately.
  • If this BAMP read is not in the filtered_reads array (that is, the function has not already processed this read),
  • the following row is written to the output file: read, read's index, reference, base of the father's first allele, base of the father's second allele, base of the mother's first allele, base of the mother's second allele, father's coverage of the given position, mother's coverage of the given position, and which parent has the variant (mother, father, both).
  • This function is run for each parent's variants (i.e. it is called twice, the second time with the parents in reversed order).
  • The output of this program is a tab-separated values file with information on the BAMP reads matching the parents' variant positions.
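The steps of the Find function described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the device's actual implementation: reads are assumed to align without gaps, coverages are passed in precomputed, a missing record in the second parent is assumed to mean homozygous reference, the "read" column is assumed to hold the read sequence, and all names (`Variant`, `PlasmaRead`, `find`, `seen`) are hypothetical.

```python
import csv
import io
from typing import Dict, List, NamedTuple, Set, Tuple

Key = Tuple[str, int]  # (chromosome, 1-based position)


class Variant(NamedTuple):
    ref: str
    alleles: Tuple[str, str]


class PlasmaRead(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based leftmost reference position
    seq: str    # assumption: gap-free alignment


def is_het_snv(v: Variant) -> bool:
    """Heterozygous and not an insertion or deletion."""
    a, b = v.alleles
    return a != b and len(v.ref) == len(a) == len(b) == 1


def find(first: Dict[Key, Variant], second: Dict[Key, Variant],
         first_cov: Dict[Key, int], second_cov: Dict[Key, int],
         reads: List[PlasmaRead], first_is_father: bool,
         seen: Set[Tuple[str, Key]], out) -> None:
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for key, var in first.items():
        if not is_het_snv(var):
            continue
        chrom, pos = key
        other = second.get(key)
        # Assumption: no record in the second parent means homozygous reference.
        other_alleles = other.alleles if other else (var.ref, var.ref)
        source = "both" if other else ("father" if first_is_father else "mother")
        if first_is_father:
            f_all, m_all = var.alleles, other_alleles
            f_cov, m_cov = first_cov.get(key, 0), second_cov.get(key, 0)
        else:
            f_all, m_all = other_alleles, var.alleles
            f_cov, m_cov = second_cov.get(key, 0), first_cov.get(key, 0)
        for r in reads:
            idx = pos - r.start  # index of the variant position within the read
            if r.chrom != chrom or not 0 <= idx < len(r.seq):
                continue
            if (r.name, key) in seen:  # plays the role of filtered_reads
                continue
            seen.add((r.name, key))
            writer.writerow([r.seq, idx, var.ref, *f_all, *m_all,
                             f_cov, m_cov, source])


# Illustrative (made-up) inputs.
father = {("chr1", 105): Variant("A", ("A", "G"))}
mother = {("chr1", 105): Variant("A", ("A", "G")),
          ("chr1", 300): Variant("C", ("C", "CT"))}  # indel, skipped
cov_f, cov_m = {("chr1", 105): 30}, {("chr1", 105): 28}
reads = [PlasmaRead("r1", "chr1", 100, "TTTTTGTTTT"),
         PlasmaRead("r2", "chr1", 200, "ACGT")]

seen: Set[Tuple[str, Key]] = set()
out = io.StringIO()
find(father, mother, cov_f, cov_m, reads, True, seen, out)   # father first
find(mother, father, cov_m, cov_f, reads, False, seen, out)  # reversed order
rows = out.getvalue().splitlines()
print(rows)
```

Sharing one `seen` set between the two calls is what keeps a read at a position that is heterozygous in both parents from being written twice.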
  • The tab-separated file with the BAMP reads that contain a SNP and their matching parents' variant positions is then read by a separate program.
  • This program identifies the reads that carry valuable information, namely those where the parents' variants differ at the position of the SNP.
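A minimal sketch of this filtering step is shown below. It assumes the column order of the output row listed earlier (read, index, reference, two father alleles, two mother alleles, coverages, source) and assumes that the "read" column holds the read sequence. The per-read mother and father counts are computed here as the number of each parent's alleles that equal the base seen in the read; that interpretation is an assumption, since the counts are defined elsewhere in the description. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterator, Tuple


def informative_rows(tsv_text: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (base, mother_count, father_count) for reads at positions
    where the parents' genotypes differ."""
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        read, idx, _ref, f1, f2, m1, m2 = row[:7]
        if sorted((f1, f2)) == sorted((m1, m2)):
            continue  # identical parental genotypes carry no information
        base = read[int(idx)]
        mother_count = (m1 == base) + (m2 == base)
        father_count = (f1 == base) + (f2 == base)
        yield base, mother_count, father_count


# Illustrative (made-up) rows in the assumed column layout.
tsv = (
    "TTTTTGTTTT\t5\tA\tA\tG\tA\tA\t30\t28\tfather\n"
    "CCCCA\t4\tA\tA\tG\tA\tG\t30\t28\tboth\n"
    "GGGAG\t3\tA\tA\tG\tA\tA\t30\t28\tfather\n"
)
groups = Counter((m, f) for _base, m, f in informative_rows(tsv))
print(groups)
```

The second sample row is dropped because both parents carry the same genotype there, so a read at that position cannot be attributed to either parent.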
  • The reads are grouped by the mother and father counts explained in the description of the invention. For one sample, 3861 reads with SNPs were gathered. The numbers of these reads for each combination of mother and father counts are summarized in Table 1.
  • The log-likelihood curve is quite flat at the top, which is caused by the low number of reads used.
  • The fetal fraction according to a standard fragment-length method for this sample is 17.01%; this value lies well within the 95% confidence interval.
  • Table 2: Number of reads for several sample families and the resulting fetal fraction with confidence intervals.
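The relationship between the number of reads, the flatness of the log-likelihood curve and the width of the 95% confidence interval can be illustrated with a deliberately simplified stand-in model, which is not the likelihood of the invention. The assumption here is that only positions where the mother is homozygous and the father is homozygous for the other allele are used, so that a plasma read shows the paternal allele with probability f/2 for fetal fraction f. The function names and the read counts below are made up for illustration.

```python
import math


def log_likelihood(f: float, k: int, n: int) -> float:
    """Binomial log-likelihood (up to a constant) of fetal fraction f,
    given k of n reads showing the paternal-only allele."""
    p = f / 2.0  # assumed chance that a read shows the paternal-only allele
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def estimate_fetal_fraction(k: int, n: int, steps: int = 2000):
    """Grid-search maximum likelihood with a likelihood-ratio 95% interval."""
    grid = [i / steps for i in range(1, steps)]  # f strictly inside (0, 1)
    ll = [log_likelihood(f, k, n) for f in grid]
    best = max(range(len(grid)), key=ll.__getitem__)
    # 1.92 is half of 3.84, the 95% quantile of chi-square with 1 degree of freedom.
    cutoff = ll[best] - 1.92
    inside = [f for f, value in zip(grid, ll) if value >= cutoff]
    return grid[best], inside[0], inside[-1]


# Made-up counts: the same observed proportion with 200 and with 2000 reads.
f_hat, lo, hi = estimate_fetal_fraction(k=17, n=200)
f_big, lo_big, hi_big = estimate_fetal_fraction(k=170, n=2000)
print(f"200 reads:  f = {f_hat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print(f"2000 reads: f = {f_big:.3f}, 95% CI = ({lo_big:.3f}, {hi_big:.3f})")
```

With the same observed proportion, the larger read count gives a narrower interval, which is the effect behind the flat top of the log-likelihood curve when few reads are available.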

Abstract

The present invention relates to non-invasive prenatal testing (NIPT) for several genomic disorders of the fetus in a pregnant woman. The invention concerns the calculation and estimation of the fetal fraction (the ratio of the amount of cell-free fetal DNA to the amount of cell-free DNA in the sample) based on sequencing data obtained from maternal and paternal plasma, which is subsequently used in NIPT. More particularly, the invention is based on the analysis of a genome sequenced from a non-pregnant mother and a father and, subsequently, again from the pregnant mother. All sequenced data are then mapped to the reference genome and analyzed for the single nucleotide polymorphisms they contain. Finally, using all the data obtained in the previous steps, the fetal fraction estimate is calculated.
PCT/SK2019/050016 2019-12-30 2019-12-30 Method for fetal fraction estimation based on detection and interpretation of single nucleotide variants WO2021137770A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SK2019/050016 WO2021137770A1 (fr) 2019-12-30 2019-12-30 Method for fetal fraction estimation based on detection and interpretation of single nucleotide variants

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SK2019/050016 WO2021137770A1 (fr) 2019-12-30 2019-12-30 Method for fetal fraction estimation based on detection and interpretation of single nucleotide variants

Publications (1)

Publication Number Publication Date
WO2021137770A1 (fr) 2021-07-08

Family

ID=69165471

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SK2019/050016 WO2021137770A1 (fr) Method for fetal fraction estimation based on detection and interpretation of single nucleotide variants 2019-12-30 2019-12-30

Country Status (1)

Country Link
WO (1) WO2021137770A1 (fr)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012088456A2 (fr) * 2010-12-22 2012-06-28 Natera, Inc. Methods for non-invasive prenatal paternity testing
WO2013177581A2 (fr) * 2012-05-24 2013-11-28 University Of Washington Through Its Center For Commercialization Whole genome sequencing of a human fetus
AU2011218382B2 (en) 2010-02-19 2015-07-30 Sequenom, Inc. Methods for detecting fetal nucleic acids and diagnosing fetal abnormalities
WO2016127944A1 (fr) * 2015-02-10 2016-08-18 The Chinese University Of Hong Kong Detecting mutations for cancer screening and fetal analysis
US9493828B2 (en) 2010-01-19 2016-11-15 Verinata Health, Inc. Methods for determining fraction of fetal nucleic acids in maternal samples
US20170081720A1 (en) * 2015-09-22 2017-03-23 The Chinese University Of Hong Kong Accurate deduction of fetal dna fraction with shallow-depth sequencing of maternal plasma
WO2019020180A1 (fr) * 2017-07-26 2019-01-31 Trisomytest, S.R.O. Method for non-invasive prenatal detection of fetal chromosome aneuploidy from maternal blood based on a Bayesian network
US10208348B2 (en) 2007-07-23 2019-02-19 The Chinese University Of Hong Kong Determining percentage of fetal DNA in maternal sample

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
CHIU, ROSSA W. K., ET AL.: "Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study", BMJ, 2011
GARRISON, ERIK; GABOR MARTH: "Haplotype-based variant detection from short-read sequencing", 2012, Retrieved from the Internet <URL:http://arxiv.org/abs/1207.3907>
HUDECOVA, IRENA, ET AL.: "Maternal Plasma Fetal DNA Fractions in Pregnancies with Low and High Risks for Fetal Chromosomal Aneuploidies", PLOS ONE, 2014
JIANG, PEIYONG; XIANLU PENG; XIAOXI SU; KUN SUN; STEPHANIE C. Y. YU; WENG IN CHU; TAK Y. LEUNG, ET AL.: "FetalQuantSD: Accurate Quantification of Fetal DNA Fraction by Shallow-Depth Sequencing of Maternal Plasma DNA", NPJ GENOMIC MEDICINE, 2016, Retrieved from the Internet <URL:https://doi.org/10.1038/npjgenmed.2016.13>
JIANG, PEIYONG; K. C. ALLEN CHAN; GARY J. W. LIAO; YAMA W. L. ZHENG; TAK Y. LEUNG; ROSSA W. K. CHIU; YUK MING DENNIS LO; HAO SUN: "FetalQuant: Deducing Fractional Fetal DNA Concentration from Massively Parallel Sequencing of DNA in Maternal Plasma", BIOINFORMATICS, vol. 28, no. 22, 2012, pages 2883 - 90, XP055127069, DOI: 10.1093/bioinformatics/bts549
KIM, KYUNG, ET AL.: "Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants", GENOMICS INFORM., vol. 13, no. 2, June 2015 (2015-06-01), pages 31 - 39
KIM, SUNG K.; GREGORY HANNUM; JENNIFER GEIS; JOHN TYNAN; GRANT HOGG; CHEN ZHAO; TAYLOR J. JENSEN: "Determination of Fetal DNA Fraction from the Plasma of Pregnant Women Using Sequence Read Counts", PRENATAL DIAGNOSIS, vol. 35, no. 8, 2015, pages 810 - 15, XP055215002, DOI: 10.1002/pd.4615
KOBOLDT, D. C.; Q. ZHANG; D. E. LARSON; D. SHEN; M. D. MCLELLAN; L. LIN; C. A. MILLER; E. R. MARDIS; L. DING; R. K. WILSON: "VarScan 2: Somatic Mutation and Copy Number Alteration Discovery in Cancer by Exome Sequencing", GENOME RESEARCH, 2012, Retrieved from the Internet <URL:https://doi.org/10.1101/gr.129684.111>
LAI, ZHONGWU; ALEKSANDRA MARKOVETS; MIIKA AHDESMAKI; BRAD CHAPMAN; OLIVER HOFMANN; ROBERT MCEWEN; JUSTIN JOHNSON; BRIAN DOUGHERTY; J. CARL BARRETT: "VarDict: A Novel and Versatile Variant Caller for next-Generation Sequencing in Cancer Research", NUCLEIC ACIDS RESEARCH, 2016, Retrieved from the Internet <URL:https://doi.org/10.1093/nar/gkw227>
LANGMEAD, BEN; STEVEN L. SALZBERG: "Fast Gapped-Read Alignment with Bowtie 2", NATURE METHODS, 2012, Retrieved from the Internet <URL:https://doi.org/10.1038/nmeth.1923>
LI, HENG; RICHARD DURBIN: "Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform", BIOINFORMATICS, vol. 25, no. 14, 2009, pages 1754 - 60
LO, Y. M. DENNIS; NOEMI CORBETTA; PAUL F. CHAMBERLAIN; VIK RAI; IAN L. SARGENT; CHRISTOPHER W. G. REDMAN; JAMES S. WAINSCOAT: "Presence of fetal DNA in maternal plasma and serum", THE LANCET, vol. 350, no. 9076, 1997, pages 485 - 487
MAYER, PASCAL, ET AL.: "DNA colony massively parallel sequencing ams98 presentation 'A very large scale, high throughput and low cost DNA sequencing method based on a new 2-dimensional DNA auto-patterning process'", FIFTH INTERNATIONAL AUTOMATION IN MAPPING AND DNA SEQUENCING CONFERENCE, 1998
MCKENNA, A.; M. HANNA; E. BANKS; A. SIVACHENKO; K. CIBULSKIS; A. KERNYTSKY; K. GARIMELLA, ET AL.: "The Genome Analysis Toolkit: A MapReduce Framework for Analyzing next-Generation DNA Sequencing Data", GENOME RESEARCH, 2010, Retrieved from the Internet <URL:https://doi.org/10.1101/gr.107524.110>
PENG, XIANLU LAURA; PEIYONG JIANG: "Bioinformatics Approaches for Fetal DNA Fraction Estimation in Noninvasive Prenatal Testing", INT. J. MOL. SCI., 2017
STRAVER, ROY; CEES B. M. OUDEJANS; ERIK A. SISTERMANS; MARCEL J. T. REINDERS: "Calculating the Fetal Fraction for Noninvasive Prenatal Testing Based on Genome-Wide Nucleosome Profiles", PRENATAL DIAGNOSIS, vol. 36, no. 7, 2016, pages 614 - 21, XP055478984, DOI: 10.1002/pd.4816
YU, STEPHANIE C. Y.; K. C. ALLEN CHAN; YAMA W. L. ZHENG; PEIYONG JIANG; GARY J. W. LIAO; HAO SUN; RANJIT AKOLEKAR, ET AL.: "Size-Based Molecular Diagnostics Using Plasma DNA for Noninvasive Prenatal Testing", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 111, no. 23, 2014

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19836617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19836617

Country of ref document: EP

Kind code of ref document: A1