EP3679156A1 - Systems and methods for non-invasive preimplantation genetic diagnosis

Publication number
EP3679156A1
Authority
EP
European Patent Office
Prior art keywords
genomic
sequence
concatenated
genomic fragment
embryo
Legal status
Withdrawn
Application number
EP18778768.4A
Other languages
English (en)
French (fr)
Inventor
Santiago MUNNE-BLANCO
Dhruti Ashokbhai BABARIYA
Arun Prasad MANOHARAN
Dagan Wells
Current Assignee
CooperGenomics Inc
Original Assignee
CooperGenomics Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 Ligating adaptors
    • C12Q2521/00 Reaction characterised by the enzymatic activity
    • C12Q2521/50 Other enzymatic activities
    • C12Q2521/501 Ligase
    • C12Q2535/00 Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/122 Massive parallel sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • the embodiments disclosed herein are generally directed towards systems and methods for non-invasive genetic screening and/or diagnosis of embryos prior to implantation in an in vitro fertilization procedure. More specifically, there is a need for non-invasive preimplantation screening and/or diagnostic systems and methods that can aid clinicians in selecting embryos that have the lowest risk of genetic abnormalities/defects and the highest probability of uterine implantation success.
  • IVF: in vitro fertilization
  • the process of fertilization involves extracting eggs, retrieving a sperm sample, and then manually combining an egg and sperm in a laboratory setting. The resulting embryo(s) are then implanted in the host uterus to be carried to term.
  • IVF procedures are expensive and can exact a significant emotional/physical toll on patients, so genetic screening of embryos prior to implantation is becoming increasingly common for patients undergoing an IVF procedure.
  • Current methods of diagnosing genetic abnormalities in embryos and screening for viability of transfer (i.e., embryo implantation viability)
  • NI PGS: non-invasive genetic screening and/or diagnostic
  • a method for determining copy number variation in an embryo candidate for in vitro fertilization (IVF) implantation is disclosed.
  • An embryo candidate is isolated from a plurality of embryos.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes genomic fragments shed or secreted from the embryo candidate.
  • a plurality of genomic linker segments and a ligase enzyme are added to the amplification vessel in conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment from the isolated embryo candidate.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • Sequence information is obtained from the amplified concatenated genomic fragments.
  • the sequence information is aligned (mapped) against a reference genome. Copy number variations are identified in the embryo candidate when a frequency of genomic fragment sequence reads aligned to a chromosomal position on the reference genome deviates from a frequency threshold.
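  • The thresholding step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `call_copy_number`, the expected read fractions, and the 25% relative threshold are all assumptions chosen for the example.

```python
from collections import Counter

def call_copy_number(read_regions, expected_fraction, threshold=0.25):
    """Flag regions whose observed read fraction deviates from the expected one.

    read_regions: iterable of region names (e.g. chromosomes), one per aligned read.
    expected_fraction: dict of region -> expected fraction of reads for a
        normal sample (hypothetical values for this example).
    threshold: relative deviation that triggers a call (assumed value).
    """
    counts = Counter(read_regions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads")
    calls = {}
    for region, expected in expected_fraction.items():
        ratio = (counts.get(region, 0) / total) / expected
        if ratio > 1 + threshold:
            calls[region] = "gain"
        elif ratio < 1 - threshold:
            calls[region] = "loss"
        else:
            calls[region] = "normal"
    return calls

# Toy data: chr21 receives more reads than expected.
reads = ["chr1"] * 50 + ["chr2"] * 50 + ["chr21"] * 15
expected = {"chr1": 0.45, "chr2": 0.45, "chr21": 0.10}
calls = call_copy_number(reads, expected)
print(calls)
```

  • In practice the expected fractions and the threshold would have to come from validated reference samples; the values here only make the example self-contained.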
  • a method for identifying genomic features in an embryo candidate is disclosed.
  • An embryo candidate is isolated from a plurality of embryo candidates.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes one or more genomic fragments shed or secreted from the embryo candidate.
  • a plurality of genomic linker segments and a ligase enzyme are added to the amplification vessel in conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment from the isolated embryo candidate.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • Sequence information is obtained from the concatenated genomic fragments.
  • the sequence information is aligned against a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
  • a system for identifying genomic features in an embryo candidate includes a genomic sequencer, a computing device and a display.
  • the genomic sequencer is configured to obtain sequence information from concatenated genomic fragments derived from an embryo candidate.
  • the concatenated genomic fragments each contain at least one genomic linker segment and at least one genomic fragment from the embryo candidate.
  • the computing device is communicatively connected to the genomic sequencer and includes a sequence alignment engine and a genomic features identification engine.
  • the sequence alignment engine is configured to subtract out sequence information related to the genomic linker segment portion of the concatenated genomic fragments and align the genomic fragment sequences to a reference genome.
  • the genomic features identification engine is configured to identify genomic features in the aligned genomic fragment sequences.
  • the display is communicatively connected to the computing device and configured to display a report containing the identified genomic features.
  • a method for identifying genomic features in a tissue sample is disclosed.
  • Concatenated genomic fragment sequence reads are received containing at least one genomic linker segment sequence and at least one genomic fragment sequence from a tissue sample.
  • the genomic linker segment sequence portion of the concatenated genomic fragment sequence reads is subtracted out.
  • the concatenated genomic fragment sequence reads are aligned (mapped) to a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
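  • The linker subtraction step can be sketched as a simple string operation. This is a minimal illustration under stated assumptions: the linker sequence `LINKER` is hypothetical, and only exact, forward-orientation copies of the linker are removed (a real pipeline would likely also need to handle reverse-complement copies and sequencing errors).

```python
def strip_linkers(read, linker):
    """Split a concatenated read on every exact occurrence of the linker
    and return the remaining genomic fragment sequences."""
    return [piece for piece in read.split(linker) if piece]

LINKER = "GGATCCGAATTC"  # hypothetical linker sequence, not from the source
read = "ACGTACGTTT" + LINKER + "TTGACCA" + LINKER + "CCGGA"

fragments = strip_linkers(read, LINKER)
print(fragments)
```

  • Each recovered fragment sequence can then be aligned to the reference genome on its own.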
  • a non-transitory computer-readable medium in which a program is stored for causing a computer to perform a method for identifying genomic features in a tissue sample is disclosed.
  • Concatenated genomic fragment sequence reads are received containing at least one genomic linker segment sequence and at least one genomic fragment sequence from a tissue sample.
  • the genomic linker segment sequence portion of the concatenated genomic fragment sequence reads is subtracted out.
  • the concatenated genomic fragment sequence reads are aligned (mapped) to a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
  • Figure 1 illustrates a workflow for non-invasive preimplantation genetic screening of embryos, in accordance with some embodiments of the disclosure.
  • Figure 2 is an exemplary flowchart depicting an amplification protocol for amplifying short genomic fragments, in accordance with some embodiments of the disclosure.
  • Figure 3 illustrates the formation of concatenated fragments, in accordance with some embodiments of the disclosure.
  • Figure 4 is a block diagram that illustrates a computer system, in accordance with various embodiments.
  • Figure 5 is a schematic diagram of a system for non-invasive preimplantation genetic screening of embryos, in accordance with various embodiments.
  • Figure 6 is a depiction of how concatenated fragment reads are mapped to a reference genome, in accordance with various embodiments.
  • Figure 7 is an exemplary flowchart showing a method for aligning genomic fragment reads to identify various types of genomic features, in accordance with various embodiments.
  • Figure 8 is a flowchart showing a method for determining copy number variation in an embryo candidate, in accordance with various embodiments.
  • Figure 9 is a flowchart showing a method of identifying genomic features in an embryo candidate, in accordance with various embodiments.
  • Figure 10 is a flowchart showing a method for identifying genomic features from concatenated genomic fragment reads, in accordance with various embodiments.
  • one element (e.g., a material, a layer, a substrate, etc.) can be “on,” “attached to,” “connected to,” or “coupled to” another element regardless of whether the one element is directly on, attached to, connected to, or coupled to the other element or there are one or more intervening elements between the one element and the other element.
  • where reference is made to a list of elements (e.g., elements a, b, c), such reference is intended to include any one of the listed elements by itself, any combination of less than all of the listed elements, and/or a combination of all of the listed elements. Section divisions in the specification are for ease of review only and do not limit any combination of elements discussed.
  • Enzymatic reactions and purification techniques are performed according to manufacturer's specifications or as commonly accomplished in the art or as described herein.
  • the techniques and procedures described herein are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the instant specification. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Third ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 2000).
  • the nomenclature used in connection with the laboratory procedures and techniques described herein is that which is well known and commonly used in the art.
  • next generation sequencing refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example with the ability to generate hundreds of thousands of relatively small sequence reads at a time.
  • next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. More specifically, the MISEQ, HISEQ and NEXTSEQ Systems of Illumina and the Personal Genome Machine (PGM) and SOLiD Sequencing System of Life Technologies Corp. provide massively parallel sequencing of whole or targeted genomes.
  • sequencing run refers to any step or portion of a sequencing experiment performed to determine some information relating to at least one biomolecule (e.g., nucleic acid molecule).
  • genomic features can refer to a genome region with some annotated function (e.g., a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.) or a genetic/genomic variant (e.g., single nucleotide polymorphism/variant, insertion/deletion sequence, copy number variation, inversion, etc.) which denotes a single or a grouping of genes (in DNA or RNA) that have undergone changes as referenced against a particular species or sub-populations within a particular species due to mutations, recombination/crossover or genetic drift.
  • Genomic variants can be identified using a variety of techniques, including, but not limited to: array-based methods (e.g., DNA microarrays, etc.), real-time/digital/quantitative PCR instrument methods and whole or targeted nucleic acid sequencing systems (e.g., NGS systems, Capillary Electrophoresis systems, etc.). With nucleic acid sequencing, coverage data can be available at single base resolution.
  • DNA (deoxyribonucleic acid): nucleotides A (adenine), T (thymine), C (cytosine), and G (guanine)
  • RNA (ribonucleic acid): nucleotides A (adenine), U (uracil), G (guanine), and C (cytosine)
  • nucleic acid sequencing data denotes any information or data that is indicative of the order of the nucleotide bases (e.g., adenine, guanine, cytosine, and thymine/uracil) in a molecule (e.g., whole genome, whole transcriptome, exome, oligonucleotide, polynucleotide, fragment, etc.) of DNA or RNA.
  • sequence information obtained using all available varieties of techniques, platforms or technologies, including, but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, electronic signature-based systems, etc.
  • a “polynucleotide”, “nucleic acid”, or “oligonucleotide” refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by internucleosidic linkages.
  • a polynucleotide comprises at least three nucleosides.
  • oligonucleotides range in size from a few monomeric units, e.g. 3-4, to several hundreds of monomeric units.
  • whenever a polynucleotide such as an oligonucleotide is represented by a sequence of letters, such as “ATGCCTG,” it will be understood that the nucleotides are in 5'->3' order from left to right and that “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes thymidine, unless otherwise noted.
  • the letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases, as is standard in the art.
  • fragment library refers to a collection of nucleic acid fragments, wherein one or more fragments are used as a sequencing template.
  • a fragment library can be generated, for example, by cutting or shearing a larger nucleic acid into smaller fragments.
  • Fragment libraries can be generated from naturally occurring nucleic acids, such as mammalian or bacterial nucleic acids. Libraries comprising similarly sized synthetic nucleic acid sequences can also be generated to create a synthetic fragment library.
  • a sequence alignment method can align a fragment sequence to a reference sequence or another fragment sequence.
  • the fragment sequence can be obtained from a fragment library, a paired-end library, a mate-pair library, a concatenated fragment library, or another type of library that may be reflected or represented by nucleic acid sequence information including for example, RNA, DNA, and protein based sequence information.
  • the length of the fragment sequence can be substantially less than the length of the reference sequence.
  • the fragment sequence and the reference sequence can each include a sequence of symbols.
  • the alignment of the fragment sequence and the reference sequence can include a limited number of mismatches between the symbols of the fragment sequence and the symbols of the reference sequence.
  • the fragment sequence can be aligned to a portion of the reference sequence in order to minimize the number of mismatches between the fragment sequence and the reference sequence.
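  • The idea of aligning a fragment to the portion of the reference that minimizes mismatches can be sketched with a brute-force sliding window. This is an illustration only: the function name `best_alignment`, the toy sequences, and the `max_mismatches` limit are assumptions, and production aligners use indexed, far more efficient methods.

```python
def best_alignment(fragment, reference, max_mismatches=2):
    """Slide the fragment along the reference and return (position, mismatches)
    for the first offset with the fewest mismatches, or None if even the best
    offset exceeds max_mismatches."""
    best = None
    for pos in range(len(reference) - len(fragment) + 1):
        window = reference[pos:pos + len(fragment)]
        mismatches = sum(a != b for a, b in zip(fragment, window))
        if best is None or mismatches < best[1]:
            best = (pos, mismatches)
    if best is None or best[1] > max_mismatches:
        return None
    return best

reference = "TTGACCATGCCTGAAGGCTA"
exact = best_alignment("ATGCCTG", reference)      # fragment present verbatim
one_off = best_alignment("ATGCGTG", reference)    # same fragment with one substitution
print(exact, one_off)
```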
  • the symbols of the fragment sequence and the reference sequence can represent the composition of biomolecules.
  • the symbols can correspond to identity of nucleotides in a nucleic acid, such as RNA or DNA, or the identity of amino acids in a protein.
  • the symbols can have a direct correlation to these subcomponents of the biomolecules.
  • each symbol can represent a single base of a polynucleotide.
  • each symbol can represent two or more adjacent subcomponents of the biomolecules, such as two adjacent bases of a polynucleotide.
  • the symbols can represent overlapping sets of adjacent subcomponents or distinct sets of adjacent subcomponents.
  • when each symbol represents two adjacent bases of a polynucleotide, two adjacent symbols representing overlapping sets can correspond to three bases of polynucleotide sequence, whereas two adjacent symbols representing distinct sets can represent a sequence of four bases.
  • the symbols can correspond directly to the subcomponents, such as nucleotides, or they can correspond to a color call or other indirect measure of the subcomponents.
  • the symbols can correspond to an incorporation or non-incorporation for a particular nucleotide flow.
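  • The difference between overlapping and distinct two-base symbols can be made concrete with a short sketch. The function names are hypothetical and the symbols are shown as plain two-letter strings rather than color calls or flow signals.

```python
def overlapping_symbols(seq):
    """Each symbol covers two adjacent bases; consecutive symbols share one base."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]

def distinct_symbols(seq):
    """Each symbol covers two adjacent bases; consecutive symbols share none.
    A trailing unpaired base (odd-length input) is dropped."""
    return [seq[i:i + 2] for i in range(0, len(seq) - 1, 2)]

# Two overlapping symbols span three bases; two distinct symbols span four.
print(overlapping_symbols("ATG"))
print(distinct_symbols("ATGC"))
```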
  • a computer program product can include instructions to select a contiguous portion of a fragment sequence, and instructions to map the contiguous portion of the fragment sequence to a reference sequence using an approximate string matching method that produces at least one match of the contiguous portion to the reference sequence.
  • a system for nucleic acid sequence analysis can include a data analysis unit.
  • the data analysis unit can be configured to obtain a fragment sequence from a sequencing instrument, obtain a reference sequence, select a contiguous portion of the fragment sequence, and map the contiguous portion of the fragment sequence to the reference sequence using an approximate string matching method that produces at least one match of the contiguous portion to the reference sequence.
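  • Selecting a contiguous portion of a fragment and mapping it by approximate string matching can be sketched as below. Hamming distance is used here as one simple form of approximate matching; that choice, the function names, the seed length, and the mismatch limit are assumptions for illustration and are not specified by the source.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def map_seed(fragment, reference, seed_start=0, seed_len=8, max_mismatches=1):
    """Select a contiguous portion (seed) of the fragment and return every
    (position, mismatches) where it matches the reference within the limit."""
    seed = fragment[seed_start:seed_start + seed_len]
    hits = []
    for pos in range(len(reference) - len(seed) + 1):
        distance = hamming(seed, reference[pos:pos + len(seed)])
        if distance <= max_mismatches:
            hits.append((pos, distance))
    return hits

reference = "GGCTAACGTTAGCCATGCAAGT"
fragment = "ACGTTCGCCATG"          # differs from the reference at one base
hits = map_seed(fragment, reference)
print(hits)
```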
  • substantially means sufficient to work for the intended purpose.
  • the term “substantially” thus allows for minor, insignificant variations from an absolute or perfect state, dimension, measurement, result, or the like such as would be expected by a person of ordinary skill in the field but that do not appreciably affect overall performance.
  • substantially means within ten percent.
  • the term “plurality” can be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
  • biological cells include eukaryotic cells, plant cells, animal cells, such as mammalian cells, reptilian cells, avian cells, fish cells, or the like, prokaryotic cells, bacterial cells, fungal cells, protozoan cells, or the like, cells dissociated from a tissue, such as muscle, cartilage, fat, skin, liver, lung, neural tissue, and the like, immunological cells, such as T cells, B cells, natural killer cells, macrophages, and the like, embryos (e.g., zygotes), oocytes, ova, sperm cells, hybridomas, cultured cells, cells from a cell line, cancer cells, infected cells, transfected and/or transformed cells, reporter cells, and the like.
  • a mammalian cell can be, for example, from a human, a mouse, a rat, a horse, a goat, a sheep
  • FIG. 1 illustrates a workflow 100 for non-invasive preimplantation genetic screening of embryos, in accordance with some embodiments of the disclosure.
  • an embryo candidate 104 for IVF implantation can be isolated from a pool of embryos and incubated for a period of time in a sample holder containing media that is substantially free of DNA 106 or other polynucleotides that can interfere with the genetic screening analysis.
  • examples of a sample holder include, but are not limited to, a test tube, pipette tube, petri dish, or a well/partition within a multi-partition/well plate.
  • the embryo candidate 104 can also be incubated in a continuous culture system whereby “fresh” culture media 106 is introduced using a continuous media feed line to the sample holder and “old” culture media 106 is continuously removed (and sampled) from the sample holder to maintain a substantially constant volume of media in the sample holder.
  • genomic fragments are regularly secreted by and/or shed from the embryo into the surrounding DNA-free media.
  • An example of DNA-free media that can be utilized in this workflow is ORIGIO SEQUENTIAL BLAST™ culture media of The Cooper Companies.
  • the embryo can be incubated in the culture media for a minimum of about 18 hours. In other embodiments, the embryo can be incubated in the culture media between about 18 hours and about 144 hours. It should be understood that the embryos can be incubated in DNA-free media for as long a period of time as is necessary for a sufficient quantity of genomic fragments to be secreted by and/or shed from the embryo to allow for a genetic screening analysis to be performed using the workflow 100.
  • the embryo is in the blastocyst stage of development when it is isolated and incubated in the DNA free media. In other embodiments, the embryo is in a multi-cell pre-blastocyst stage of development when it is isolated and incubated in the DNA free media.
  • the amplification protocol 108 uses a multiple displacement amplification (MDA) based whole genome amplification (WGA) technique.
  • MDA relies on priming of target DNA with random primers and the use of the strand-displacing φ29 (phi29) polymerase (or its equivalent) to amplify substantially the entire DNA in a given sample. Compared with PCR-based WGA methods, MDA reduces amplification bias by orders of magnitude, generates longer genomic fragments and exhibits better genome coverage.
  • the amplification protocol 108 uses a multiple annealing and looping-based amplification cycles (MALBAC) based WGA technique.
  • MALBAC amplification technique uses special primers that allow amplicons to have complementary ends and therefore to loop, preventing DNA from being copied exponentially. This results in amplification of only the original genomic DNA. This controlled amplification consequently can reduce amplification bias and, by extension, can lower production of artifacts and lower incidences of false positive and false negative mutation calls on the isolated embryo candidate.
  • any type of WGA technique can be used in amplification protocol 108 as long as the technique generates sufficient quality and/or quantities of genomic fragments to be sequenced for a genetic screening analysis to be run using workflow 100.
  • once the genomic fragments (from the isolated embryo 104) have been amplified to a sufficient quantity, they are sequenced 110 using an NGS or equivalent genomic sequencing system.
  • the sequencing workflow can begin with the fragments being sequenced 110 on a nucleic acid sequencer to provide hundreds, thousands or millions of nucleic acid sequence reads (i.e., sequence reads).
  • the genomic fragment sequence information can then be processed using a genomic data analytics pipeline 112 whereby the genomic fragment sequences are aligned (mapped) 114 against a reference genome and one or more secondary analytics tools/pipelines are used to help identify one or more genomic features 116 present in the genome of the embryo 104.
  • the genomic features 116 can be genomic variants such as insertions/deletions (INDEL), copy number variations (CNV), single nucleotide polymorphisms (SNP), duplications, inversions, translocations, etc.
  • the genomic features 116 can be genomic regions that have some annotated function such as a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.
  • the genomic features 116 can be epigenetic changes on the genome (e.g., methylation, acetylation, ubiquitylation, phosphorylation, sumoylation, ribosylation, citrullination, etc.) that can affect gene expression and activity.
  • the reference genome is a human genome. In other embodiments, the reference genome is a genome of the animal species that the embryo originates from. It should be appreciated, however, that the reference genome can be an artificially created genome that is not associated with any particular animal species, but rather created for a particular analysis/application.
  • the analytics pipeline 112 can generate a genetic diagnostics report 118 providing information regarding inherited or non-inherited genetic conditions that the isolated embryo 104 has or is at risk for.
  • a “blank” or control sample is run side by side with the embryo candidate 104 through the entire workflow 100. That is, a portion of DNA-free media (which was not used to incubate an embryo 104) is run through all the steps and processes of workflow 100. The results from analyzing the blank sample can serve as a control to ensure that the genomic features identified in the genome of the embryo are not artifacts of the amplification and/or systemic errors during sequencing.
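  • One simple way to use the blank control is to discard any feature that also appears in the blank. The sketch below assumes features are reported as plain text labels and that an exact match between the two lists indicates an artifact; both the function name and that matching rule are assumptions, since the source does not describe how the comparison is made.

```python
def filter_against_blank(embryo_features, blank_features):
    """Keep only features seen in the embryo sample but not in the blank control."""
    return sorted(set(embryo_features) - set(blank_features))

# Hypothetical feature labels for illustration.
embryo = ["chr1:1000 SNP", "chr7:5000 INDEL", "chr21 CNV gain"]
blank = ["chr7:5000 INDEL"]

retained = filter_against_blank(embryo, blank)
print(retained)
```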
  • Figure 2 is an exemplary flowchart depicting an amplification protocol 200 for amplifying short genomic fragments, in accordance with some embodiments of the disclosure.
  • the genomic fragments 202 (in the portion of media incubating the embryo) are combined with enzymes 204 and genomic linker segments 206 in conditions that catalyze the formation of concatenated fragments 208.
  • the ligation reaction is carried out at room temperature (without agitation) for about 16-18 hours (overnight incubation).
  • the ligation reaction mixture consists of 1 unit of DNA ligase in a buffer containing 50 mM Tris-HCl, 10 mM MgCl₂, 1 mM ATP and 10 mM DTT at a pH of about 7.5 and a temperature of between about 20°C and about 25°C.
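  • The buffer composition can be turned into pipetting volumes with the dilution relation C1·V1 = C2·V2. In the sketch below the final concentrations come from the text, while the stock concentrations and the 20 µL reaction volume are assumptions picked only to make the arithmetic concrete.

```python
# Final concentrations (mM) as stated in the text.
FINAL_MM = {"Tris-HCl": 50, "MgCl2": 10, "ATP": 1, "DTT": 10}
# Stock concentrations (mM): assumed values, not from the source.
STOCK_MM = {"Tris-HCl": 1000, "MgCl2": 1000, "ATP": 100, "DTT": 1000}

def stock_volumes(final_volume_ul, final_mm=FINAL_MM, stock_mm=STOCK_MM):
    """Return microliters of each stock needed, using C1 * V1 = C2 * V2."""
    return {name: final_mm[name] * final_volume_ul / stock_mm[name]
            for name in final_mm}

volumes = stock_volumes(20.0)  # 20 uL reaction volume is an assumption
for name, microliters in volumes.items():
    print(f"{name}: {microliters:.2f} uL")
```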
  • the resulting concatenated fragments 208 are longer than the original genomic fragments 202, which helps to reduce amplification errors (when compared to amplifying the genomic fragments 202 individually) when the genomic fragments are amplified later in the protocol 200.
  • Concatenation can provide long templates (i.e., concatenated fragments) that are optimal for amplification using the φ29 (phi29) enzyme, which isothermally amplifies DNA by multiple displacement amplification.
  • the φ29 enzyme cannot efficiently and/or accurately amplify short fragments (i.e., amplicons shorter than about 30 base pairs), as has been demonstrated in validation experiments. It is therefore important to create long concatenated fragments that capture the entirety of the short DNA fragments extruded by the embryo into the culture media.
  • concatenation also helps in creating adequate templates for successful amplification by other whole genome amplification strategies such as the SurePlex system (Illumina), MALBAC and DOP-PCR.
  • the genomic fragment is a short genomic fragment that has a length of between about 30 base pairs (bps) and about 800 bps. In other embodiments, the genomic fragment is a short genomic fragment that has a length of between about 150 bps and about 400 bps. In still other embodiments, the genomic fragment is a short genomic fragment that has a length of less than about 1000 bps.
  • the genomic linker segments 206 are essentially artificially created double-stranded "conjoint" oligonucleotide segments of a known length and nucleotide sequence. In some embodiments, the genomic linker segments 206 are between about 30 bps and about 1000 bps in length. In other embodiments, the genomic linker segments 206 are between about 30 bps and about 500 bps in length. In still other embodiments, the genomic linker segments 206 are between about 50 bps and about 150 bps in length. In some embodiments, the genomic linker segments 206 are homopolymer oligonucleotide segments. In other embodiments, the genomic linker segments 206 are heteropolymer oligonucleotide segments.
  • the genomic linker segments 206 are blunt ended double-stranded oligonucleotide segments. In some embodiments, the genomic fragments 202 are enzymatically blunt ended prior to being ligated to the genomic linker segments 206.
  • ligases can be used to ligate the genomic fragments 202 to the genomic linker segments 206 to form the concatenated genomic fragments 208.
  • Some examples of ligases that can be used here include, but are not limited to, T3, T4, T7, or Ligase 1.
  • Once the concatenated fragments are formed in their container (e.g., well, pipette tube, etc.), they can be amplified 210 on a thermal cycler (or similar device) using WGA techniques such as MDA, MALBAC, etc.
  • FIG. 3 illustrates the formation of concatenated fragments, in accordance with some embodiments of the disclosure.
  • the genomic fragments 302 are first blunt ended using a blunting enzyme to fill-in or remove the 3' or 5' overhangs (i.e., unpaired nucleotides) 306 prior to the introduction of the genomic linker segments 308 and their ligation with a ligase 310 to form concatenated fragments 312.
  • the blunting enzyme employed can exhibit exonuclease activity to digest (remove) the overhangs or polymerase activity to synthesize (fill-in) the missing complementary bases on the overhang.
  • blunting enzymes include, but are not limited to, DNA Polymerase I Klenow fragment, T4 DNA Polymerase, and Mung Bean Nuclease.
  • the blunting reagent mixture used to blunt the dsDNA concatenated fragments includes T4 DNA polymerase (which has 3'→5' exonuclease activity and 5'→3' polymerase activity) and T4 Polynucleotide Kinase (which aids in phosphorylation of 5' ends of blunt ended DNA, necessary for subsequent ligation reaction).
  • DNA ligase can be introduced to ligate the genomic fragments 302 to the genomic linker segments 308.
  • the DNA ligase seals the 5' and 3' polynucleotide ends via nucleotidyl transfer steps involving ligase-adenylate and DNA-adenylate intermediates.
  • DNA ligases fall into two general categories: ATP-dependent DNA ligases (EC 6.5.1.1), and NAD (+) dependent DNA ligases (EC 6.5.1.2). NAD (+) dependent DNA ligases are found only in bacteria (and some viruses) while ATP-dependent DNA ligases are ubiquitous.
  • DNA ligase I links Okazaki fragments to form a continuous strand of DNA
  • DNA ligase II is an alternatively spliced form of DNA ligase III, found only in non-dividing cells
  • DNA ligase III is involved in base excision repair
  • DNA ligase IV is involved in the repair of DNA double-strand breaks by non-homologous end joining (NHEJ).
  • There are two types of prokaryotic and one type of eukaryotic ligases that are particularly well suited for facilitating blunt ended double-stranded DNA ligation: prokaryotic DNA ligases (T3 and T4) and eukaryotic DNA ligase (Ligase 1).
  • T4 DNA ligase is used in the blunt end ligation process 310 for this protocol.
  • Bacteriophage T4 DNA ligase is a single polypeptide with a M.W of about 68,000 Daltons requiring ATP as energy source.
  • the maximal activity pH range is between about 7.5 to about 8.0.
  • the presence of Mg++ ion is preferred and the optimal concentration is about 10 mM.
  • T4 DNA ligase has the unique ability to join sticky and blunt ended fragments.
  • T4 DNA ligase catalyzes phosphodiester bond formation between juxtaposed 5' and 3' termini in the genomic fragments 302 and genomic linker segments 308 in three steps: 1) enzyme-adenylate formation by reaction with ATP; 2) adenylyl transfer to a 5'-phosphorylated polynucleotide to generate adenylylated DNA; and 3) phosphodiester bond formation with release of AMP.
  • the ligation reaction can be carried out using 1 unit of T4 DNA ligase in a buffer consisting of 50 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP and 10 mM DTT at a pH of about 7.5 and at a temperature of about 23°C.
  • the reaction mixture containing the T4 ligase, blunt ended DNA and the linker segments can be incubated for 16-18 hours, without agitation.
  • the concentration of the linker segment can range from about 1 pg to about 1 ng.
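As a worked example of the buffer arithmetic, the following sketch computes how much of each stock solution would give the stated final concentrations, using C1·V1 = C2·V2. The stock concentrations and the 20 µL reaction volume are illustrative assumptions and do not come from the disclosure.

```python
# Final concentrations of the ligation buffer described above (mM).
FINAL_MM = {"Tris-HCl": 50.0, "MgCl2": 10.0, "ATP": 1.0, "DTT": 10.0}

# Hypothetical stock concentrations (mM); assumptions for illustration only.
STOCK_MM = {"Tris-HCl": 1000.0, "MgCl2": 1000.0, "ATP": 100.0, "DTT": 1000.0}


def stock_volumes_ul(reaction_volume_ul, final_mm=FINAL_MM, stock_mm=STOCK_MM):
    """Return the volume (uL) of each stock needed, from C1*V1 = C2*V2."""
    return {
        name: final_mm[name] * reaction_volume_ul / stock_mm[name]
        for name in final_mm
    }


volumes = stock_volumes_ul(20.0)  # assumed 20 uL reaction volume
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

The remaining volume would be made up of water, ligase, blunt ended DNA and linker segments.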
  • a concatenated fragment 312 forms once a genomic fragment 302 is ligated to a genomic linker segment 308.
  • the concatenated fragment 312 includes at least one genomic fragment 302 that is ligated to at least one genomic linker segment 308.
  • the concatenated fragment 312 includes two or more genomic fragments 302 and at least one genomic linker segment 308, whereby at least one genomic fragment 302 is ligated to each end of the genomic linker segment 308. It should be appreciated, however, that a concatenated fragment 312 can have essentially any combination of genomic fragments 302 and genomic linker segments 308 as long as the combination is suitable for the purposes of sequencing and subsequent genomic feature analysis.
  • After the formation of the concatenated fragments 312, they are amplified using a WGA amplification technique 313 (such as PicoPlex, MDA, MALBAC, DOPlify, etc.) and subsequently sequenced using an NGS (or equivalent) genomic sequencing system 316.
  • FIG. 4 is a block diagram that illustrates a computer system 400, upon which embodiments of the present teachings may be implemented.
  • computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information.
  • computer system 400 can also include a memory, which can be a random access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for determining instructions to be executed by processor 404.
  • Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.
  • computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404.
  • a storage device 410 such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
  • computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user.
  • An input device 414 can be coupled to bus 402 for communicating information and command selections to processor 404.
  • a cursor control 416 such as a mouse, a trackball or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412.
  • This input device 414 typically has two degrees of freedom in two axes, a first axis (i.e., x) and a second axis (i.e., y), that allow the device to specify positions in a plane.
  • input devices 414 allowing for 3 dimensional (x, y and z) cursor movement are also contemplated herein.
  • results can be provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in memory 406.
  • Such instructions can be read into memory 406 from another computer-readable medium or computer-readable storage medium, such as storage device 410.
  • Execution of the sequences of instructions contained in memory 406 can cause processor 404 to perform the processes described herein.
  • hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings.
  • implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.
  • The term "computer-readable medium" (e.g., data store, data storage, etc.) or "computer-readable storage medium" refers to any media that participates in providing instructions to processor 404 for execution.
  • Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • nonvolatile media can include, but are not limited to, optical, solid state, magnetic disks, such as storage device 410.
  • volatile media can include, but are not limited to, dynamic memory, such as memory 406.
  • transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
  • instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution.
  • a communication apparatus may include a transceiver having signals indicative of instructions and data.
  • the instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein.
  • Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, etc.
  • FIG. 5 is a schematic diagram of a system for non-invasive preimplantation genetic screening of embryos 500, in accordance with various embodiments.
  • the system 500 includes a genomic sequencing system 502, a computing device 504 and a display/client terminal 510.
  • the computing device 504 can be communicatively connected to the genomic sequencing system 502 via a network connection that can be either a "hardwired" physical network connection (e.g., Internet, LAN, WAN, VPN, etc.) or a wireless network connection (e.g., Wi-Fi, WLAN, etc.).
  • the computing device 504 can be a workstation, mainframe computer, distributed computing node (part of a "cloud computing" or distributed networking system), personal computer, mobile device, etc.
  • the genomic sequencing system 502 can be a nucleic acid sequencer (e.g., NGS, Capillary Electrophoresis system, etc.), real-time/digital/quantitative PCR instrument, microarray scanner, etc. It should be understood, however, that the genomic sequencing system 502 can essentially be any type of instrument that can generate nucleic acid sequence data from samples containing genomic fragments.
  • genomic sequencing system 502 can be used to practice a variety of sequencing methods including ligation-based methods, sequencing by synthesis, single molecule methods, nanopore sequencing, and other sequencing techniques.
  • Ligation sequencing can include single ligation techniques, or chain ligation techniques where multiple ligations are performed in sequence on a single primary nucleic acid sequence strand.
  • Sequencing by synthesis can include the incorporation of dye labeled nucleotides, chain termination, ion/proton sequencing, pyrophosphate sequencing, or the like.
  • Single molecule techniques can include continuous sequencing, where the identity of the nucleotide is determined during incorporation without the need to pause or delay the sequencing reaction, or staggered sequencing, where the sequencing reaction is paused to determine the identity of the incorporated nucleotide.
  • the genomic sequencing system 502 can determine the sequence of a nucleic acid, such as a polynucleotide or an oligonucleotide.
  • the nucleic acid can include DNA or RNA, and can be single stranded, such as ssDNA and RNA, or double stranded, such as dsDNA or a RNA/cDNA pair.
  • the nucleic acid can include or be derived from a fragment library, a mate pair library, a chromatin immuno-precipitation (ChIP) fragment, or the like.
  • the genomic sequencing instrument 502 can obtain the sequence information from a single nucleic acid molecule or from a group of substantially identical nucleic acid molecules.
  • the genomic sequencing system 502 can output nucleic acid sequencing read data (genomic sequence information) in a variety of different output data file types/formats, including, but not limited to: *.fasta, *.csfasta, *.xsq, *seq.txt, *qseq.txt, *.fastq, *.sff, *prb.txt, *.sms, *srs and/or *.qv.
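Of the output formats listed above, *.fastq is a common input to downstream alignment. The following is a minimal sketch of reading it, assuming the simple four-lines-per-record layout with no wrapped sequence lines; the function name `read_fastq` and the example records are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header[1:].strip(), seq, qual


# Two made-up records standing in for sequencer output.
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCC\n+\nFFFF\n")
records = list(read_fastq(example))
print(records)
```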
  • the analytics computing device 504 can be configured to host a sequence read alignment engine 506 and a genomic features identification engine 508.
  • the read alignment engine 506 can be configured to receive genomic fragment sequence information generated by the genomic sequencing system 502 and align (map) the genomic fragment sequences to a reference genome. Examples of publicly available sequence alignment software that can be used to align the fragment sequences include BLAT, BLAST, Bowtie, BWA, drFAST, LAST, MOSAIK, NEXTGENMAP, etc.
  • the genomic features identification engine 508 can be configured to identify genomic features on the aligned sequences.
  • the genomic features identification engine 508 can be communicatively connected (e.g., a network connection to the analytics computing device 504, a serial bus connection to database storage that is local to the analytics computing device 504, a peripheral device connection to a peripheral storage device connected to the analytics computing device 504, etc.) to various public (e.g., the RefGene Database (UCSC), the Alternative Splicing Database (EBI), the dbSNP database (NCBI), the Genomic Structural Variation database (NCBI), the GENCODE database (UCSC), the PolyPhen database (Harvard), the SIFT database (NCBI), the 3000 Genomes Project database, the Database of Genomic Variants database (EBI), the Biomart database (EBI), Gene Ontology database (public), the BioCyc/HumanCyc database, the KEGG pathway database, the Reactome database, the Pathway Interaction Database (NIH), the Biocarta database, PANTHER database, etc.) and private databases to identify the genomic features in the align
  • the genomic features can be genomic variants such as insertions/deletions (INDEL), copy number variations (CNV), single nucleotide polymorphisms (SNP), duplications, inversions, translocations, etc.
  • the genomic features can be genomic regions that have some annotated function such as a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.
  • the genomic features can be epigenetic changes on the genome (e.g., methylation, acetylation, ubiquitylation, phosphorylation, sumoylation, ribosylation, citrullination, etc.) that can affect gene expression and activity.
  • the functionalities of the read alignment engine 506 and genomic features identification engine 508 can be implemented as hardware, firmware, software, or any combination thereof.
  • the various engines depicted in Figure 5 can be combined or collapsed into a single engine, component or module, depending on the requirements of the particular application or system architecture.
  • the read alignment engine 506 and genomic features identification engine 508 can comprise additional engines or components as needed by the particular application or system architecture.
  • the results can be displayed on a display or client terminal 510 that is communicatively connected to the computing device 504.
  • client terminal 510 can be a thin client computing device.
  • client terminal 510 can be a personal computing device having a web browser (e.g., INTERNET EXPLORER™, FIREFOX™, SAFARI™, etc.) that can be used to control the operation of the sequence alignment engine 506 and/or genomic features identification engine 508. That is, the client terminal 510 can access the sequence alignment engine 506 using a browser to control the operation of the sequence alignment engine 506.
  • the sequence alignment criteria or logic can be modified depending on the requirements of the particular application.
  • client terminal 510 can access the genomic features identification engine 508 using a browser to control the database sources (e.g., the RefGene Database (UCSC), the Alternative Splicing Database (EBI), the dbSNP database (NCBI), the Genomic Structural Variation database (NCBI), the GENCODE database (UCSC), the PolyPhen database (Harvard), the SIFT database (NCBI), the 3000 Genomes Project database, the Database of Genomic Variants database (EBI), the Biomart database (EBI), Gene Ontology database (public), the BioCyc/HumanCyc database, the KEGG pathway database, the Reactome database, the Pathway Interaction Database (NIH), the Biocarta database, PANTHER database, etc.) used to identify the genomic features in the aligned sequences or to modify the summary reports generated.
  • FIG. 6 is a depiction of how concatenated fragment reads are mapped to a reference genome, in accordance with various embodiments.
  • concatenated fragments are comprised of both genomic fragments that the candidate embryo has secreted or shed (in the media that it was incubated in) and artificially created double-stranded "conjoint" oligonucleotide segments (i.e., genomic linker segments) of a known length and nucleotide (base) sequence. Therefore, as depicted in Figure 6, the concatenated fragment reads 602 are comprised of sequence reads of both the artificially synthesized genomic linker segments 604 and the genomic fragments 606 obtained from the embryo test media.
  • the concatenated fragment reads 602 are aligned (mapped) 608 to a reference genome 610 using any number of publicly available sequence alignment tools including, but not limited to: BLAT, BLAST, BWA, Bowtie, drFAST, LAST, MOSAIK, NEXTGENMAP, etc.
  • the parameters of the sequence alignment tool are modified to accommodate short fragment sequence read alignments.
  • the short genomic fragment reads have a length of between about 30 base pairs (bps) and about 800 bps. In other embodiments, the short genomic fragment reads have a length of between about 150 bps and about 400 bps. In still other embodiments, the short genomic fragment reads have a length of less than about 1000 bps.
  • the genomic linker segments sequence reads are between about 30 to 1000 bps in length. In other embodiments, the genomic linker segment sequence reads are between about 30 bps and about 500 bps in length. In still other embodiments, the genomic linker segment sequence reads are between about 50 bps to about 150 bps. In some embodiments, the genomic linker segment sequence reads are homopolymer sequences. In other embodiments, the genomic linker segment sequence reads are heteropolymer oligonucleotide sequences.
  • Because the genomic linker segment sequence reads are not naturally occurring, they are algorithmically filtered out during the alignment of the concatenated fragment reads to the reference genome. That is, the alignment tool subtracts out the known sequences associated with the genomic linker segments and only aligns the sequences associated with the genomic fragments portion of the concatenated fragment reads to the reference genome.
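The linker subtraction described above can be sketched as a split of each concatenated read on the known linker sequence. This is a minimal illustration that assumes exact, error-free linker copies in a single orientation; the linker sequence, the function name and the `min_len` parameter are hypothetical, and real reads would likely need mismatch-tolerant matching and a check for the reverse complement.

```python
def extract_genomic_fragments(read, linker, min_len=1):
    """Split a concatenated read on exact copies of the linker and
    return the remaining (genomic) pieces in their original order."""
    if not linker:
        raise ValueError("linker sequence must be non-empty")
    pieces = read.split(linker)
    # Drop empty pieces, e.g. from a linker at the very start or end of the read.
    return [piece for piece in pieces if len(piece) >= min_len]


LINKER = "TTAGGCCATTGACCGTAA"  # hypothetical linker sequence
read = "ACGTTGCA" + LINKER + "GGATCCAT" + LINKER + "CATG"
fragments = extract_genomic_fragments(read, LINKER)
print(fragments)
```

Only the returned pieces would then be passed to the aligner.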
  • the alignment tool selects the best alignment for each genomic fragment sequence read by determining the longest matching alignment position on the reference genome for each genomic fragment sequence read, that is, the alignment location where the longest consecutive sequence of bases on the genomic fragment sequence read matches the reference genome. In other embodiments, the alignment tool selects the best alignment for each genomic fragment sequence read by determining the position on the reference genome where the greatest number of bases from the genomic fragment sequence read match, regardless of whether they are consecutive or not.
  • genomic fragment sequence reads that align equally well to multiple locations on the reference genome are automatically discarded and not used in the identification of genomic features (e.g., SNPs, CNVs, Indels, etc.).
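The selection rule above (longest consecutive match, with ambiguous reads discarded) can be illustrated with a brute-force toy. This sketch assumes ungapped placement against a short in-memory reference string; production aligners use indexed search instead, and the function names and sequences here are hypothetical.

```python
def longest_match_run(read, reference, pos):
    """Longest run of consecutive matching bases with the read placed at pos."""
    best = run = 0
    for read_base, ref_base in zip(read, reference[pos:pos + len(read)]):
        run = run + 1 if read_base == ref_base else 0
        best = max(best, run)
    return best


def best_unique_position(read, reference):
    """Return the position with the longest consecutive match, or None
    when two or more positions tie (the read is treated as ambiguous)."""
    scores = [
        (longest_match_run(read, reference, pos), pos)
        for pos in range(len(reference) - len(read) + 1)
    ]
    if not scores:
        return None
    top = max(score for score, _ in scores)
    winners = [pos for score, pos in scores if score == top]
    return winners[0] if len(winners) == 1 else None


reference = "TTGACCGTAAGCTAGCTTGACC"
unique = best_unique_position("CGTAAGC", reference)    # occurs once
ambiguous = best_unique_position("TTGACC", reference)  # occurs twice
print(unique, ambiguous)
```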
  • FIG. 7 is an exemplary flowchart showing a method for aligning concatenated genomic fragment sequence reads to identify various types of genomic features, in accordance with various embodiments.
  • the concatenated genomic fragment sequence reads 702 are first aligned to a reference genome 704.
  • the alignments are made using any number of publicly available sequence alignment tools including, but not limited to: BLAT, BLAST, BWA, Bowtie, drFAST, LAST, MOSAIK, NEXTGENMAP, etc.
  • the concatenated genomic fragment reads are sequence reads of both the artificially synthesized genomic linker segments and the genomic fragments obtained from the test sample (e.g., tissue, embryo, etc.).
  • Because the genomic linker segments are not naturally occurring (in the human genome), they are algorithmically filtered out during the alignment of the concatenated fragment reads to the reference genome. That is, the alignment tool subtracts out the known sequences associated with the genomic linker segments and only aligns the sequences associated with the genomic fragments portion of the concatenated fragment reads to the reference genome.
  • the alignment tool selects the best alignment for each genomic fragment sequence read based on a set of parameters or factors 706, including, but not limited to, alignment score and whether there are multiple alignments for the genomic fragment reads.
  • the alignment score for a genomic fragment read alignment can be calculated (using Equation 1) as a function of a match criteria (e.g., a number of consecutive bases of the genomic fragment sequence read that matches to the reference genome, the absolute number of bases from the genomic fragment sequence read that matches to the reference genome, the percent sequence identity between the sequence and its match in the genome, etc.), a mismatch criteria and gap penalties.
  • genomic fragment sequence reads that align equally well (e.g., have the same alignment score, etc.) to multiple locations on the reference genome are automatically discarded and not used in the identification of genomic features.
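Equation 1 is not reproduced in this text, so the following is only a sketch of one plausible scoring scheme: a linear score that rewards matches and penalizes mismatches and gaps, followed by the discard rule for tied alignments. The linear form, the weights and all names are assumptions rather than the disclosed equation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    position: int
    matches: int
    mismatches: int
    gap_opens: int
    gap_extensions: int


def alignment_score(aln, match=1, mismatch=4, gap_open=6, gap_extend=1):
    """Linear score with placeholder weights (assumed, not from the disclosure)."""
    return (match * aln.matches
            - mismatch * aln.mismatches
            - gap_open * aln.gap_opens
            - gap_extend * aln.gap_extensions)


def select_alignment(candidates):
    """Return the single best-scoring alignment, or None if the top score is shared."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=alignment_score, reverse=True)
    if len(ranked) > 1 and alignment_score(ranked[0]) == alignment_score(ranked[1]):
        return None  # read aligns equally well to multiple locations: discard
    return ranked[0]


a = Alignment(position=100, matches=48, mismatches=2, gap_opens=0, gap_extensions=0)
b = Alignment(position=5000, matches=45, mismatches=1, gap_opens=1, gap_extensions=2)
c = Alignment(position=900, matches=48, mismatches=2, gap_opens=0, gap_extensions=0)
print(alignment_score(a), alignment_score(b))
print(select_alignment([a, b]), select_alignment([a, c]))
```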
  • various analytics tools or callers can be used to identify genomic features on the aligned sequences 708.
  • these tools or callers can be configured to access various public (e.g., the RefGene Database (UCSC), the Alternative Splicing Database (EBI), the dbSNP database (NCBI), the Genomic Structural Variation database (NCBI), the GENCODE database (UCSC), the PolyPhen database (Harvard), the SIFT database (NCBI), the 3000 Genomes Project database, the Database of Genomic Variants database (EBI), the Biomart database (EBI), Gene Ontology database (public), the BioCyc/HumanCyc database, the KEGG pathway database, the Reactome database, the Pathway Interaction Database (NIH), the Biocarta database, PANTHER database, etc.) and/or private databases to identify the genomic features.
  • the genomic features can be genomic variants such as insertions/deletions (INDEL), copy number variations (CNV), single nucleotide polymorphisms (SNP), duplications, inversions, translocations, etc.
  • the genomic features can be genomic regions that have some annotated function such as a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.
  • the genomic features can be epigenetic changes on the genome (e.g., methylation, acetylation, ubiquitylation, phosphorylation, sumoylation, ribosylation, citrullination, etc.) that can affect gene expression and activity.
  • SNPs can be called via local de-novo assembly of haplotypes 710.
  • aneuploidy can be called using an aneuploidy caller 714.
  • Copy Number Variants CNVs
  • the modified CNV caller can be configured to differentiate between biological and technical variation by normalization to a normal sample. Technical variations can occur due to bias in technology, for example, some regions in the genome can have more or less reads when sequenced due to high GC content bias (i.e., the proportion of G and C bases in a region and the count of fragments mapped to it), amplification bias, linker ligation etc.
  • CNV deletions or duplications that arise from such technical variation are not real CNV deletions or duplications; instead, they are merely experimental artifacts.
  • biological variations are due to actual CNV deletions/duplications in the genome. For example, when the genome region (i.e., chromosomal position) of the sample (e.g., tissue, embryo, etc.) being tested has a CNV deletion, it will have fewer reads in that region, and when it has a CNV duplication, it will have more reads in that region.
  • CBS circular binary segmentation
  • normalizations are performed to compare regions of one sample to all other samples that have been previously tested.
  • The logic is that technical variations will affect all the samples within a sample test batch (i.e., the samples that are run through the amplification and sequencing workflow steps together) and not just one sample within the batch. So if a sample shows a drop in the quantity of reads in a region that is also seen in other samples of the same batch, it is safe to conclude that it was a technical variation. However, if the drop is seen in only one sample in a batch and in no other sample in the same batch, it is highly likely to be a biological variation. This comparison can be done only when all samples are normalized to the same scale.
  • gene regions of interest are typically split into many small intervals of approximately 100 bps and the average depths (i.e., quantity of aligned reads) of the samples are calculated for each region. Even if an individual interval shows variation, the spline normalization smooths over the region, removing smaller errors so that only significant variations in each region are detectable. CNVs can then be identified by measuring significance using techniques such as Principal Component Analysis (PCA).
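The interval step above can be sketched as a simple binning of aligned reads. This is a minimal illustration that assigns each read to an interval by its start coordinate, which is an assumption; the function name and coordinates are hypothetical, and the spline smoothing and PCA steps are not shown.

```python
def interval_depths(read_starts, region_start, region_end, interval=100):
    """Count reads whose start falls in each fixed-width interval of a region."""
    n_bins = -(-(region_end - region_start) // interval)  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        if region_start <= start < region_end:
            counts[(start - region_start) // interval] += 1
    return counts


# Made-up read start coordinates within a 400 bp region of interest.
read_starts = [1005, 1050, 1099, 1100, 1250, 1399]
counts = interval_depths(read_starts, region_start=1000, region_end=1400)
print(counts)
```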
  • the CBS algorithm is configured to identify the start and end positions for CNVs in a sample. That is, the CBS algorithm performs multiple passes through a sample whereby on the first pass the algorithm searches the entire sample, compiling a list of (start, end) position tuples in which statistically significant changes in read depth appear to have occurred. Among these tuples, the tuple containing the most dramatic change is identified as a CNV, and then the algorithm is reapplied recursively to the two pieces of the sample on either side of this tuple. The algorithm terminates when no statistically significant changes in read depth occur in any of the portions of the sample currently under evaluation.
  • the CBS algorithm compares the intervals before and after it and if they both show the same drop/increase it moves to the next interval. At the boundary of the variation, one side will have the signal while the other won't, which helps define the boundaries.
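The recursive structure described above can be sketched with a simplified change-point search. This is not the full CBS algorithm: it swaps the usual permutation-based significance test for a fixed threshold on a z-like statistic, and the threshold, `min_len` and the example values are assumptions chosen for illustration.

```python
import math
from statistics import pstdev


def best_segment(values, lo, hi, min_len=2):
    """Return (score, start, end) for the sub-interval of values[lo:hi] whose
    mean differs most from the rest of that window."""
    window = values[lo:hi]
    n = len(window)
    sd = pstdev(window) if n > 1 else 0.0
    best = (0.0, lo, lo)
    if sd == 0.0:
        return best  # flat window: nothing to find
    total = sum(window)
    for i in range(n):
        inside = 0.0
        for j in range(i + 1, n + 1):
            inside += window[j - 1]
            k = j - i
            if k < min_len or n - k < min_len:
                continue
            diff = inside / k - (total - inside) / (n - k)
            score = abs(diff) / (sd * math.sqrt(1 / k + 1 / (n - k)))
            if score > best[0]:
                best = (score, lo + i, lo + j)
    return best


def segment(values, lo=0, hi=None, threshold=3.0, min_len=2):
    """Recursively report (start, end) index pairs of candidate CNV segments."""
    hi = len(values) if hi is None else hi
    if hi - lo < 2 * min_len:
        return []
    score, start, end = best_segment(values, lo, hi, min_len)
    if score < threshold:
        return []  # no sufficiently strong change in this window
    left = segment(values, lo, start, threshold, min_len)
    right = segment(values, end, hi, threshold, min_len)
    return left + [(start, end)] + right


# Made-up normalized depth ratios (log2 scale) with a drop in intervals 10-14.
ratios = [0.0] * 10 + [-1.0] * 5 + [0.0] * 10
calls = segment(ratios)
print(calls)
```

Each call reports the most dramatic segment in its window and then re-examines the two flanking pieces, stopping when neither contains a change above the threshold.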
  • a quantiling function is used to partition by depth the reads for a particular sample to ascertain what constitutes a low, average and deep read depth for each genome region. The same procedure is then repeated for the median read depth at each genome region in the genome across all samples in the batch.
  • the read depth for a particular region in said sample is evaluated against the curve, by looking at the height on the curve corresponding to its region on the x-axis.
  • samples which have, for example, a large percentage of low coverage regions when compared to the median across samples will be modified in such a way that the upper portion of their low coverage regions will be re-interpreted as being of average coverage.
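The first step of the quantiling procedure, partitioning a sample's regions into low, average and deep read depth, can be sketched as follows. Splitting at terciles is an assumption, since the disclosure does not state which quantiles are used, and the comparison against the batch median curve is not shown.

```python
from statistics import quantiles


def classify_depths(depths):
    """Label each region's depth as 'low', 'average' or 'deep' using terciles."""
    low_cut, high_cut = quantiles(depths, n=3)  # two cut points
    labels = []
    for depth in depths:
        if depth <= low_cut:
            labels.append("low")
        elif depth <= high_cut:
            labels.append("average")
        else:
            labels.append("deep")
    return labels


# Made-up per-region read depths for one sample.
depths = [10, 11, 12, 30, 31, 32, 60, 61, 62]
labels = classify_depths(depths)
print(labels)
```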
  • If a sample shows a drop in reads in a region that is also seen in other samples, it can be classified as a technical variation; however, if the drop is seen in only one sample and in no other sample in the batch, it can be classified as a biological variation. This is accounted for by dividing a sample's read depth at a particular region by the median read depth at that same region across all samples in a batch.
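The division by the batch median can be sketched as follows. The sample names and depths are made up: region 1 has a drop shared by every sample, and region 2 has a drop seen in only one sample.

```python
from statistics import median


def normalize_to_batch(depths_by_sample):
    """Divide each sample's depth in every region by the batch median for that region."""
    samples = list(depths_by_sample)
    n_regions = len(depths_by_sample[samples[0]])
    medians = [
        median(depths_by_sample[s][i] for s in samples) for i in range(n_regions)
    ]
    return {
        s: [d / m if m else float("nan") for d, m in zip(depths_by_sample[s], medians)]
        for s in samples
    }


batch = {
    "embryo_A": [100, 50, 100],
    "embryo_B": [102, 52, 98],
    "embryo_C": [98, 48, 50],
}
ratios = normalize_to_batch(batch)
for name, values in ratios.items():
    print(name, [round(v, 2) for v in values])
```

After normalization the shared drop in region 1 gives ratios near 1 for every sample, consistent with technical variation, while embryo_C keeps a ratio near 0.5 in region 2, which would be a candidate biological variation.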
  • FIG. 8 is a flowchart showing a method for determining copy number variation in an embryo candidate, in accordance with various embodiments.
  • method 800 details an exemplary workflow for identifying copy number variations in an embryo candidate.
  • an embryo candidate is isolated from a plurality of fertilized embryos and placed into a container.
  • the embryo candidate can be isolated from a plurality of fertilized embryos each of which can be a candidate for IVF implantation.
  • the embryo candidate is in the blastocyst stage of embryogenesis.
  • the embryo candidate is a human embryo.
  • isolation step 802 is performed using conventional sterile techniques or in a sterile hood to ensure that the isolated embryo candidate is not contaminated with genomic matter that may lead to erroneous test results.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • the embryo is incubated for as long a period of time as is required (while still keeping the embryo candidate viable for IVF implantation) for a sufficient quantity of DNA fragments (i.e., genomic fragments) to be secreted or shed from the embryo candidate into the DNA-free media for a copy number variation analysis to be performed using method 800.
  • the embryo can be incubated in the culture media for a minimum of about 18 hrs. In other embodiments, the embryo can be incubated in the culture media for between about 18 hours and about 144 hours.
  • An example of DNA-free media that can be utilized in this workflow is ORIGIO SEQUENTIAL BLAST™ culture media of The Cooper Companies.
  • the media can be substantially free of oligonucleotides and not just DNA to ensure the lowest possible chance of erroneous analysis results or artifact formation during amplification.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes one or more genomic fragments (i.e., DNA fragment) shed or secreted from the embryo candidate.
  • Examples of an amplification vessel that can be used include, but are not limited to, a test tube, pipette tube, petri dish, or a well/partition within a multi-partition/well plate.
  • a plurality of linker segments and a ligase enzyme are added to the amplification vessel under conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment (from the embryo candidate).
  • the genomic fragments obtained from the media are considered "short" genomic fragments.
  • the short genomic fragments have lengths of between about 30 base pairs (bps) and about 800 bps.
  • the short genomic fragments have a length of between about 150 bps to about 400 bps.
  • the short genomic fragments have a length of less than about 1000 bps.
  • the genomic linker segments are essentially artificially created double-stranded "conjoint" oligonucleotide segments of a known length and nucleotide sequence.
  • the genomic linker segments are between about 30 to 1000 bps in length.
  • the genomic linker segments are between about 30 bps and about 500 bps in length.
  • the genomic linker segments are between about 50 bps to about 150 bps.
  • the genomic linker segments are homopolymer oligonucleotide segments.
  • the genomic linker segments are heteropolymer oligonucleotide segments.
  • the genomic linker segments are blunt ended double-stranded oligonucleotide segments. In some embodiments, the genomic fragments are enzymatically blunt ended prior to being ligated to the genomic linker segments using the methods disclosed above.
  • Various types of prokaryotic and eukaryotic enzymes (i.e., ligases) can be used to ligate the genomic fragments to the genomic linker segments to form the concatenated genomic fragments.
  • Some examples of ligases that can be used here include, but are not limited to, T3, T4, T7, or Ligase 1.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • the concatenated genomic fragments are amplified on a thermal cycler (or similar device) using WGA techniques such as MDA, MALBAC, etc.
  • sequence information from the amplified concatenated genomic fragments is obtained by sequencing the concatenated fragments on an NGS or equivalent genomic sequencing system.
  • the sequence information includes both genomic fragment sequence reads (obtained from genomic fragments isolated from the embryo candidate) and genomic linker segment sequence reads (which were artificially created and ligated to the genomic fragments prior to amplification in step 810).
  • the sequence information is aligned against a reference genome using a publicly available or proprietary sequence alignment tool.
  • publicly available sequence alignment tools that can be used to align the fragment sequences include, but are not limited to, BLAT, BLAST, BWA, Bowtie, drFAST, LAST, MOSAIK, NEXTGENMAP, etc.
  • the alignment tool subtracts out the known sequences associated with the genomic linker segments and only aligns the sequences associated with the genomic fragments portion of the concatenated fragment reads to the reference genome.
  • the alignment tool selects the best alignment for each genomic fragment sequence read by determining the longest matching alignment position on the reference genome for each genomic fragment sequence read. That is, the alignment location where the longest consecutive sequence of bases on the genomic fragment sequence read matches the reference genome. In other embodiments, the alignment tool selects the best alignment for each genomic fragment sequence read by determining the position on the reference genome where the greatest number of bases from the genomic fragment sequence read match, regardless of whether they are consecutive or not. In some embodiments, genomic fragment sequence reads that align equally well to multiple locations on the reference genome are automatically discarded and not used.
  • In step 816, copy number variations in the embryo candidate's genome are identified when a frequency of genomic fragment sequence reads aligned to a chromosomal position on the reference genome deviates from a frequency threshold.
  • a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is below the frequency threshold (i.e., fragment alignment frequency in a normal genome). That is, when the chromosomal position of the sample (e.g., tissue, embryo, etc.) being tested has a CNV deletion, it will have fewer reads (i.e., a lower frequency of reads aligned) in that region than a normal genome.
  • a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is above the frequency threshold. That is, when the chromosomal position has a CNV duplication, it will have more reads in that region than a normal genome.
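The deviance test of step 816 can be illustrated with a minimal Python sketch. The function name `call_cnv`, the region labels, and the `tolerance` band of 25% are hypothetical; the text above does not specify how far a frequency must deviate from the threshold before a call is made.

```python
from typing import Dict


def call_cnv(observed: Dict[str, float],
             expected: Dict[str, float],
             tolerance: float = 0.25) -> Dict[str, str]:
    """Label each region by comparing its read frequency with the
    frequency expected for a normal genome (the frequency threshold).

    `tolerance` is an assumed relative band around the expected value.
    """
    calls = {}
    for region, freq in observed.items():
        ratio = freq / expected[region]
        if ratio < 1.0 - tolerance:
            calls[region] = "deletion"      # fewer reads than normal
        elif ratio > 1.0 + tolerance:
            calls[region] = "duplication"   # more reads than normal
        else:
            calls[region] = "normal"
    return calls


# Toy frequencies (fraction of aligned reads per region), for illustration only.
observed = {"chr1": 0.10, "chr21": 0.03, "chrX": 0.02}
expected = {"chr1": 0.10, "chr21": 0.02, "chrX": 0.04}
print(call_cnv(observed, expected))
```

In practice the expected frequencies would come from normalized reference data rather than fixed constants.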
  • FIG. 9 is a flowchart showing a method of identifying genomic features in an embryo candidate, in accordance with various embodiments.
  • method 900 details an exemplary workflow for identifying genomic features in an embryo candidate.
  • an embryo candidate is isolated from a plurality of embryo candidates.
  • the embryo candidate can be isolated from a plurality of fertilized embryos each of which can be a candidate for IVF implantation.
  • the embryo candidate is in the blastocyst stage of embryogenesis.
  • the embryo candidate is a human embryo.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • the embryo is incubated for as long of a period of time as is required (while still keeping the embryo candidate viable for IVF implantation) for a sufficient quantity of DNA fragments (i.e., genomic fragments) to be secreted or shed from the embryo candidate to the DNA free media for a copy number variation analysis to be performed using method 900.
  • DNA free media that can be utilized in this workflow is ORIGIO SEQUENTIAL BLAST™ culture media of The Cooper Companies.
  • the media can be substantially free of oligonucleotides and not just DNA to ensure the lowest possible chance of erroneous analysis results or artifact formation during amplification.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes one or more genomic fragments (i.e., DNA fragment) shed or secreted from the embryo candidate.
  • amplification vessels that can be used include, but are not limited to, a test tube, pipette tube, petri dish, or a well/partition within a multi-partition/well plate.
  • a plurality of genomic linker segments and a ligase enzyme are added to the amplification vessel under conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment from the embryo candidate.
  • the genomic fragments isolated from the media are considered "short" genomic fragments.
  • the short genomic fragments have lengths of between about 30 base pairs (bps) and about 800 bps. In other embodiments, the short genomic fragments have lengths of between about 150 bps and about 400 bps. In still other embodiments, the short genomic fragments have lengths of less than about 1000 bps.
  • the genomic linker segments are essentially artificially created double-stranded "conjoint" oligonucleotide segments of a known length and nucleotide sequence.
  • the genomic linker segments are between about 30 to about 1000 bps in length.
  • the genomic linker segments are between about 30 bps and about 500 bps in length.
  • the genomic linker segments are between about 50 bps to about 150 bps.
  • the genomic linker segments are homopolymer oligonucleotide segments.
  • the genomic linker segments are heteropolymer oligonucleotide segments.
  • the genomic linker segments are blunt ended double-stranded oligonucleotide segments. In some embodiments, the genomic fragments are enzymatically blunt ended prior to being ligated to the genomic linker segments using the methods disclosed above.
  • Various types of prokaryotic and eukaryotic enzymes (i.e., ligases) can be used to ligate the genomic fragments to the genomic linker segments to form the concatenated genomic fragments.
  • Some examples of ligases that can be used here include, but are not limited to, T3, T4, T7, or Ligase 1.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • the concatenated genomic fragments are amplified on a thermal cycler (or similar device) using WGA techniques such as MDA, MALBAC, etc.
  • sequence information from the amplified concatenated genomic fragments is obtained by sequencing the concatenated fragments on an NGS or equivalent genomic sequencing system.
  • the sequence information includes both genomic fragment sequence reads (obtained from genomic fragments isolated from the embryo candidate) and genomic linker segment sequence reads (which were artificially created and ligated to the genomic fragments prior to amplification in step 910).
  • the sequence information is aligned against a reference genome using a publicly available or proprietary sequence alignment tool.
  • publicly available sequence alignment tools that can be used to align the fragment sequences include, but are not limited to, BLAT, BLAST, BWA, Bowtie, drFAST, LAST, MOSAIK, NEXTGENMAP, etc.
  • the alignment tool subtracts out the known sequences associated with the genomic linker segments and only aligns the sequences associated with the genomic fragments portion of the concatenated fragment reads to the reference genome.
  • the alignment tool selects the best alignment for each genomic fragment sequence read by determining the longest matching alignment position on the reference genome for each genomic fragment sequence read. That is, the alignment location where the longest consecutive sequence of bases on the genomic fragment sequence read matches the reference genome. In other embodiments, the alignment tool selects the best alignment for each genomic fragment sequence read by determining the position on the reference genome where the greatest number of bases from the genomic fragment sequence read match, regardless of whether they are consecutive or not. In some embodiments, genomic fragment sequence reads that align equally well to multiple locations on the reference genome are automatically discarded and not used.
  • genomic features are identified on the aligned genomic fragment sequences using various publicly available or proprietary genomic feature analytics tools or callers.
  • these tools or callers can be configured to access various public (e.g., the RefGene Database (UCSC), the Alternative Splicing Database (EBI), the dbSNP database (NCBI), the Genomic Structural Variation database (NCBI), the GENCODE database (UCSC), the PolyPhen database (Harvard), the SIFT database (NCBI), the 1000 Genomes Project database, the Database of Genomic Variants (EBI), the Biomart database (EBI), Gene Ontology database (public), the BioCyc/HumanCyc database, the KEGG pathway database, the Reactome database, the Pathway Interaction Database (NIH), the Biocarta database, PANTHER database, etc.) and/or private databases to identify the genomic features.
  • the genomic features can be genomic variants such as insertions/deletions (INDEL), copy number variations (CNV), single nucleotide polymorphisms (SNP), duplications, inversions, translocations, etc.
  • the genomic features can be genomic regions that have some annotated function such as a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.
  • the genomic features can be epigenetic changes on the genome (e.g., methylation, acetylation, ubiquitylation, phosphorylation, sumoylation, ribosylation, citrullination, etc.) that can affect gene expression and activity.
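The two best-alignment criteria described for method 900 (longest consecutive match versus greatest total number of matching bases, with ties discarded) can be compared in a small sketch. This is a brute-force toy over a short reference string, assuming ungapped placement; the aligners named above use indexed search instead, and the function names here are hypothetical.

```python
def longest_run(read: str, ref: str, pos: int) -> int:
    """Length of the longest run of consecutive matching bases at `pos`."""
    best = run = 0
    for i, base in enumerate(read):
        if ref[pos + i] == base:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def total_matches(read: str, ref: str, pos: int) -> int:
    """Number of matching bases at `pos`, consecutive or not."""
    return sum(1 for i, base in enumerate(read) if ref[pos + i] == base)


def best_alignment(read: str, ref: str, score=longest_run):
    """Return the single best-scoring position, or None when the read is
    longer than the reference or when several positions tie (such reads
    are discarded)."""
    positions = range(len(ref) - len(read) + 1)
    scores = [(score(read, ref, p), p) for p in positions]
    if not scores:
        return None
    top = max(s for s, _ in scores)
    winners = [p for s, p in scores if s == top]
    return winners[0] if len(winners) == 1 else None


ref = "ACGCCGGGGGACTTA"
print(best_alignment("ACGTA", ref, longest_run))    # criterion 1
print(best_alignment("ACGTA", ref, total_matches))  # criterion 2
```

The example read is chosen so that the two criteria disagree: one position has the longest unbroken match, while another matches more bases overall.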
  • FIG. 10 is a flowchart showing a method for identifying genomic features from concatenated genomic fragment sequence reads, in accordance with various embodiments.
  • method 1000 details an exemplary workflow for identifying genomic features on genomic fragment sequence reads that were obtained from concatenated fragments (created by ligating artificial genomic linker segments to genomic fragments that were extracted from a tissue sample) that were amplified and later sequenced on a NGS or equivalent genomic sequencing system.
  • In step 1002, concatenated genomic fragment reads, each containing at least one genomic linker segment sequence and at least one genomic fragment sequence from a tissue sample, are received on a computing device/server programmed with instructions (software or hardware) to analyze genomic sequence information (sequence reads) generated by a genomic sequencing system configured to determine the base sequence information of genomic fragments.
  • the genomic linker segments are artificially created, so their length and base sequence are known.
  • the genomic linker segment reads are between about 30 to about 1000 bps in length. In other embodiments, the genomic linker segment reads are between about 30 bps and about 500 bps in length. In still other embodiments, the genomic linker segment reads are between about 50 bps to about 150 bps. In some embodiments, the genomic linker segment reads are homopolymer sequences. In other embodiments, the genomic linker segment reads are heteropolymer sequences.
  • In step 1004, the genomic linker segment sequence portion of the concatenated genomic fragment sequence reads is subtracted out before the reads are aligned to a reference genome in step 1006. That is, the known sequences associated with the genomic linker segments are subtracted out from the concatenated genomic fragment sequence reads first, and then only the genomic fragment portions of the concatenated fragment reads are aligned to the reference genome.
  • genomic features are identified on the aligned genomic fragment sequences using various publicly available or proprietary genomic feature analytics tools or callers.
  • these tools or callers can be configured to access various public (e.g., the RefGene Database (UCSC), the Alternative Splicing Database (EBI), the dbSNP database (NCBI), the Genomic Structural Variation database (NCBI), the GENCODE database (UCSC), the PolyPhen database (Harvard), the SIFT database (NCBI), the 1000 Genomes Project database, the Database of Genomic Variants (EBI), the Biomart database (EBI), Gene Ontology database (public), the BioCyc/HumanCyc database, the KEGG pathway database, the Reactome database, the Pathway Interaction Database (NIH), the Biocarta database, PANTHER database, etc.) and/or private databases to identify the genomic features.
  • the genomic features can be genomic variants such as insertions/deletions (INDEL), copy number variations (CNV), single nucleotide polymorphisms (SNP), duplications, inversions, translocations, etc.
  • the genomic features can be genomic regions that have some annotated function such as a gene, protein coding sequence, mRNA, tRNA, rRNA, repeat sequence, inverted repeat, miRNA, siRNA, etc.
  • the genomic features can be epigenetic changes on the genome (e.g., methylation, acetylation, ubiquitylation, phosphorylation, sumoylation, ribosylation, citrullination, etc.) that can affect gene expression and activity.
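The linker subtraction of step 1004 can be sketched as follows. This is a minimal illustration that assumes the linker appears in the read as an exact copy of its known sequence; it does not handle sequencing errors inside the linker or reverse-complemented linkers. The function name, the `min_len` parameter, and the example sequences are hypothetical.

```python
from typing import List


def subtract_linkers(read: str, linker: str, min_len: int = 1) -> List[str]:
    """Remove every exact occurrence of the known linker sequence from a
    concatenated read and return the remaining genomic fragment pieces.

    Pieces shorter than `min_len` (an assumed cutoff) are dropped.
    """
    if not linker:
        raise ValueError("linker sequence must be non-empty")
    return [piece for piece in read.split(linker) if len(piece) >= min_len]


linker = "GATTACAGATTACA"            # made-up linker sequence
read = "ACGTACGT" + linker + "TTTTCCCC" + linker
print(subtract_linkers(read, linker))
```

Only the returned pieces would then be passed to the alignment step, so the artificial linker sequence never competes for positions on the reference genome.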
  • the methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof.
  • the processing unit may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
  • the methods of the present teachings may be implemented as firmware and/or a software program and applications written in conventional programming languages such as C, C++, Python, etc. If implemented as firmware and/or software, the embodiments described herein can be implemented on a non-transitory computer-readable medium in which a program is stored for causing a computer to perform the methods described above. It should be understood that the various engines described herein can be provided on a computer system, such as computer system 400 of Figure 4, whereby processor 404 would execute the analyses and determinations provided by these engines, subject to instructions provided by any one of, or a combination of, memory components 406/408/410 and user input provided via input device 414.
  • the embodiments described herein can be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like.
  • the embodiments can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a network.
  • any of the operations that form part of the embodiments described herein are useful machine operations.
  • the embodiments described herein also relate to a device or an apparatus for performing these operations.
  • the systems and methods described herein can be specially constructed for the required purposes, or they may be implemented on a general purpose computer selectively activated or configured by a computer program stored in the computer.
  • various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • Certain embodiments can also be embodied as computer readable code on a computer readable medium.
  • the computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical, FLASH memory and non-optical data storage devices.
  • the computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
  • Embodiment 1 A method for determining copy number variation in an embryo candidate for in vitro fertilization (IVF) implantation is disclosed.
  • An embryo candidate is isolated from a plurality of embryos.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes genomic fragments shed or secreted from the embryo candidate.
  • a plurality of genomic linker segments and a ligase enzyme are added to the amplification vessel under conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment from the isolated embryo candidate.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • Sequence information is obtained from the amplified concatenated genomic fragments.
  • the sequence information is aligned (mapped) against a reference genome. Copy number variations are identified in the embryo candidate when a frequency of genomic fragment sequence reads aligned to a chromosomal position on the reference genome deviates from a frequency threshold.
  • Embodiment 2 The method of Embodiment 1, further including: subtracting sequence information related to the genomic linker segment from the concatenated genomic fragment sequence prior to aligning the concatenated genomic fragment sequence to the reference genome.
  • Embodiment 3 The method of Embodiment 2, further including: normalizing the frequency of genomic fragment sequence reads aligned to each chromosomal position; and determining a frequency threshold for each chromosomal position.
  • Embodiment 4 The method of Embodiment 3, further including: applying a circular binary segmentation (CBS) analysis to determine whether the identified deviance from the frequency threshold is due to technical bias.
  • Embodiment 5 The method of Embodiment 3, wherein the normalization is performed using a Spline normalization method.
  • Embodiment 6 The method of Embodiment 1, further including: blunting the genomic fragment ends using a modified polymerase prior to ligating them to the genomic linker segments.
  • Embodiment 7 The method of Embodiment 6, wherein the modified polymerase is a Klenow T4 DNA polymerase.
  • Embodiment 8 The method of Embodiment 1, wherein the ligase enzyme is one of a T3, T4 or T7 prokaryotic DNA ligase.
  • Embodiment 9 The method of Embodiment 1, wherein the embryo candidate is a human embryo.
  • Embodiment 10 The method of Embodiment 1, wherein the embryo candidate is a blastocyst.
  • Embodiment 11 The method of Embodiment 1, wherein the frequency threshold is a frequency of genomic fragment reads that map to a normal chromosome.
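The normalization and per-position threshold of Embodiments 3 and 11 can be sketched under simple assumptions. The sketch below scales raw read counts by the sample total (not the Spline normalization of Embodiment 5) and derives a threshold band for each region from a set of normal reference samples as mean plus or minus `k` standard deviations. The function names, the choice of `k`, and the use of a reference panel are all illustrative assumptions, not details taken from the text.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per region into fractions of all reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no aligned reads")
    return {region: n / total for region, n in counts.items()}


def thresholds(normal_samples: List[Dict[str, int]],
               k: float = 3.0) -> Dict[str, Tuple[float, float]]:
    """Per-region (low, high) band estimated from at least two normal
    reference samples; `k` is an assumed width in standard deviations."""
    normalized = [normalize(s) for s in normal_samples]
    bands = {}
    for region in normalized[0]:
        values = [sample[region] for sample in normalized]
        m, s = mean(values), stdev(values)
        bands[region] = (m - k * s, m + k * s)
    return bands


panel = [{"a": 50, "b": 50}, {"a": 60, "b": 40}, {"a": 40, "b": 60}]
print(thresholds(panel, k=2.0))
```

A test sample normalized the same way can then be compared region by region against these bands, with values below the low bound or above the high bound treated as deviances.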
  • Embodiment 12 A method for identifying genomic features in an embryo candidate is disclosed.
  • An embryo candidate is isolated from a plurality of embryo candidates.
  • the embryo candidate is incubated in media that is substantially free of DNA.
  • a portion of the media is transferred to an amplification vessel, wherein the portion of media includes one or more genomic fragments shed or secreted from the embryo candidate.
  • a plurality of genomic linker segments and a ligase enzyme are added to the amplification vessel under conditions that catalyze the formation of concatenated genomic fragments containing at least one genomic linker segment and at least one genomic fragment from the isolated embryo candidate.
  • the concatenated genomic fragments are amplified in the amplification vessel.
  • Sequence information is obtained from the concatenated genomic fragments.
  • the sequence information is aligned against a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
  • Embodiment 13 The method of Embodiment 12, further including: subtracting sequence information related to the genomic linker segment from the concatenated genomic fragment sequence prior to aligning the concatenated genomic fragment sequence to the reference genome.
  • Embodiment 14 The method of Embodiment 12, further including: blunting the genomic fragment ends using a modified polymerase prior to ligating them to the genomic linker segments.
  • Embodiment 15 The method of Embodiment 14, wherein the modified polymerase is a Klenow T4 DNA polymerase.
  • Embodiment 16 The method of Embodiment 12, wherein the ligase enzyme is one of a T3, T4 or T7 prokaryotic DNA ligase.
  • Embodiment 17 The method of Embodiment 12, wherein the embryo candidate is a human embryo.
  • Embodiment 18 The method of Embodiment 12, wherein the embryo candidate is a blastocyst.
  • Embodiment 19 The method of Embodiment 12, wherein the genomic feature is a single nucleotide polymorphism.
  • Embodiment 20 The method of Embodiment 12, wherein the genomic feature is an indel.
  • Embodiment 21 The method of Embodiment 12, wherein the genomic feature is an inversion.
  • Embodiment 22 A system is provided for identifying genomic features in an embryo candidate.
  • the system includes a genomics sequencer, a computing device and a display.
  • the genomic sequencer is configured to obtain sequence information from concatenated genomic fragments derived from an embryo candidate.
  • the concatenated genomic fragments each contain at least one genomic linker segment and at least one genomic fragment from the embryo candidate.
  • the computing device is communicatively connected to the genomic sequencer and includes a sequence alignment engine and a genomic features identification engine.
  • the sequence alignment engine is configured to subtract out sequence information related to the genomic linker segment portion of the concatenated genomic fragments and align the genomic fragment sequences to a reference genome.
  • the genomic features identification engine is configured to identify genomic features in the aligned genomic fragment sequences.
  • the display is communicatively connected to the computing device and configured to display a report containing the identified genomic features.
  • Embodiment 23 The system of Embodiment 22, wherein the genomic feature is a copy number variation.
  • Embodiment 24 The system of Embodiment 23, wherein the genomic features identification engine is further configured to: normalize a frequency of genomic fragment sequences aligned to each chromosomal position on the reference genome; determine a genomic fragment sequence alignment frequency threshold to make a copy number variation call for each chromosomal position; and make a copy number variation call for each chromosomal position with genomic fragment sequence alignment frequencies that deviate from the frequency threshold.
  • Embodiment 25 The system of Embodiment 24, wherein the genomic features identification engine is further configured to apply a circular binary segmentation (CBS) analysis to determine whether the identified deviance from the frequency threshold is due to technical bias.
  • Embodiment 26 The system of Embodiment 24, wherein the normalization is performed using a Spline normalization method.
  • Embodiment 27 The system of Embodiment 24, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is below the frequency threshold.
  • Embodiment 28 The system of Embodiment 24, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is above the frequency threshold.
  • Embodiment 29 The system of Embodiment 22, wherein the embryo candidate is a human embryo.
  • Embodiment 30 The system of Embodiment 22, wherein the embryo candidate is a blastocyst.
  • Embodiment 31 The system of Embodiment 22, wherein the genomic feature is a single nucleotide polymorphism.
  • Embodiment 32 The system of Embodiment 22, wherein the genomic feature is an indel.
  • Embodiment 33 The system of Embodiment 22, wherein the genomic feature is an inversion.
  • Embodiment 34 The system of Embodiment 22, wherein the genomic linker segment sequence is a known sequence.
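The segmentation idea behind the CBS analysis named in Embodiments 4 and 25 can be illustrated with a simplified sketch. The code below is plain recursive binary segmentation with a fixed two-sample t-statistic cutoff. It is not the full circular algorithm, which evaluates segments bounded by two change points and typically assesses significance by permutation. The function names and the `threshold` value are hypothetical.

```python
from math import sqrt
from statistics import mean


def t_stat(left, right):
    """Two-sample t statistic with pooled variance for two runs of bins."""
    ml, mr = mean(left), mean(right)
    ss = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
    pooled_var = ss / (len(left) + len(right) - 2)
    se = sqrt(pooled_var * (1 / len(left) + 1 / len(right)))
    if se > 0:
        return abs(ml - mr) / se
    return float("inf") if ml != mr else 0.0


def segment(values, threshold=4.0, offset=0):
    """Split `values` (e.g. normalized read frequencies per bin) into
    half-open (start, end) segments of roughly constant level."""
    n = len(values)
    best_i, best_t = None, threshold
    for i in range(2, n - 1):            # keep at least two bins per side
        t = t_stat(values[:i], values[i:])
        if t > best_t:
            best_i, best_t = i, t
    if best_i is None:                   # no split exceeds the cutoff
        return [(offset, offset + n)]
    return (segment(values[:best_i], threshold, offset)
            + segment(values[best_i:], threshold, offset + best_i))


bins = [0.0, 0.1, -0.1, 0.0, 0.05, -0.05, 1.0, 0.9, 1.1, 1.0, 0.95, 1.05]
print(segment(bins))
```

One way to read the result: a deviation that persists across a whole segment is more plausibly a real copy number change, whereas a deviation confined to a single bin inside an otherwise flat segment is more plausibly noise. That reading is an interpretation offered here, not a statement from the embodiments.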
  • Embodiment 35 A method for identifying genomic features in a tissue sample is disclosed. Concatenated genomic fragment sequence reads are received containing at least one genomic linker segment sequence and at least one genomic fragment sequence from a tissue sample. The genomic linker segment sequence portion of the concatenated genomic fragment sequence reads is subtracted out. The concatenated genomic fragment sequence reads are aligned (mapped) to a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
  • Embodiment 36 The method of Embodiment 35, further including: deleting concatenated genomic fragment sequence reads that map to more than one location on a reference genome.
  • Embodiment 37 The method of Embodiment 35, wherein the genomic feature is a copy number variation.
  • Embodiment 38 The method of Embodiment 37, further including: normalizing a frequency of genomic fragment sequences aligned to each chromosomal position; determining a genomic fragment sequence alignment frequency threshold to make a copy number variation call for each chromosomal position; and making a copy number variation call for each chromosomal position with genomic fragment sequence alignment frequencies that deviate from the frequency threshold.
  • Embodiment 39 The method of Embodiment 38, further including: applying a circular binary segmentation (CBS) analysis to determine whether the identified deviance from the frequency threshold is due to technical bias.
  • Embodiment 40 The method of Embodiment 38, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is below the frequency threshold.
  • Embodiment 41 The method of Embodiment 38, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is above the frequency threshold.
  • Embodiment 42 The method of Embodiment 35, wherein the tissue sample is an embryonic tissue.
  • Embodiment 43 The method of Embodiment 35, wherein the tissue sample is a blastocyst.
  • Embodiment 44 The method of Embodiment 35, wherein the genomic feature is a single nucleotide polymorphism.
  • Embodiment 45 The method of Embodiment 35, wherein the genomic feature is an indel.
  • Embodiment 46 The method of Embodiment 35, wherein the genomic feature is an inversion.
  • Embodiment 47 A non-transitory computer-readable medium is provided in which a program is stored for causing a computer to perform a method for identifying genomic features in a tissue sample.
  • Concatenated genomic fragment sequence reads are received containing at least one genomic linker segment sequence and at least one genomic fragment sequence from a tissue sample.
  • the genomic linker segment sequence portion of the concatenated genomic fragment sequence reads is subtracted out.
  • the concatenated genomic fragment sequence reads are aligned (mapped) to a reference genome. Genomic features are identified on the aligned genomic fragment sequences.
  • Embodiment 48 The method of Embodiment 47, further including: deleting concatenated genomic fragment sequence reads that map to more than one location on a reference genome.
  • Embodiment 49 The method of Embodiment 47, wherein the genomic feature is a copy number variation.
  • Embodiment 50 The method of Embodiment 47, wherein the genomic feature is an indel.
  • Embodiment 51 The method of Embodiment 47, wherein the genomic feature is an inversion.
  • Embodiment 52 The method of Embodiment 49, further including: normalizing a frequency of genomic fragment sequences aligned to each chromosomal position; determining a genomic fragment sequence alignment frequency threshold to make a copy number variation call for each chromosomal position; and making a copy number variation call for each chromosomal position with genomic fragment sequence alignment frequencies that deviate from the frequency threshold.
  • Embodiment 53 The method of Embodiment 52, further including: applying a circular binary segmentation (CBS) analysis to determine whether the identified deviance from the frequency threshold is due to technical bias.
  • Embodiment 54 The method of Embodiment 52, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is below the frequency threshold.
  • Embodiment 55 The method of Embodiment 52, wherein a deviance occurs when the frequency of genomic fragment sequences aligned to a chromosomal position is above the frequency threshold.
  • Embodiment 56 The method of Embodiment 47, wherein the tissue sample is an embryonic tissue.
  • Embodiment 57 The method of Embodiment 47, wherein the tissue sample is a blastocyst.
  • Embodiment 58 The method of Embodiment 47, wherein the genomic feature is a single nucleotide polymorphism.
  • Embodiment 59 The method of Embodiment 47, wherein the genomic feature is an indel.
  • Embodiment 60 The method of Embodiment 47, wherein the genomic feature is an inversion.
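
The computational steps recited in Embodiments 47, 48, and 52 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the linker string, the pre-computed alignment table (standing in for the output of an external aligner), the median-based normalization, and the 0.75/1.25 ratio thresholds are all assumptions chosen for illustration. The CBS analysis of Embodiment 53 is not shown.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical representation: a chromosomal position is (chromosome, bin index).
Position = Tuple[str, int]


def strip_linker(read: str, linker: str) -> List[str]:
    """Subtract the linker from a concatenated read, returning the genomic fragments."""
    return [fragment for fragment in read.split(linker) if fragment]


def drop_multimapped(alignments: Dict[str, List[Position]]) -> Dict[str, Position]:
    """Keep only fragments that map to exactly one location (Embodiment 48)."""
    return {
        fragment_id: locations[0]
        for fragment_id, locations in alignments.items()
        if len(locations) == 1
    }


def call_cnv(
    unique: Dict[str, Position],
    bins: List[Position],
    low: float = 0.75,   # assumed lower ratio threshold
    high: float = 1.25,  # assumed upper ratio threshold
) -> Dict[Position, str]:
    """Normalize per-bin fragment counts and flag bins that deviate (Embodiment 52).

    Normalization here divides each bin count by the median count across
    all bins; this is an illustrative choice, not one stated in the source.
    """
    counts = Counter(unique.values())
    baseline = median(counts[b] for b in bins)
    if baseline == 0:
        raise ValueError("median bin count is zero; cannot normalize")

    calls: Dict[Position, str] = {}
    for b in bins:
        ratio = counts[b] / baseline
        if ratio < low:
            calls[b] = "loss"   # frequency below threshold (Embodiment 54)
        elif ratio > high:
            calls[b] = "gain"   # frequency above threshold (Embodiment 55)
    return calls
```

In use, each concatenated read would first be passed through `strip_linker`, the resulting fragments aligned with an external tool, and the per-fragment alignment locations then handed to `drop_multimapped` and `call_cnv`. Passing the full list of bins explicitly lets a bin with no aligned fragments still be evaluated and reported as a loss.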

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
EP18778768.4A 2017-09-07 2018-09-07 Systems and methods for non-invasive preimplantation genetic diagnosis Withdrawn EP3679156A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762555466P 2017-09-07 2017-09-07
PCT/US2018/049976 WO2019051244A1 (en) 2017-09-07 2018-09-07 Systems and methods for non-invasive preimplantation genetic diagnosis

Publications (1)

Publication Number Publication Date
EP3679156A1 (de) 2020-07-15

Family

ID=63684601

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18778768.4A Withdrawn EP3679156A1 (de) 2017-09-07 2018-09-07 Systems and methods for non-invasive preimplantation genetic diagnosis

Country Status (8)

Country Link
US (1) US20210062256A1 (de)
EP (1) EP3679156A1 (de)
JP (1) JP2020532999A (de)
KR (1) KR20200060410A (de)
AU (1) AU2018327337A1 (de)
CA (1) CA3074689A1 (de)
SG (1) SG11202003557YA (de)
WO (1) WO2019051244A1 (de)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020061637A1 (en) * 2018-09-27 2020-04-02 Monash Ivf Group Limited Dna from cell-free medium
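CN114402392A (zh) * 2019-06-21 2022-04-26 酷博尔外科器械有限公司 Systems and methods for verifying copy number variations in human embryos using the density of single nucleotide variations
CN112582022B (zh) * 2020-07-21 2021-11-23 序康医疗科技(苏州)有限公司 Systems and methods for non-invasive embryo transfer priority rating
JP7377842B2 (ja) * 2021-08-11 2023-11-10 医療法人浅田レディースクリニック Dish for embryo culture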

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2272983A1 (de) 2005-02-01 2011-01-12 AB Advanced Genetic Analysis Corporation Reagents, methods, and libraries for bead-based sequencing
EP2958574A4 (de) * 2013-01-23 2016-11-02 Reproductive Genetics And Technology Solutions Llc Compositions and methods for genetic analysis of embryos
JP6765960B2 (ja) * 2013-06-18 2020-10-07 アンスティチュ ナショナル ドゥ ラ サンテ エ ドゥ ラ ルシェルシュ メディカル Methods for determining the quality of an embryo
CN115433769A (zh) * 2015-08-12 2022-12-06 香港中文大学 Single-molecule sequencing of plasma DNA
GB2541904B (en) * 2015-09-02 2020-09-02 Oxford Nanopore Tech Ltd Method of identifying sequence variants using concatenation

Also Published As

Publication number Publication date
JP2020532999A (ja) 2020-11-19
SG11202003557YA (en) 2020-05-28
AU2018327337A1 (en) 2020-04-30
CA3074689A1 (en) 2019-03-14
WO2019051244A1 (en) 2019-03-14
US20210062256A1 (en) 2021-03-04
KR20200060410A (ko) 2020-05-29

Similar Documents

Publication Publication Date Title
US11560586B2 (en) Methods and processes for non-invasive assessment of genetic variations
US20230112134A1 (en) Methods and processes for non-invasive assessment of genetic variations
US10465245B2 (en) Nucleic acids and methods for detecting chromosomal abnormalities
US10504613B2 (en) Methods and processes for non-invasive assessment of genetic variations
US20210062256A1 (en) Systems and methods for non-invasive preimplantation genetic diagnosis
CA3115273C (en) Systems and methods for identifying chromosomal abnormalities in an embryo
JP7333838B2 (ja) Systems, computer programs and methods for determining pattern of inheritance in embryos
US20200399701A1 (en) Systems and methods for using density of single nucleotide variations for the verification of copy number variations in human embryos
JP7446343B2 (ja) Systems, computer programs and methods for determining genomic ploidy
CA3143723C (en) Systems and methods for determining pattern of inheritance in embryos
JP2021524736A (ja) Methods and reagents for analyzing nucleic acid mixtures and mixed cell populations, and related uses

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20200406

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210419

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20210831