WO2020006431A1 - Method and system for sample identity assurance - Google Patents

Method and system for sample identity assurance

Info

Publication number
WO2020006431A1
WO2020006431A1 PCT/US2019/039859 US2019039859W WO2020006431A1 WO 2020006431 A1 WO2020006431 A1 WO 2020006431A1 US 2019039859 W US2019039859 W US 2019039859W WO 2020006431 A1 WO2020006431 A1 WO 2020006431A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
allelotype
sample
dna
nucleotides
Prior art date
Application number
PCT/US2019/039859
Other languages
French (fr)
Inventor
Yan Ding
Sergey BATALOV
Original Assignee
Rady Children's Hospital Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rady Children's Hospital Research Center filed Critical Rady Children's Hospital Research Center
Priority to EP19826231.3A priority Critical patent/EP3815091A4/en
Priority to AU2019291926A priority patent/AU2019291926A1/en
Priority to JP2020571427A priority patent/JP2021530203A/en
Publication of WO2020006431A1 publication Critical patent/WO2020006431A1/en
Priority to IL279770A priority patent/IL279770A/en

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 - Hybridisation assays
    • C12Q1/6827 - Hybridisation assays for detection of mutation or polymorphism
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 - Methods for sequencing
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 - Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/156 - Polymorphic or mutational markers

Definitions

  • the invention relates generally to genetic analysis and more specifically to a method and system for allelotyping to ensure sample identity.
  • NGS next-generation sequencing
  • SNP single nucleotide polymorphism
  • the present invention provides a method and system for conducting genetic analysis via allelotyping.
  • the method utilizes a combination of different types of allelotyping techniques to ensure correct sample identity.
  • the invention provides a method for performing genetic analysis.
  • the method includes:
  • the method further includes generating an allele profiling concordance table. In one embodiment, the method includes calculating a statistical probability to determine whether the first allelotype and the second allelotype are of a single subject.
  • genetic sequencing includes whole genome sequencing (WGS) or rapid whole genome sequencing (rWGS) or whole exome sequencing (WES), next-generation sequencing (NGS), targeted gene panel sequencing, or a combination thereof.
  • WGS whole genome sequencing
  • rWGS rapid whole genome sequencing
  • WES whole exome sequencing
  • NGS next-generation sequencing
  • targeted gene panel sequencing or a combination thereof.
  • sequencing includes WES or targeted gene panel sequencing
  • a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41 is utilized which enables allelotyping in these applications.
  • the invention further provides a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41.
  • each oligonucleotide is between about 50 and 120 nucleotides in length. In one embodiment, each oligonucleotide is about 50 nucleotides in length or greater. In one embodiment, each oligonucleotide is 120 nucleotides in length or less.
  • the invention provides a genetic analysis system configured to perform a method of the disclosure.
  • the system includes: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to perform a method of the disclosure, such as determining an allelotype, generating an allele profiling concordance table and calculating a statistical probability to determine whether a first allelotype and a second allelotype are of a single subject.
  • the invention provides a system for performing the method of the invention.
  • the system includes a controller having at least one processor and non-transitory memory.
  • the controller is configured to perform one or more of the processes of the method as described herein.
  • the invention provides a non-transitory computer readable storage medium encoded with a computer program.
  • the program includes instructions that, when executed by one or more processors, cause the one or more processors to perform operations that implement a method of the disclosure.
  • the invention provides a computing system.
  • the system includes a memory, and one or more processors coupled to the memory, with the one or more processors being configured to perform operations that implement a method of the disclosure.
  • the present invention is based on an innovative method for ensuring sample identity which includes a combination of multiple allelotyping techniques.
  • the presently disclosed methodology includes comparing the concordance of STR (Short Tandem Repeat) allele profiling generated by the GlobalFiler™ PCR Amplification kit and by NGS using lobSTR™ software to assure sample identity and to detect potential cross-contamination among the different samples.
  • STR Short Tandem Repeat
  • The GlobalFiler™ panel allows the determination of allelic states of 24 positions in the human genome, as well as the identification of a contamination event (a mix of more than one sample).
  • A computational workflow on the WGS, WES or NGS panel data set (in which the SEQ ID NOs: 1-41 oligonucleotides have been included in the pull-down probe design) using in silico STR inference software (such as lobSTR™) allows the independent determination of allelic states of the same 24 positions in the human genome.
  • The statistical framework allows one to conclude beyond any reasonable doubt (probability of error less than 1 in 1,000,000,000,000,000) that the two samples came from the same individual if at least 18 of the 24 positions match.
  • STR genotyping using GlobalFiler™ can generate consistent loci profiling with high accuracy and sensitivity.
  • the workflow is simpler and easier for a laboratory technologist to complete, taking 4-6 hours. Setting up STR reactions does not require as large a batch as a microarray does. Additionally, reagents are not lost when a batch contains a smaller sample set.
  • references to “the method” include one or more methods and/or steps of the type described herein, which will become apparent to those persons skilled in the art upon reading this disclosure, and so forth.
  • the present invention provides a method for conducting genetic analysis via allelotyping.
  • the method utilizes a combination of different types of allelotyping techniques to ensure sample identity.
  • the invention provides a method for performing genetic analysis.
  • the method includes:
  • the method of the disclosure contemplates genetic sequencing to generate an allelotype.
  • Sequencing may be by any method known in the art. Sequencing methods include, but are not limited to, Maxam-Gilbert sequencing-based techniques, chain-termination-based techniques, shotgun sequencing, bridge PCR sequencing, single-molecule real-time sequencing, ion semiconductor sequencing (Ion Torrent™ sequencing), nanopore sequencing, pyrosequencing (454), sequencing by synthesis, sequencing by ligation (SOLiD™ sequencing), sequencing by electron microscopy, dideoxy sequencing reactions (Sanger method), massively parallel sequencing, polony sequencing, and DNA nanoball sequencing.
  • sequencing involves hybridizing a primer to the template to form a template/primer duplex, contacting the duplex with a polymerase enzyme in the presence of detectably labeled nucleotides under conditions that permit the polymerase to add nucleotides to the primer in a template-dependent manner, detecting a signal from the incorporated labeled nucleotide, and sequentially repeating the contacting and detecting steps at least once, wherein sequential detection of incorporated labeled nucleotides determines the sequence of the nucleic acid.
  • the sequencing comprises obtaining paired end reads.
  • sequencing of nucleic acid is performed using whole genome sequencing (WGS), rapid WGS, whole exome sequencing (WES), targeted gene panel sequencing, next-generation sequencing (NGS), or any combination thereof.
  • targeted sequencing is performed and may be either DNA or RNA sequencing.
  • the targeted sequencing may be to a subset of the whole genome.
  • the targeted sequencing is to introns, exons, non-coding sequences or a combination thereof.
  • the DNA is sequenced using an NGS platform, which performs massively parallel sequencing.
  • NGS technologies provide high throughput sequence information, and provide digital quantitative information, in that each sequence read that aligns to the sequence of interest is countable.
  • clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in WO 2014/015084).
  • NGS provides quantitative information, in that each sequence read is countable and represents an individual clonal DNA template or a single DNA molecule.
  • the sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation and ion semiconductor sequencing.
  • DNA from individual samples can be sequenced individually (i.e., singleplex sequencing) or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of DNA sequences.
  • Commercially available platforms include, e.g., platforms for sequencing-by-synthesis, ion semiconductor sequencing, pyrosequencing, reversible dye terminator sequencing, sequencing by ligation, single-molecule sequencing, sequencing by hybridization, and nanopore sequencing.
  • the methodology of the disclosure utilizes systems such as those provided by Illumina, Inc. (HiSeq™ X10, HiSeq™ 1000, HiSeq™ 2000, HiSeq™ 2500, HiSeq™ 4000, NovaSeq™ 5000, NovaSeq™ 6000, Genome Analyzers™, MiSeq™ systems) and Applied Biosystems Life Technologies (ABI PRISM™ Sequence detection systems, SOLiD™ System, Ion PGM™ Sequencer, Ion Proton™ Sequencer).
  • polynucleotide refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown.
  • Polynucleotides may be single- or multi-stranded (e.g., single-stranded, double-stranded, and triple-helical) and contain deoxyribonucleotides, ribonucleotides, and/or analogs or modified forms of deoxyribonucleotides or ribonucleotides, including modified nucleotides or bases or their analogs. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present invention encompasses polynucleotides which encode a particular amino acid sequence.
  • modified nucleotide or nucleotide analog may be used, so long as the polynucleotide retains the desired functionality under conditions of use, including modifications that increase nuclease resistance (e.g., deoxy, 2'-O-Me, phosphorothioates, and the like). Labels may also be incorporated for purposes of detection or capture, for example, radioactive or nonradioactive labels or anchors, e.g., biotin.
  • the term polynucleotide also includes peptide nucleic acids (PNA).
  • Polynucleotides may be naturally occurring or non-naturally occurring. Polynucleotides may contain RNA, DNA, or both, and/or modified forms and/or analogs thereof.
  • a sequence of nucleotides may be interrupted by non-nucleotide components.
  • One or more phosphodiester linkages may be replaced by alternative linking groups.
  • These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S (“thioate”), P(S)S (“dithioate”), (O)NR2 (“amidate”), P(O)R, P(O)OR, CO or CH2 (“formacetal”), in which each R or R' is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (-O-) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl.
  • examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers.
  • loci locus
  • a polynucleotide may include modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5' to 3' direction, unless stated otherwise.
  • sequencing includes use of a panel of oligonucleotides.
  • a panel is useful where sequencing includes WES or targeted gene panel sequencing.
  • the invention provides a panel having one or more oligonucleotides.
  • the oligonucleotides include one or more oligonucleotides selected from SEQ ID NOs: 1-41 as shown in Table I.
  • Polynucleotides of the present invention such as oligonucleotides of the panel of the invention may be DNA or RNA molecules of any suitable length.
  • Such molecules are typically from about 50 to 150, 50 to 140, 50 to 130, 50 to 120, 50 to 110, 50 to 100, 50 to 90, 50 to 80, 50 to 70 or 50 to 60 nucleotides in length.
  • the molecule may be about 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115 or 120 nucleotides in length.
  • Such polynucleotides may include from at least about 50 to about 120 nucleotides or more, including at least about 50 nucleotides, at least about 55 nucleotides, at least about 60 nucleotides, at least about 65 nucleotides, at least about 70 nucleotides, at least about 75 nucleotides, at least about 80 nucleotides, at least about 85 nucleotides, at least about 90 nucleotides, at least about 95 nucleotides, at least about 100 nucleotides, at least about 110 nucleotides, at least about 120 nucleotides or greater than 120 nucleotides.
  • polypeptide refers to a composition comprised of amino acids and recognized as a protein by those of skill in the art.
  • the conventional one-letter or three-letter code for amino acid residues is used herein.
  • the terms “polypeptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length.
  • the polymer may be linear or branched, it may include modified amino acids, and it may be interrupted by non-amino acids.
  • the terms also encompass an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component.
  • polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, synthetic amino acids and the like), as well as other modifications known in the art.
  • sample refers to any substance containing or presumed to contain nucleic acid.
  • the sample can be a biological sample obtained from a subject.
  • the nucleic acids can be RNA, DNA, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA.
  • the nucleic acids in a nucleic acid sample generally serve as templates for extension of a hybridized primer.
  • the biological sample is a biological fluid sample.
  • the fluid sample can be whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, feces or organ rinse.
  • the fluid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, and tears).
  • the biological sample is a solid biological sample, e.g., feces or tissue biopsy, e.g., a tumor biopsy.
  • a sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
  • the sample is a biological sample that is a mixture of nucleic acids from multiple sources, i.e., there is more than one contributor to a biological sample, e.g., two or more individuals.
  • the biological sample is a dried blood spot.
  • the subject is typically a human but also can be any species with methylation marks on its genome, including, but not limited to, a dog, cat, rabbit, cow, bird, rat, horse, pig, or monkey.
  • the present invention is described partly in terms of functional components and various processing steps. Such functional components and processing steps may be realized by any number of components, operations and techniques configured to perform the specified functions and achieve the various results.
  • the present invention may employ various biological samples, biomarkers, elements, materials, computers, data sources, storage systems and media, information gathering techniques and processes, data processing criteria, statistical analyses, regression analyses and the like, which may carry out a variety of functions.
  • the invention is described in relation to genetic analysis, the present invention may be practiced in conjunction with any number of applications, environments and data analyses; the systems described herein are merely exemplary applications for the invention.
  • Methods for genetic analysis may be implemented in any suitable manner, for example using a computer program operating on the computer system.
  • An exemplary genetic analysis system may be implemented in conjunction with a computer system, for example a conventional computer system comprising a processor and a random access memory, such as a remotely-accessible application server, network server, personal computer or workstation.
  • the computer system also suitably includes additional memory devices or information storage systems, such as a mass storage system and a user interface, for example a conventional monitor, keyboard and tracking device.
  • the computer system may, however, comprise any suitable computer system and associated equipment and may be configured in any suitable manner.
  • the computer system comprises a stand-alone system.
  • the computer system is part of a network of computers including a server and a database.
  • the software required for receiving, processing, and analyzing genetic information may be implemented in a single device or implemented in a plurality of devices.
  • the software may be accessible via a network such that storage and processing of information takes place remotely with respect to users.
  • the genetic analysis system according to various aspects of the present invention and its various elements provide functions and operations to facilitate genetic analysis, such as data gathering, processing and/or analysis.
  • the present genetic analysis system maintains information relating to samples and facilitates analysis.
  • the computer system executes the computer program, which may receive, store, search, analyze, and report information relating to the genome.
  • the computer program may comprise multiple modules performing various functions or operations, such as a processing module for processing raw data and generating supplemental data and an analysis module for analyzing raw data and supplemental data to perform genetic analysis.
  • the procedures performed by the genetic analysis system may comprise any suitable processes to facilitate genetic analysis.
  • the genetic analysis system is configured to determine allele concordance.
  • the genetic analysis system may also provide various additional modules and/or individual functions.
  • the genetic analysis system may also include a reporting function, for example to provide information relating to the processing and analysis functions.
  • the genetic analysis system may also provide various administrative and management functions, such as controlling access and performing other administrative functions.
  • Step 1 STR amplification workflow.
  • Result from the STR amplification workflow: 24 pairs of numbers, also known as an allelotype, that now comprise a digital fingerprint of the individual's DNA.
  • warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
  • Step 2 WGS or WES workflow (may be performed in parallel with Step 1).
  • The PCR-free Illumina WGS™ library naturally includes the biometric marker DNA fragments.
  • solution capture targeting approach for making the WES™ library using the commercially available KAPAHyper™ barcoded paired-end library coupled with the IDT xGEN™ WES probes, the mitochondrial panel and/or the custom biometric marker probes shown in Table I.
  • the custom biometric marker probes of Table I capture sample DNA in the vicinity of the biomarkers for the WES™ library.
  • Result from the WGS workflow: at least 21 pairs of numbers (and additional N/D ("not determined") calls), an allelotype that comprises an independent digital fingerprint of the individual's DNA.
  • warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
  • Step 3 Generate concordance using STR allele profiling called by GlobalFiler™ and called by WGS or WES or Panel sequencing.
  • Core biometric DNA capture reagent sequences may be synthesized as shown in Table I and added to the gene targeting sequence panel (including WES) DNA capture reagents. In some embodiments longer DNA reagent sequences can be designed using the reference human genome sequence that surrounds the core DNA sequences shown above. In one embodiment, using IDT xGEN™ Exome Research Panel v1.0 with the IDT xGEN™ Lockdown Custom Probes, oligonucleotides of length 120 are used whose sequences include the Core biometric DNA capture reagent sequences shown above in Table I.
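
The three workflow steps above, together with the 18-of-24 threshold stated earlier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the names `concordance_table` and `same_individual`, the placeholder locus names, and the choice to count an N/D call as neither a match nor a mismatch are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a call is a pair of repeat counts,
# or None for an N/D ("not determined") call.
Call = Optional[Tuple[float, float]]


def concordance_table(
    str_calls: Dict[str, Call], seq_calls: Dict[str, Call]
) -> List[Tuple[str, str]]:
    """Return (locus, status) rows; status is 'match', 'mismatch' or 'N/D'."""
    rows = []
    for locus in sorted(str_calls):
        a, b = str_calls[locus], seq_calls.get(locus)
        if a is None or b is None:
            status = "N/D"  # assumption: undetermined loci are not scored
        elif sorted(a) == sorted(b):
            status = "match"  # allele order within a pair is ignored
        else:
            status = "mismatch"
        rows.append((locus, status))
    return rows


def same_individual(rows: List[Tuple[str, str]], min_matches: int = 18) -> bool:
    """Apply the threshold: at least `min_matches` loci must match."""
    return sum(1 for _, status in rows if status == "match") >= min_matches


# Synthetic example: 24 placeholder loci, 3 of them N/D in the sequencing calls.
str_calls: Dict[str, Call] = {f"locus_{i:02d}": (10.0, 12.0) for i in range(1, 25)}
seq_calls: Dict[str, Call] = {locus: (12.0, 10.0) for locus in str_calls}
for locus in ("locus_22", "locus_23", "locus_24"):
    seq_calls[locus] = None
rows = concordance_table(str_calls, seq_calls)
```

In this synthetic case the remaining 21 loci all match, so `same_individual(rows)` accepts the pair; a real pipeline would also need the warning flags described in Steps 1 and 2.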

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Bioethics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present disclosure provides a method for genetic analysis including allelotyping as well as a system for implementing such analysis.

Description

METHOD AND SYSTEM FOR SAMPLE IDENTITY ASSURANCE
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of priority under 35 U.S.C. §119(e) of U.S. Serial No. 62/692,366, filed June 29, 2018, the entire contents of which are incorporated herein by reference.
INCORPORATION OF SEQUENCE LISTING
[0002] The material in the accompanying sequence listing is hereby incorporated by reference into this application. The accompanying sequence listing text file, name RADY_lWO_Sequence_Listing.txt, was created on June 25, 2019, and is 8 kb. The file can be accessed using Microsoft Word on a computer that uses Windows OS.
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
[0003] The invention relates generally to genetic analysis and more specifically to a method and system for allelotyping to ensure sample identity.
BACKGROUND INFORMATION
[0004] Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES) and Targeted Gene Panel Sequencing using the Next Generation Sequencing (NGS) platforms are complicated processes involving multiple procedural steps. Sample swap or contamination during the process in NGS may result in false positive variant detections and genotype misclassification. The assurance of sample identity throughout the process is a critical quality control component. The process to ensure correct sample identity is a challenge for sequencing facilities.
[0005] Currently, some NGS facilities perform array-based genotyping and use single nucleotide polymorphisms (SNPs) to measure the concordance between genotype profiles called from NGS data and those called from array-based genotype data (SNP microarrays). Errors, whether related or unrelated to specific processes, are known to occur in array-based genotyping and lead to discordant genotype calls between SNP array data and NGS data. Meanwhile, the depth of coverage of NGS affects SNP calls from NGS data, especially for SNPs with lower minor allele frequencies (MAF). For NGS panel sequencing, a custom-designed array workflow has to be created to optimize concordance between NGS panel data and SNP microarray data. The workflow requires 2-3 days to complete. Additionally, laboratories with relatively small sample volumes may need to acquire initial instrumentation and modify their cost and staffing models.
[0006] Improved methods for assuring correct sample identity are needed when performing genetic analysis.
SUMMARY OF THE INVENTION
[0007] The present invention provides a method and system for conducting genetic analysis via allelotyping. The method utilizes a combination of different types of allelotyping techniques to ensure correct sample identity.
[0008] Accordingly, in one aspect, the invention provides a method for performing genetic analysis. The method includes:
a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;
b) determining a second allelotype for the sample via genetic sequencing; and
c) determining allele concordance between the first allelotype and the second allelotype.
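
Steps a) to c) can be pictured with a short sketch. It is illustrative only and assumes a representation the disclosure does not prescribe: an allelotype is taken to be a mapping from STR locus name to an unordered pair of repeat counts. The `Allelotype` alias, the `locus_concordance` function and the example values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: locus name -> pair of STR repeat counts.
Allelotype = Dict[str, Tuple[float, float]]


def locus_concordance(first: Allelotype, second: Allelotype) -> Dict[str, bool]:
    """Compare two allelotypes locus by locus (step c).

    The two alleles at a locus are treated as an unordered pair,
    so (15, 16) is concordant with (16, 15).
    Only loci present in both allelotypes are compared.
    """
    shared = sorted(set(first) & set(second))
    return {locus: sorted(first[locus]) == sorted(second[locus]) for locus in shared}


# Illustrative values only; not data from the disclosure.
str_profile: Allelotype = {"D3S1358": (15, 16), "TH01": (6, 9.3), "TPOX": (8, 8)}
ngs_profile: Allelotype = {"D3S1358": (16, 15), "TH01": (6, 9.3), "TPOX": (8, 11)}
result = locus_concordance(str_profile, ngs_profile)
```

Here `str_profile` stands in for the first allelotype (step a) and `ngs_profile` for the second (step b); how each is actually produced is outside the scope of the sketch.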
[0009] In embodiments, the method further includes generating an allele profiling concordance table. In one embodiment, the method includes calculating a statistical probability to determine whether the first allelotype and the second allelotype are of a single subject.
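
The disclosure does not spell out how the statistical probability is calculated. One simple way to reason about it, offered here only as an assumption, is a binomial model in which each locus matches by chance independently with the same probability. The function name `chance_match_probability` and the value `p_locus = 0.1` are hypothetical; the figures 24 and 18 echo the numbers of positions and required matches given elsewhere in the disclosure.

```python
from math import comb


def chance_match_probability(n_loci: int, min_matches: int, p_locus: float) -> float:
    """Binomial upper tail: P(at least `min_matches` of `n_loci` match by chance).

    Assumes independent loci with an identical per-locus chance-match
    probability `p_locus`; real loci differ in their allele frequencies.
    """
    if not 0.0 <= p_locus <= 1.0:
        raise ValueError("p_locus must be between 0 and 1")
    return sum(
        comb(n_loci, k) * p_locus**k * (1.0 - p_locus) ** (n_loci - k)
        for k in range(min_matches, n_loci + 1)
    )


# p_locus = 0.1 is an arbitrary illustrative value, not a figure from the disclosure.
p_chance = chance_match_probability(n_loci=24, min_matches=18, p_locus=0.1)
```

Under this simplified model the probability falls steeply as the required number of matching loci rises, which is the intuition behind using a match threshold well below the full panel size.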
[00010] In various embodiments, genetic sequencing includes whole genome sequencing (WGS), rapid whole genome sequencing (rWGS), whole exome sequencing (WES), next-generation sequencing (NGS), targeted gene panel sequencing, or a combination thereof.
[00011] In embodiments where sequencing includes WES or targeted gene panel sequencing, a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41 is utilized which enables allelotyping in these applications.
[00012] Accordingly, the invention further provides a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41. In embodiments, each oligonucleotide is between about 50 and 120 nucleotides in length. In one embodiment, each oligonucleotide is about 50 nucleotides in length or greater. In one embodiment, each oligonucleotide is 120 nucleotides in length or less.
[00013] In an embodiment the invention provides a genetic analysis system configured to perform a method of the disclosure. The system includes: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to perform a method of the disclosure, such as determining an allelotype, generating an allele profiling concordance table and calculating a statistical probability to determine whether a first allelotype and a second allelotype are of a single subject.
[00014] In another embodiment, the invention provides a system for performing the method of the invention. The system includes a controller having at least one processor and non-transitory memory. The controller is configured to perform one or more of the processes of the method as described herein.
[00015] In still another embodiment, the invention provides a non-transitory computer readable storage medium encoded with a computer program. The program includes instructions that, when executed by one or more processors, cause the one or more processors to perform operations that implement a method of the disclosure.
[00016] In yet another embodiment, the invention provides a computing system. The system includes a memory, and one or more processors coupled to the memory, with the one or more processors being configured to perform operations that implement a method of the disclosure.
DETAILED DESCRIPTION OF THE INVENTION
[00017] The present invention is based on an innovative method for ensuring sample identity which includes a combination of multiple allelotyping techniques. The presently disclosed methodology includes comparing the concordance of STR (Short Tandem Repeat) allele profiling generated by the GlobalFiler™ PCR Amplification kit and by NGS using LobSTR™ software to assure sample identity and to detect potential cross contamination among the different samples.
[00018] The GlobalFiler™ panel allows determination of the allelic states of 24 positions in the human genome, as well as identification of contamination (mixing) of more than one sample. A computational workflow applied to the WGS, WES or NGS panel data set (in which the oligonucleotides of SEQ ID NOs: 1-41 have been included in the pull-down probe design), using in silico STR inference software (such as lobSTR™), allows independent determination of the allelic states of the same 24 positions in the human genome. A statistical framework allows one to establish beyond reasonable doubt (probability of error less than 1 / 1,000,000,000,000,000) that the two samples came from the same individual if no fewer than 18 out of 24 positions match.
[00019] Concordance between allelotype profiling called by STR and by WGS or WES is high and consistent. STR genotyping using GlobalFiler™ can generate consistent loci profiling with high accuracy and sensitivity. The workflow is simpler and can be completed by a laboratory technologist within 4-6 hours. Setting up STR reactions does not require as large a batch as a microarray does. Additionally, reagents are not lost when a batch contains a smaller sample set.
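The 18-of-24 matching rule described in [00018] can be sketched as a simple threshold check. This is a minimal illustration only: the function name `same_individual` and its parameters are hypothetical, and the statistical framework behind the threshold is not reproduced here.

```python
def same_individual(matching_loci: int, total_loci: int = 24, threshold: int = 18) -> bool:
    """Accept two allelotypes as one individual when enough loci match.

    The defaults (24 loci, threshold 18) follow the figures quoted in the text;
    the function itself is a hypothetical helper, not part of any named software.
    """
    if not 0 <= matching_loci <= total_loci:
        raise ValueError("matching_loci must be between 0 and total_loci")
    return matching_loci >= threshold


print(same_individual(18))  # meets the threshold
print(same_individual(17))  # falls short of the threshold
```

In practice the count of matching loci would come from comparing the STR-derived and sequencing-derived allelotypes locus by locus.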
[00020] Before the present compositions and methods are described, it is to be understood that this invention is not limited to particular methods and experimental conditions described, as such compositions, methods, and conditions may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only in the appended claims.
[00021] As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, references to “the method” include one or more methods, and/or steps of the type described herein which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.
[00022] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods and materials are now described.
[00023] METHODS
[00024] The present invention provides a method for conducting genetic analysis via allelotyping. The method utilizes a combination of different types of allelotyping techniques to ensure sample identity.
[00025] Accordingly, in one aspect, the invention provides a method for performing genetic analysis. The method includes:
a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;
b) determining a second allelotype for the sample via genetic sequencing; and
c) determining allele concordance between the first allelotype and the second allelotype.
[00026] The method of the disclosure contemplates genetic sequencing to generate an allelotype.
[00027] Sequencing may be by any method known in the art. Sequencing methods include, but are not limited to, Maxam-Gilbert sequencing-based techniques, chain-termination-based techniques, shotgun sequencing, bridge PCR sequencing, single-molecule real-time sequencing, ion semiconductor sequencing (Ion Torrent™ sequencing), nanopore sequencing, pyrosequencing (454), sequencing by synthesis, sequencing by ligation (SOLiD™ sequencing), sequencing by electron microscopy, dideoxy sequencing reactions (Sanger method), massively parallel sequencing, polony sequencing, and DNA nanoball sequencing. In some embodiments, sequencing involves hybridizing a primer to the template to form a template/primer duplex, contacting the duplex with a polymerase enzyme in the presence of detectably labeled nucleotides under conditions that permit the polymerase to add nucleotides to the primer in a template-dependent manner, detecting a signal from the incorporated labeled nucleotide, and sequentially repeating the contacting and detecting steps at least once, wherein sequential detection of incorporated labeled nucleotide determines the sequence of the nucleic acid. In some embodiments, the sequencing comprises obtaining paired end reads.
[00028] In some embodiments, sequencing of nucleic acid is performed using whole genome sequencing (WGS), rapid WGS, whole exome sequencing (WES), targeted gene panel sequencing, next-generation sequencing (NGS), or any combination thereof. In some embodiments, targeted sequencing is performed and may be either DNA or RNA sequencing. The targeted sequencing may be to a subset of the whole genome. In some embodiments the targeted sequencing is to introns, exons, non-coding sequences or a combination thereof. The DNA is sequenced using an NGS platform, which is massively parallel sequencing. NGS technologies provide high throughput sequence information, and provide digital quantitative information, in that each sequence read that aligns to the sequence of interest is countable. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in WO 2014/015084). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is countable and represents an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing) or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of DNA sequences. Commercially available platforms include, e.g., platforms for sequencing-by-synthesis, ion semiconductor sequencing, pyrosequencing, reversible dye terminator sequencing, sequencing by ligation, single-molecule sequencing, sequencing by hybridization, and nanopore sequencing.
In embodiments, the methodology of the disclosure utilizes systems such as those provided by Illumina, Inc. (HiSeq™ X10, HiSeq™ 1000, HiSeq™ 2000, HiSeq™ 2500, HiSeq™ 4000, NovaSeq™ 5000, NovaSeq™ 6000, Genome Analyzers™, MiSeq™ systems) and Applied Biosystems Life Technologies (ABI PRISM™ Sequence detection systems, SOLiD™ System, Ion PGM™ Sequencer, Ion Proton™ Sequencer).
[00029] The terms “polynucleotide,” “nucleotide sequence,” “nucleic acid,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. Polynucleotides may be single- or multi-stranded (e.g., single-stranded, double-stranded, and triple-helical) and contain deoxyribonucleotides, ribonucleotides, and/or analogs or modified forms of deoxyribonucleotides or ribonucleotides, including modified nucleotides or bases or their analogs. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present invention encompasses polynucleotides which encode a particular amino acid sequence. Any type of modified nucleotide or nucleotide analog may be used, so long as the polynucleotide retains the desired functionality under conditions of use, including modifications that increase nuclease resistance (e.g., deoxy, 2'-O-Me, phosphorothioates, and the like). Labels may also be incorporated for purposes of detection or capture, for example, radioactive or nonradioactive labels or anchors, e.g., biotin. The term polynucleotide also includes peptide nucleic acids (PNA). Polynucleotides may be naturally occurring or non-naturally occurring. Polynucleotides may contain RNA, DNA, or both, and/or modified forms and/or analogs thereof. A sequence of nucleotides may be interrupted by non-nucleotide components. One or more phosphodiester linkages may be replaced by alternative linking groups.
These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S (“thioate”), P(S)S (“dithioate”), (O)NR2 (“amidate”), P(O)R, P(O)OR, CO or CH2 (“formacetal”), in which each R or R' is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (-O-) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers. A polynucleotide may include modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5' to 3' direction, unless stated otherwise.
[00030] In embodiments, sequencing includes use of a panel of oligonucleotides. For example, a panel is useful where sequencing includes WES or targeted gene panel sequencing.
[00031] As such, the invention provides a panel having one or more oligonucleotides. In embodiments, the oligonucleotides include one or more oligonucleotides selected from SEQ ID NOs: 1-41 as shown in Table I.
Table I
Figure imgf000009_0001
[00032] Polynucleotides of the present invention, such as oligonucleotides of the panel of the invention, may be DNA or RNA molecules of any suitable length. For example, one of skill in the art would understand what lengths are suitable for oligonucleotides to be utilized in targeted gene panels. Such molecules are typically from about 50 to 150, 50 to 140, 50 to 130, 50 to 120, 50 to 110, 50 to 100, 50 to 90, 50 to 80, 50 to 70 or 50 to 60 nucleotides in length. For example, the molecule may be about 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115 or 120 nucleotides in length. Such polynucleotides may include from at least about 50 to about 120 nucleotides or more, including at least about 50 nucleotides, at least about 55 nucleotides, at least about 60 nucleotides, at least about 65 nucleotides, at least about 70 nucleotides, at least about 75 nucleotides, at least about 80 nucleotides, at least about 85 nucleotides, at least about 90 nucleotides, at least about 95 nucleotides, at least about 100 nucleotides, at least about 110 nucleotides, at least about 120 nucleotides or greater than 120 nucleotides.
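As a small illustration of the length ranges above, the following sketch checks whether each oligonucleotide in a panel falls within a chosen range (50 to 120 nucleotides by default). The function name `check_oligo_lengths` and the placeholder sequences are hypothetical; they are not the sequences of SEQ ID NOs: 1-41.

```python
from typing import Dict


def check_oligo_lengths(oligos: Dict[str, str], min_len: int = 50, max_len: int = 120) -> Dict[str, bool]:
    """Map each oligonucleotide name to True if its length lies in [min_len, max_len]."""
    return {name: min_len <= len(seq) <= max_len for name, seq in oligos.items()}


# Placeholder sequences for illustration only (60 nt and 160 nt).
panel = {"probe_a": "ACGT" * 15, "probe_b": "ACGT" * 40}
print(check_oligo_lengths(panel))
```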
[00033] As used herein, “polypeptide” refers to a composition comprised of amino acids and recognized as a protein by those of skill in the art. The conventional one-letter or three-letter code for amino acid residues is used herein. The terms “polypeptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may include modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, synthetic amino acids and the like), as well as other modifications known in the art.
[00034] As used herein, the term “sample” refers to any substance containing or presumed to contain nucleic acid. The sample can be a biological sample obtained from a subject. The nucleic acids can be RNA, DNA, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA. The nucleic acids in a nucleic acid sample generally serve as templates for extension of a hybridized primer. In some embodiments, the biological sample is a biological fluid sample. The fluid sample can be whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, feces or organ rinse. The fluid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, and tears). In other embodiments, the biological sample is a solid biological sample, e.g., feces or tissue biopsy, e.g., a tumor biopsy. A sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components). In some embodiments, the sample is a biological sample that is a mixture of nucleic acids from multiple sources, i.e., there is more than one contributor to a biological sample, e.g., two or more individuals. In one embodiment, the biological sample is a dried blood spot.
[00035] In the present invention, the subject is typically a human but also can be any species with methylation marks on its genome, including, but not limited to, a dog, cat, rabbit, cow, bird, rat, horse, pig, or monkey.
[00036] COMPUTER SYSTEMS
[00037] The present invention is described partly in terms of functional components and various processing steps. Such functional components and processing steps may be realized by any number of components, operations and techniques configured to perform the specified functions and achieve the various results. For example, the present invention may employ various biological samples, biomarkers, elements, materials, computers, data sources, storage systems and media, information gathering techniques and processes, data processing criteria, statistical analyses, regression analyses and the like, which may carry out a variety of functions. In addition, although the invention is described in relation to genetic analysis, the present invention may be practiced in conjunction with any number of applications, environments and data analyses; the systems described herein are merely exemplary applications for the invention.
[00038] Methods for genetic analysis according to various aspects of the present invention may be implemented in any suitable manner, for example using a computer program operating on the computer system. An exemplary genetic analysis system, according to various aspects of the present invention, may be implemented in conjunction with a computer system, for example a conventional computer system comprising a processor and a random access memory, such as a remotely-accessible application server, network server, personal computer or workstation. The computer system also suitably includes additional memory devices or information storage systems, such as a mass storage system and a user interface, for example a conventional monitor, keyboard and tracking device. The computer system may, however, comprise any suitable computer system and associated equipment and may be configured in any suitable manner. In one embodiment, the computer system comprises a stand-alone system. In another embodiment, the computer system is part of a network of computers including a server and a database.
[00039] The software required for receiving, processing, and analyzing genetic information may be implemented in a single device or implemented in a plurality of devices. The software may be accessible via a network such that storage and processing of information takes place remotely with respect to users. The genetic analysis system according to various aspects of the present invention and its various elements provide functions and operations to facilitate genetic analysis, such as data gathering, processing and/or analysis. The present genetic analysis system maintains information relating to samples and facilitates analysis. For example, in the present embodiment, the computer system executes the computer program, which may receive, store, search, analyze, and report information relating to the genome. The computer program may comprise multiple modules performing various functions or operations, such as a processing module for processing raw data and generating supplemental data and an analysis module for analyzing raw data and supplemental data to perform genetic analysis.
[00040] The procedures performed by the genetic analysis system may comprise any suitable processes to facilitate genetic analysis. In one embodiment, the genetic analysis system is configured to determine allele concordance.
[00041] The genetic analysis system may also provide various additional modules and/or individual functions. For example, the genetic analysis system may also include a reporting function, for example to provide information relating to the processing and analysis functions. The genetic analysis system may also provide various administrative and management functions, such as controlling access and performing other administrative functions.
[00042] The following example is provided to further illustrate the advantages and features of the present invention, but it is not intended to limit the scope of the invention. While this example is typical of those that might be used, other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.
EXAMPLE I
NGS WGS SAMPLE IDENTITY ASSURANCE
[00043] The following methodology was utilized to determine sample identity.
[00044] Step 1. STR amplification workflow.
1. Set up STR PCR amplification using genomic DNA or a blood spot (1.2 mm) with the ThermoFisher GlobalFiler™ PCR amplification kit.
2. Run the cycling on an AB Veriti™ PCR machine.
3. Set up for electrophoresis on AB Genetic Analyzer™.
4. Generate STR allele profiling using GeneMapper™ software.
5. Report STR allele profiling.
[00045] Result from STR amplification workflow: 24 pairs of numbers, also known as an allelotype, that now comprise a digital fingerprint of the individual's DNA.
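One way to picture an allelotype as a "digital fingerprint" is a mapping from locus name to an unordered pair of repeat numbers, serialized in a canonical order. The sketch below is an assumption for illustration: the function name `fingerprint`, the string format, and the example allele values are hypothetical and do not come from GeneMapper™ or any other named software.

```python
from typing import Dict, Tuple

# Repeat numbers are floats because some STR alleles are microvariants (e.g. 9.3).
Allelotype = Dict[str, Tuple[float, float]]


def fingerprint(allelotype: Allelotype) -> str:
    """Serialize an allelotype with loci sorted by name and alleles sorted within each pair."""
    parts = []
    for locus in sorted(allelotype):
        low, high = sorted(allelotype[locus])
        parts.append(f"{locus}:{low:g},{high:g}")
    return ";".join(parts)


# Illustrative values only; a full profile would hold 24 loci.
profile = {"TH01": (9.3, 6), "D3S1358": (15, 17)}
print(fingerprint(profile))  # D3S1358:15,17;TH01:6,9.3
```

Sorting both the loci and the alleles means two profiles that differ only in reporting order produce the same string.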
[00046] Additionally, warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
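The warning flags described above could be derived from the alleles observed at each locus. The following is a rough sketch under stated assumptions: it treats the absence of alleles at every locus as "no DNA" and more than two distinct alleles at any locus as a possible mixture. Both heuristics, the flag names, and the function `sample_flags` are hypothetical; detection of non-human DNA is not modeled.

```python
from typing import Dict, List


def sample_flags(alleles_per_locus: Dict[str, List[float]]) -> List[str]:
    """Return warning flags based on the alleles observed at each locus."""
    flags = []
    # Assumption: no alleles detected anywhere suggests the sample contained no usable DNA.
    if all(len(alleles) == 0 for alleles in alleles_per_locus.values()):
        flags.append("no_dna")
    # Assumption: more than two distinct alleles at a locus suggests more than one contributor.
    if any(len(set(alleles)) > 2 for alleles in alleles_per_locus.values()):
        flags.append("possible_mixture")
    return flags


print(sample_flags({"D3S1358": [15, 17], "TH01": [6, 7, 9.3]}))  # ['possible_mixture']
```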
[00047] Step 2. WGS or WES workflow (may be performed in parallel with Step 1).
1. Using the same sample (or another sample from the same individual), make a PCR-free Illumina WGS™ library (the WGS™ library naturally includes the biometric marker DNA fragments). Alternatively, or in addition, using the same sample (or another sample from the same individual) and a solution capture targeting approach, make the WES™ library using a commercially available KAPAHyper™ barcoded paired-end library coupled with the IDT xGEN™ WES probes, the mitochondrial panel and/or the custom biometric marker probes shown in Table I. The custom biometric marker probes of Table I capture sample DNA in the vicinity of the biomarkers for the WES™ library.
2. Load the sample on a HiSeq™ 2500 or 4000, NovaSeq™ 6000 or other high-throughput sequencer.
3. Perform post-sequencing analysis including: demultiplexing, mapping, and diagnostic variant call.
4. Computationally determine the STR allele profile (using specialized software, such as lobSTR™).
5. Report the STR allele profile called from WGS data.
[00048] Result from WGS workflow: at least 21 pairs of numbers (and additional N/D ("not determined") calls), an allelotype that comprises an independent digital fingerprint of the individual's DNA.
[00049] Additionally, warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
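Because the sequencing-derived allelotype described in [00048] may contain N/D calls, a concordance calculation has to decide how to treat them. The sketch below assumes that N/D loci are simply excluded from the comparison and that concordance is reported as the percentage of compared loci whose allele pairs agree. That scoring choice, the function name `percent_concordance`, and the locus labels are assumptions for illustration, not the disclosed procedure.

```python
from typing import Dict, Optional, Tuple

Call = Optional[Tuple[float, float]]  # None stands for an N/D ("not determined") call


def percent_concordance(str_calls: Dict[str, Call], ngs_calls: Dict[str, Call]) -> Optional[float]:
    """Percentage of loci, determined in both profiles, whose allele pairs agree."""
    compared = [
        locus for locus in str_calls
        if str_calls[locus] is not None and ngs_calls.get(locus) is not None
    ]
    if not compared:
        return None  # nothing to compare
    matches = sum(sorted(str_calls[l]) == sorted(ngs_calls[l]) for l in compared)
    return 100.0 * matches / len(compared)


# Hypothetical loci and alleles; locus_B is N/D in the sequencing profile.
str_profile = {"locus_A": (10, 11), "locus_B": (8, 8), "locus_C": (12, 14)}
ngs_profile = {"locus_A": (11, 10), "locus_B": None, "locus_C": (12, 13)}
print(percent_concordance(str_profile, ngs_profile))  # 50.0
```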
[00050] Step 3. Generate concordance using STR allele profiling called by GlobalFiler™ and called by WGS or WES or Panel sequencing.
[00051] Using a statistically derived inference from past samples (as well as extensive scientific validation of both aforementioned workflows), it is then possible to ascertain the level of probability that the two allelotypes come from the same individual and assure that no accidental swaps, mixes or other interference occurred while processing the individual's DNA. This assures that the clinical genetic diagnosis produced in Step 2 is for the intended individual.
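The text does not spell out the statistical model used. As a hedged illustration of how such a probability is commonly estimated, the sketch below applies the textbook product rule: under Hardy-Weinberg equilibrium and independence between loci (both assumptions swapped in here), the chance that an unrelated person shares a genotype is the product of per-locus genotype frequencies. The allele frequencies and function names are hypothetical.

```python
from math import prod
from typing import List, Optional, Tuple


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


def random_match_probability(loci: List[Tuple[float, Optional[float]]]) -> float:
    """Product of genotype frequencies across loci, assuming the loci are independent."""
    return prod(genotype_frequency(p, q) for p, q in loci)


# Hypothetical allele frequencies: one heterozygous locus and one homozygous locus.
rmp = random_match_probability([(0.1, 0.2), (0.3, None)])
print(rmp)  # approximately 0.0036 (0.04 * 0.09)
```

With many loci the product shrinks quickly, which is why a partial match across a large STR panel can still be highly discriminating.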
[00052] The allele profiling concordance table shown below was generated using the method described herein.
Table II
Sample      STR vs WGS    Sample 1 vs others
Sample01    97            50
Sample02    100           50
Sample03    97            38.2
Sample04    97            29.4
Sample05    97            35.3
Sample06    95            32.4
Sample07    97            29.4
Sample08    97            41.2
Sample09    97            38.2
Sample10    100           38.2
Sample11    91            32.4
Sample12    97            29.4
Sample13    94            29.4
Sample14    97            41.2
Sample15    94            38.2
Sample16    97            35.3
Sample17    93.1          29.4
Sample18    86.2          35.3
Sample19    85.7          38.2
Sample20    100           32.4
Sample21    100           38.2
Sample22    100           44.1
Sample23    100           26.5
Sample24    100           29.4
Sample25    93.8          44.1
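The separation between the two columns of Table II can be summarized with a few lines of code. The values below are transcribed from the table; the variable names are hypothetical, and the summary is an illustration rather than part of the described method.

```python
# Values transcribed from Table II, in sample order (Sample01 to Sample25).
str_vs_wgs = [97, 100, 97, 97, 97, 95, 97, 97, 97, 100, 91, 97, 94, 97, 94,
              97, 93.1, 86.2, 85.7, 100, 100, 100, 100, 100, 93.8]
sample1_vs_others = [50, 50, 38.2, 29.4, 35.3, 32.4, 29.4, 41.2, 38.2, 38.2,
                     32.4, 29.4, 29.4, 41.2, 38.2, 35.3, 29.4, 35.3, 38.2, 32.4,
                     38.2, 44.1, 26.5, 29.4, 44.1]

lowest_same_sample = min(str_vs_wgs)            # weakest "STR vs WGS" value
highest_cross_sample = max(sample1_vs_others)   # strongest "Sample 1 vs others" value

print(lowest_same_sample)    # 85.7
print(highest_cross_sample)  # 50
```

In this data set every "STR vs WGS" value exceeds every "Sample 1 vs others" value, so the two columns do not overlap.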
[00053] Core biometric DNA capture reagent sequences may be synthesized as shown in Table I and added to the gene targeting sequence panel (including WES) DNA capture reagents. In some embodiments, longer DNA reagent sequences can be designed using the reference human genome sequence that surrounds the core DNA sequences shown above. In one embodiment, using the IDT xGEN™ Exome Research Panel v1.0 with the IDT xGEN™ Lockdown Custom Probes, oligonucleotides of length 120 are used whose sequences include the core biometric DNA capture reagent sequences shown above in Table I.
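Extending a core sequence to a longer probe using the surrounding reference sequence can be sketched as taking a fixed-length window around the core. This is a toy illustration under stated assumptions: the function name `extend_core`, the roughly centered window, and the synthetic reference string are all hypothetical, and real probe design would work from genome coordinates and vendor design rules.

```python
def extend_core(reference: str, core_start: int, core_end: int, target_len: int = 120) -> str:
    """Return a target_len window of the reference that contains reference[core_start:core_end]."""
    core_len = core_end - core_start
    if core_len <= 0 or core_end > len(reference) or core_start < 0:
        raise ValueError("core coordinates must lie within the reference")
    if core_len > target_len or len(reference) < target_len:
        raise ValueError("cannot build a window of the requested length")
    extra = target_len - core_len
    # Place roughly half of the flanking sequence upstream, then clamp to the reference ends.
    start = max(0, core_start - extra // 2)
    end = start + target_len
    if end > len(reference):
        end = len(reference)
        start = end - target_len
    return reference[start:end]


# Synthetic reference: a 50 nt "core" of C flanked by other bases.
reference = "A" * 100 + "C" * 50 + "G" * 100
probe = extend_core(reference, 100, 150)
print(len(probe))          # 120
print("C" * 50 in probe)   # True
```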
[00054] Although the invention has been described with reference to the above examples, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the following claims.

Claims

What is claimed is:
1. A method comprising:
a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;
b) determining a second allelotype for the sample via genetic sequencing; and
c) determining allele concordance between the first allelotype and the second allelotype.
2. The method of claim 1, wherein (c) comprises generating an allele profiling concordance table.
3. The method of claim 1, further comprising calculating a statistical probability to determine whether the first allelotype and the second allelotype are of a single subject.
4. The method of claim 3, wherein the subject is human.
5. The method of claim 1, wherein the first allelotype is generated via GeneMapper™.
6. The method of claim 1, wherein the second allelotype is generated via lobSTR™.
7. The method of claim 1, wherein the sample is a biological sample.
8. The method of claim 1, wherein the sample is whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, feces, organ rinse, hair or skin.
9. The method of claim 1, wherein the sample is blood.
10. The method of claim 1, wherein genetic sequencing comprises whole genome sequencing (WGS), rapid whole genome sequencing (rWGS), whole exome sequencing (WES), next-generation sequencing (NGS), targeted gene panel sequencing, or a combination thereof.
11. The method of claim 10, wherein WES or targeted gene panel sequencing comprises a panel having one or more oligonucleotides selected from the group consisting of SEQ ID NOs: 1-41.
12. The method of claim 11, wherein each oligonucleotide is between about 50 to 120 nucleotides in length.
13. The method of claim 11, wherein each oligonucleotide is 50 nucleotides in length or greater.
14. The method of claim 11, wherein each oligonucleotide is 120 nucleotides in length or less.
15. The method of claim 1, wherein (a) and (b) are performed in parallel.
16. A panel comprising one or more oligonucleotides selected from the group consisting of SEQ ID NOs: 1-41.
17. The panel of claim 16, wherein each oligonucleotide is between about 50 to 120 nucleotides in length.
18. The panel of claim 16, wherein each oligonucleotide is 50 nucleotides in length or greater.
19. The panel of claim 16, wherein each oligonucleotide is 120 nucleotides in length or less.
20. A genetic analysis system comprising: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to determine: i) an allelotype from the sequence information; ii) generate an allele profiling concordance table; and iii) calculate a statistical probability to determine whether a first allelotype and a second allelotype are from a single subject.
21. A genetic analysis system comprising: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to perform (a)-(c) of claim 1.
PCT/US2019/039859 2018-06-29 2019-06-28 Method and system for sample identity assurance WO2020006431A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP19826231.3A EP3815091A4 (en) 2018-06-29 2019-06-28 Method and system for sample identity assurance
AU2019291926A AU2019291926A1 (en) 2018-06-29 2019-06-28 Method and system for sample identity assurance
JP2020571427A JP2021530203A (en) 2018-06-29 2019-06-28 Methods and systems for guaranteeing sample identity
IL279770A IL279770A (en) 2018-06-29 2020-12-24 Method and system for sample identity assurance

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862692366P 2018-06-29 2018-06-29
US62/692,366 2018-06-29

Publications (1)

Publication Number Publication Date
WO2020006431A1 true WO2020006431A1 (en) 2020-01-02

Family

ID=68987607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/039859 WO2020006431A1 (en) 2018-06-29 2019-06-28 Method and system for sample identity assurance

Country Status (6)

Country Link
US (1) US20200005894A1 (en)
EP (1) EP3815091A4 (en)
JP (1) JP2021530203A (en)
AU (1) AU2019291926A1 (en)
IL (1) IL279770A (en)
WO (1) WO2020006431A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110230358A1 (en) * 2010-01-19 2011-09-22 Artemis Health, Inc. Identification of polymorphic sequences in mixtures of genomic dna by whole genome sequencing
WO2014015084A2 (en) * 2012-07-17 2014-01-23 Counsyl, Inc. System and methods for detecting genetic variation
US20170016075A1 (en) * 2015-07-14 2017-01-19 Personal Genome Diagnostics, Inc. Neoantigen analysis
WO2017070497A1 (en) * 2015-10-21 2017-04-27 Dana-Farber Cancer Institute, Inc. Methods and compositions for use of driver mutations in cll

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140163900A1 (en) * 2012-06-02 2014-06-12 Whitehead Institute For Biomedical Research Analyzing short tandem repeats from high throughput sequencing data for genetic applications
US9181583B2 (en) * 2012-10-23 2015-11-10 Illumina, Inc. HLA typing using selective amplification and sequencing
KR101533792B1 (en) * 2015-02-24 2015-07-06 대한민국 Method for Autosomal Analysing Human Subject of Analytes based on a Next Generation Sequencing Technology
KR101667526B1 (en) * 2015-12-30 2016-10-19 대한민국 Method for Extended Autosomal STR Analysing Human Subject of Analytes using a Next Generation Sequencing Technology
US20170226594A1 (en) * 2016-02-08 2017-08-10 Wafa Ali Rashed Altayari Short tandem repeat (str) dna fingerprint method and kit
US10822647B2 (en) * 2016-07-12 2020-11-03 Biodynamics S.R.L. Methods for using long ssDNA polynucleotides as primers (superprimers) in PCR assays

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GYMREK ET AL.: "lobSTR: A Short Tandem Repeat Profiler for Personal Genomes", GENOME RESEARCH, vol. 22, 20 April 2012 (2012-04-20), pages 1154 - 1162, XP055560142, DOI: 10.1101/gr.135780.111 *
See also references of EP3815091A4 *

Also Published As

Publication number Publication date
US20200005894A1 (en) 2020-01-02
EP3815091A1 (en) 2021-05-05
IL279770A (en) 2021-03-01
AU2019291926A1 (en) 2021-02-18
EP3815091A4 (en) 2022-03-23
JP2021530203A (en) 2021-11-11

Similar Documents

Publication Publication Date Title
Kumar et al. Next-generation sequencing and emerging technologies
CN110870016B (en) Verification method and system for sequence variant calls
CN116218988A (en) Method for diagnosing tuberculosis
CN110997944A (en) Method and system for detecting large fragment rearrangement in BRCA1/2
US20160319347A1 (en) Systems and methods for detection of genomic variants
Garosi et al. Defining best practice for microarray analyses in nutrigenomic studies
CN110914456A (en) Method for detecting chromosomal abnormalities in a fetus
CN110904220A (en) Composition, kit and method for detecting CYP2D6 gene polymorphism and copy number
CN101575639A (en) DNA sequencing method capable of verifying base information for second time
CN116716397A (en) Method and device for detecting DMD gene variation, probe and kit
JP7532396B2 (en) Methods for partner-independent gene fusion detection
CN112592981A (en) Primer group, kit and method for DNA archive construction
JP2022537442A (en) Systems, computer program products and methods using density of single nucleotide mutations to verify copy number variation in human embryos
WO2020006431A1 (en) Method and system for sample identity assurance
US20220076784A1 (en) Systems and methods for identifying feature linkages in multi-genomic feature data from single-cell partitions
Kekeç et al. New generation genome sequencing methods
RU2818323C2 (en) Method of producing full-length human mitochondrial dna sequence using set of oligonucleotides by multiplex amplification for working with degraded dna samples
US20190373871A1 (en) Method for assaying genetic variants
Vecoli Next-generation sequencing technology in the genetics of cardiovascular disease
WO2024010809A2 (en) Methods and systems for detecting recombination events
CN118186097A (en) SNP composite system for degrading and detecting individual identification of material
WO2024163553A1 (en) Methods for detecting gene level copy number variation in brca1 and brca2
WO2024163233A1 (en) Copy number variant calling and recovery
Ludwick Snp Genotyping of Native DNA Using Oxford Nanopore Minion Sequencing
CN116875703A (en) Molecular marker related to calf growth and development and application thereof

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19826231; Country of ref document: EP; Kind code of ref document: A1

ENP Entry into the national phase
Ref document number: 2020571427; Country of ref document: JP; Kind code of ref document: A

NENP Non-entry into the national phase
Ref country code: DE

ENP Entry into the national phase
Ref document number: 2019826231; Country of ref document: EP; Effective date: 20210129

ENP Entry into the national phase
Ref document number: 2019291926; Country of ref document: AU; Date of ref document: 20190628; Kind code of ref document: A