US20210050071A1 - Methods and systems for prediction of a dna profile mixture ratio - Google Patents

Methods and systems for prediction of a DNA profile mixture ratio

Info

Publication number
US20210050071A1
Authority
US
United States
Prior art keywords
ratio
dna
sample
mixture
contributors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/082,098
Inventor
Michael Marciano
Jonathan D. Adelman
Laura C. Haarer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Syracuse University
Original Assignee
Syracuse University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Syracuse University
Priority to US17/082,098
Assigned to SYRACUSE UNIVERSITY. Assignment of assignors interest (see document for details). Assignors: ADELMAN, JONATHAN D.; MARCIANO, MICHAEL; HAARER, LAURA C.
Publication of US20210050071A1
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis

Definitions

  • the present disclosure is directed generally to methods and systems for identifying nucleic acid in a sample and, more particularly, to methods and systems for determining the ratio of contributors within a DNA mixture.
  • a DNA sample mixture can be defined as a mixture of two or more biological samples, and mastery of their interpretation can greatly impact the course of criminal investigations and/or quality of intelligence.
  • the ability to identify the ratio of the contributors in a DNA sample may substantially improve the ability to identify the individual contributors within a mixed DNA sample.
  • the present disclosure is directed to methods and systems for determining the ratio of contributors within a DNA mixture.
  • the ratio of contributors within a DNA mixture is one of the key metrics used to separate the individual contributors during mixture deconvolution.
  • the ratio is typically calculated using simple mathematical operations and based on known biological phenomena such as genetic dosage. This method, although effective, is limited by the capacity of human computation and fails to utilize much of the information contained within the profile.
  • the systems described herein combine statistical and biological approaches which are made feasible through a processor.
  • the system includes a combinatorial algorithm to enumerate all potential DNA mixture scenarios within a single DNA marker.
  • the system further includes an outlier removal algorithm, and a clustering algorithm to identify the most similar ratios among DNA markers.
  • a method for determining a ratio of the proportion of DNA from each contributor within a mixed DNA sample comprising the steps of: (i) characterizing a parameter of the DNA mixture; (ii) characterizing a plurality of markers within the DNA mixture; (iii) identifying which of the plurality of markers exhibits a maximum number of alleles, wherein at least one of the plurality of markers is identified; (iv) enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; (v) determining a mixture ratio for each enumerated scenario, wherein every allele found in a given marker must be represented in the scenario; (vi) identifying all possible clusters for the determined mixture ratios, wherein a cluster is a group of ratios comprising just one ratio from the at least one identified marker; (vii) removing any statistical outliers from each of the identified clusters; (viii) identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from
  • the method further includes the step of characterizing a parameter of the DNA mixture.
  • the method further includes the step of characterizing the plurality of markers within the DNA mixture.
  • the method further includes the step of preparing the sample for analysis.
  • a system configured to characterize a ratio of contributors to a DNA mixture within a sample.
  • the system includes: a sample preparation module configured to generate initial data about the DNA mixture within the sample; a processor comprising a ratio of contributors determination module, the ratio of contributors determination module configured to: (i) receive the generated initial data; (ii) analyze the generated initial data to determine the ratio of contributors to the DNA mixture within the sample; and an output device configured to receive the determined ratio of contributors from the processor, and further configured to output information about the received determined ratio of contributors.
  • the output device comprises a monitor.
  • the sample preparation module comprises amplification of DNA within the sample. According to an embodiment, the sample preparation module comprises amplification of one or more DNA markers within the sample.
  • analyzing the generated initial data to determine the ratio of contributors to the DNA mixture comprises the steps of: (i) identifying which of a plurality of markers within the DNA mixture exhibit a maximum number of alleles; (ii) enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; (iii) determining a mixture ratio for each enumerated scenario; (iv) identifying all possible clusters for the determined mixture ratios; (v) removing any statistical outliers from each of the identified clusters; (vi) identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain threshold; and (vii) comparing all of the candidate clusters to all of the mixture ratios, wherein the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • a second aspect is a system configured to characterize a ratio of contributors to a DNA mixture within a sample.
  • the system includes a processor configured to receive data about the DNA within the sample, and further configured to perform the steps of: identifying, using the received data, which of a plurality of markers within the DNA mixture exhibit a maximum number of alleles, or the maximum minus one, wherein at least one of the plurality of markers is identified; enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; determining a mixture ratio for each enumerated scenario, wherein every allele for the at least one identified marker is represented in the mixture ratio; identifying all possible clusters for the determined mixture ratios, wherein a cluster is a group of ratios comprising just one ratio from the at least one identified marker; removing any statistical outliers from each of the identified clusters; identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below
  • FIG. 1 is a flowchart of a method for DNA mixture analysis, in accordance with an embodiment.
  • FIG. 2 is a schematic representation of a system for DNA mixture analysis, in accordance with an embodiment.
  • FIG. 3 is a schematic representation of a system for DNA mixture analysis, in accordance with an embodiment.
  • the present disclosure is directed to methods and systems for determining the ratio of contributors within a DNA mixture, namely by combining statistical and biological approaches.
  • the method and system enumerate all potential DNA mixture scenarios within a single DNA marker, remove outliers, and cluster the results to identify the most similar ratios among DNA markers.
  • a sample is provided.
  • the sample can previously be known to include a mixture of DNA from two or more individuals, for example.
  • the sample can be obtained from a location or source that is suspected of containing DNA from two or more individuals.
  • the sample can be obtained from a location or source where it is merely possible that it could contain DNA from two or more individuals.
  • the sample can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis.
  • the sample contains a mixture of DNA from two or more species.
  • the sample may be processed, such as by a DNA extraction and/or separation or purification step, prior to analysis.
  • the sample may be analyzed without a processing step.
  • DNA present in the sample can be characterized by, for example, capillary electrophoresis based fragment analysis, sequencing using PCR analysis with species-specific and/or species-agnostic primers, SNP analysis, one or more loci from human Y-DNA, X-DNA, and/or atDNA, or any other of a wide variety of DNA characterization methods.
  • the DNA ratio characterization step results in one or more data files containing DNA sequence and/or loci information that can be utilized for identification of one or more sources of the DNA in the sample, either by species or individually within a species (such as a particular human being, etc.).
  • other characteristics of the DNA may be analyzed, such as methylation patterns or other epigenetic modifications, among other characteristics.
  • the system determines which DNA markers or loci exhibit the maximum number of alleles, or the maximum number of alleles minus 1, in the DNA mixture.
  • the system enumerates all possible scenarios based on the determined maximum number of alleles, where a scenario is a combination of possible allele pairs/contributors.
  • the system determines a mixture ratio for each valid scenario, where every allele in the marker is represented.
  • a scenario is considered valid if every allele appearing in a given marker appears at least once in said scenario.
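  • The enumeration and validity check described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it assumes each contributor has one genotype (an unordered pair of alleles, possibly homozygous) at the marker, and that contributor order matters (for example, major versus minor). The function name `enumerate_valid_scenarios` and the allele labels are hypothetical.

```python
from itertools import combinations_with_replacement, product


def enumerate_valid_scenarios(alleles, n_contributors=2):
    """Return every scenario (one genotype per contributor) that covers all alleles.

    Assumption: a genotype is an unordered pair of observed alleles, and
    homozygous genotypes such as ("12", "12") are allowed.
    """
    observed = set(alleles)
    genotypes = list(combinations_with_replacement(sorted(observed), 2))
    valid = []
    for scenario in product(genotypes, repeat=n_contributors):
        covered = {allele for genotype in scenario for allele in genotype}
        # Valid only if every allele seen at the marker appears at least once.
        if covered == observed:
            valid.append(scenario)
    return valid


# Hypothetical four-allele marker in a two-person mixture.
four_allele = enumerate_valid_scenarios(["12", "14", "15", "17"])
for scenario in four_allele:
    print(scenario)
```

  • With four alleles and two contributors, only scenarios in which both contributors are heterozygous and share no allele pass the check; a three-allele marker admits more scenarios because one allele may be shared or one contributor may be homozygous.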
  • the system identifies all possible clusters, where clusters are a group of ratios containing one, and only one, ratio from each of the identified markers (i.e., the DNA markers or loci exhibiting the maximum number of alleles, or the maximum number of alleles minus 1, in the DNA mixture).
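  • Because a cluster takes exactly one ratio from each identified marker, the set of all possible clusters can be read as a Cartesian product over the markers. The sketch below assumes that reading; the function name `enumerate_clusters`, the marker labels, and the ratio values (expressed here as the first contributor's proportion) are hypothetical example data.

```python
from itertools import product


def enumerate_clusters(ratios_by_marker):
    """Return every cluster: one ratio chosen from each marker."""
    markers = sorted(ratios_by_marker)
    candidate_lists = [ratios_by_marker[m] for m in markers]
    # Each combination picks exactly one ratio per marker.
    return [dict(zip(markers, combo)) for combo in product(*candidate_lists)]


# Hypothetical per-marker ratios, one per valid scenario at that marker.
ratios_by_marker = {
    "markerA": [0.70, 0.30],
    "markerB": [0.72, 0.28, 0.50],
}
clusters = enumerate_clusters(ratios_by_marker)
print(len(clusters))
```

  • The number of clusters is the product of the per-marker scenario counts, which grows quickly and is one reason the approach relies on a processor rather than manual calculation.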
  • statistical outliers are removed from each cluster.
  • the statistical outliers are removed from each cluster using Chebyshev's Inequality, although many other methods are possible.
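  • Chebyshev's inequality states that, for any distribution with finite mean and variance, at most 1/k² of values lie k or more standard deviations from the mean. One way to turn this into an outlier filter is sketched below. The choice of a tolerated tail probability `p`, the use of the population standard deviation, and the function name `remove_outliers_chebyshev` are assumptions for illustration; the source does not specify these details.

```python
import math
from statistics import mean, pstdev


def remove_outliers_chebyshev(values, p=0.25):
    """Drop values more than k standard deviations from the mean, with k = 1/sqrt(p).

    Assumption: p is the tail probability bound from Chebyshev's inequality
    that the caller is willing to treat as "outlier" territory.
    """
    k = 1.0 / math.sqrt(p)
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) <= k * sigma]


# Hypothetical cluster of major-contributor proportions with one stray value.
cluster = [0.70, 0.72, 0.71, 0.69, 0.73, 0.20]
cleaned = remove_outliers_chebyshev(cluster, p=0.25)
print(cleaned)
```

  • Chebyshev's bound makes no assumption about the shape of the distribution, which suits small clusters of ratios that may not be normally distributed.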
  • at step 90 of the method, sufficiently compact clusters are identified, where compactness is the variance of the distances of each component (mixture ratio) to the cluster's centroid.
  • the centroid represents a candidate profile mixture ratio.
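  • The compactness measure described above can be sketched as follows, assuming each mixture ratio is represented as a tuple of contributor proportions and that the centroid is the component-wise mean. The function names and the threshold value are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from statistics import pvariance


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def compactness(points):
    """Variance of the Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return pvariance([math.dist(p, c) for p in points])


def is_candidate(points, threshold):
    """A cluster is a candidate when its compactness is below the threshold."""
    return compactness(points) < threshold


# Hypothetical clusters of (major, minor) proportions.
tight = [(0.70, 0.30), (0.72, 0.28), (0.71, 0.29)]
loose = [(0.70, 0.30), (0.50, 0.50), (0.90, 0.10)]
print(is_candidate(tight, 0.001), is_candidate(loose, 0.001))
```

  • For a cluster that passes, `centroid(points)` would serve as the candidate profile mixture ratio.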
  • all candidate ratios are subsequently compared to all mixture ratios across all markers.
  • the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio. Similarity is defined as a measure of Euclidean distance below a dynamic, user specified threshold.
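  • The final comparison can be sketched as a vote across markers: a marker supports a candidate if at least one of its mixture ratios lies within the Euclidean distance threshold. The function names, marker labels, ratio values, and threshold below are hypothetical, and tie-breaking between candidates is not addressed by the source (this sketch simply keeps the first maximum).

```python
import math


def count_supporting_markers(candidate, ratios_by_marker, threshold):
    """Number of markers with at least one ratio closer than threshold."""
    return sum(
        1
        for ratios in ratios_by_marker.values()
        if any(math.dist(candidate, r) < threshold for r in ratios)
    )


def select_profile_ratio(candidates, ratios_by_marker, threshold):
    """Return the candidate supported by the most markers."""
    return max(
        candidates,
        key=lambda c: count_supporting_markers(c, ratios_by_marker, threshold),
    )


# Hypothetical (major, minor) ratios for every scenario at three markers.
all_ratios = {
    "markerA": [(0.70, 0.30), (0.50, 0.50)],
    "markerB": [(0.72, 0.28), (0.40, 0.60)],
    "markerC": [(0.69, 0.31), (0.90, 0.10)],
}
candidates = [(0.71, 0.29), (0.50, 0.50)]
best = select_profile_ratio(candidates, all_ratios, threshold=0.05)
print(best)
```

  • Since the threshold is user specified, it is passed in as a parameter rather than fixed in the code.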
  • sample 210 potentially contains DNA from one or more sources.
  • Sample 210 can previously be known to include a mixture of DNA from two or more sources, or can be an uncharacterized sample. Sample 210 can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis.
  • system 200 can comprise a sample preparation module 220 .
  • Sample preparation module 220 can be, for example, a device, step, component, or system that prepares the obtained sample for analysis.
  • sample preparation module 220 may comprise DNA isolation, extraction, separation, and/or purification.
  • sample preparation module 220 may include any modification of the sample to prepare that sample for analysis.
  • system 200 can optionally comprise a sample characterization module 230 .
  • DNA present in the sample can be characterized by, for example, capillary electrophoresis based fragment analysis, sequencing using PCR analysis with species-specific and/or species-agnostic primers, SNP analysis, one or more loci from human Y-DNA, X-DNA, and/or atDNA, or any other of a wide variety of DNA characterization methods.
  • other characteristics of the DNA may be analyzed, such as methylation patterns or other epigenetic modifications, among other characteristics.
  • the DNA ratio characterization step results in one or more data files containing DNA sequence and/or loci information that can be utilized for identification of one or more sources of the DNA in the sample, either by species or individually within a species (such as a particular human being, etc.).
  • system 200 comprises a processor 240 .
  • Processor 240 can comprise, for example, a general purpose processor, an application specific processor, or any other processor suitable for carrying out the processing steps as described or otherwise envisioned herein.
  • processor 240 may be a combination of two or more processors.
  • Processor 240 may be local or remote from one or more of the other components of system 200 .
  • processor 240 might be located within a lab, within a facility comprising multiple labs, or at a central location that services multiple facilities.
  • processor 240 is offered as software as a service.
  • non-transitory storage medium may be implemented as multiple different storage mediums, which may all be local, may be remote (e.g., in the cloud), or some combination of the two.
  • processor 240 comprises or is in communication with a non-transitory storage medium 260 .
  • Database 260 may be any storage medium suitable for storing program code to be executed by processor 240 to carry out any one of the steps described or otherwise envisioned herein.
  • Non-transitory storage medium may be comprised of primary memory, secondary memory, and/or a combination thereof.
  • database 260 may also comprise stored data to facilitate the analysis, characterization, and/or identification of the DNA in the sample 210 .
  • processor 240 comprises a ratio determination algorithm or module 250 .
  • Ratio determination algorithm or module 250 may be configured to comprise, perform, or otherwise execute any of the functionality described or otherwise envisioned herein.
  • ratio determination algorithm or module 250 receives data about the DNA within the sample 210 , among other possible data, and utilizes that data to determine or estimate the ratio of contributors within the DNA of the sample, among other outcomes.
  • system 200 comprises an output device 270 , which may be any device configured to or capable of generating and/or delivering output 280 to a user or another device.
  • output device 270 may be a monitor, printer, or any other output device.
  • the output device 270 may be in wired and/or wireless communication with processor 240 and any other component of system 200 .
  • the output device 270 is a remote device connected to the system via a network.
  • output device 270 may be a smartphone, tablet, or any other portable or remote computing device.
  • Processor 240 is optionally further configured to generate output deliverable to output device 270 , and/or to drive output device 270 to generate and/or provide output 280 .
  • output 280 may comprise information about the ratio of contributors to the DNA found in the sample, and/or any other received and/or derived information about the sample.
  • a system 300 for characterizing the ratio of contributors within a DNA mixture of a sample where the sample potentially contains DNA from one or more sources.
  • the sample can previously be known to include a mixture of DNA from two or more sources, or can be an uncharacterized sample.
  • the sample can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis.
  • system 300 comprises a processor 310 .
  • Processor 310 can comprise, for example, a general purpose processor, an application specific processor, or any other processor suitable for carrying out the processing steps as described or otherwise envisioned herein.
  • processor 310 may be a combination of two or more processors.
  • Processor 310 may be local or remote from one or more of the other components of system 300 .
  • processor 310 might be located within a lab, within a facility comprising multiple labs, or at a central location that services multiple facilities.
  • processor 310 is offered as software as a service.
  • non-transitory storage medium may be implemented as multiple different storage mediums, which may all be local, may be remote (e.g., in the cloud), or some combination of the two.
  • processor 310 comprises a non-transitory storage medium 320 .
  • Storage medium 320 may be any storage medium suitable for storing program code to be executed by processor 310 to carry out any one of the steps described or otherwise envisioned herein.
  • Non-transitory storage medium may be comprised of primary memory, secondary memory, and/or a combination thereof.
  • Storage medium 320 may also comprise stored data to facilitate the analysis, characterization, and/or identification of the DNA in the sample.
  • processor 310 comprises a combinatorial module 330 .
  • Combinatorial module 330 enumerates all potential DNA mixture scenarios within a single DNA marker.
  • one or more markers within a DNA mixture are characterized. The system determines which of the plurality of markers exhibits a maximum number of alleles, and then enumerates, based on that identified marker, all possible scenarios for contributors to the DNA mixture. The system then determines a mixture ratio for each enumerated scenario, where every allele found in a given marker must be represented in the scenario.
  • processor 310 comprises a clustering module 340 .
  • Clustering module 340 uses the plurality of mixture ratios generated by the combinatorial module 330 to identify all possible clusters for the determined mixture ratios, where a cluster is a group of ratios comprising just one ratio from the at least one identified marker.
  • processor 310 comprises an outlier removal module 350 .
  • Outlier removal module 350 removes any statistical outliers from each of the possible clusters generated by the clustering module 340 .
  • the system then identifies candidate clusters, where a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain user-specified threshold. Lastly, the system compares all of the candidate clusters to all the mixture ratios, and the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • the system can comprise a single unit with one or more modules, or may comprise multiple modules in more than one location that may be connected via a wired and/or wireless network connection. Alternatively, information may be moved by hand from one module to another.
  • the system may be implemented by hardware and/or software, including but not limited to a processor, computer system, database, computer program, and others.
  • the hardware and/or software can be implemented in different systems or can be implemented in a single system.
  • a “module” or “component” as may be used herein, can include, among other things, the identification of specific functionality represented by specific computer software code of a software program.
  • a software program may contain code representing one or more modules, and the code representing a particular module can be represented by consecutive or non-consecutive lines of code.
  • aspects of the present invention may be embodied/implemented as a computer system, method or computer program product.
  • the computer program product can have a computer processor or neural network, for example, that carries out the instructions of a computer program.
  • aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, an entirely firmware embodiment, or an embodiment combining software/firmware and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” “system,” or an “engine.”
  • aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction performance system, apparatus, or device.
  • the program code may perform entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Internet Service Provider for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in the flowcharts/block diagrams may represent a module, segment, or portion of code, which comprises instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be performed substantially concurrently, or the blocks may sometimes be performed in the reverse order, depending upon the functionality involved.

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A system configured to characterize a ratio of contributors to a DNA mixture within a sample, the system including: a sample preparation module configured to generate initial data about the DNA mixture within the sample; a processor comprising a ratio of contributors determination module configured to: (i) receive the generated initial data; (ii) analyze the generated initial data to determine the ratio of contributors to the DNA mixture within the sample; and an output device configured to receive the determined ratio of contributors from the processor, and further configured to output information about the received determined ratio of contributors.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. application Ser. No. 15/367,814, filed on Dec. 2, 2016, which claimed priority to U.S. Provisional Patent Application Ser. No. 62/262,610, filed on Dec. 3, 2015, the entire disclosure of which is incorporated herein by reference.
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • This invention was made with Government support under Grant Number 2014-DN-BX-K029, awarded by the National Institute of Justice. The United States Government has certain rights in the invention.
  • FIELD OF THE INVENTION
  • The present disclosure is directed generally to methods and systems for identifying nucleic acid in a sample and, more particularly, to methods and systems for determining the ratio of contributors within a DNA mixture.
  • BACKGROUND
  • At the core of the genetic identification field, particularly in regard to forensic applications and clinical/medical research, is the challenge of DNA mixture interpretation. A DNA sample mixture can be defined as a mixture of two or more biological samples, and mastery of their interpretation can greatly impact the course of criminal investigations and/or quality of intelligence. The ability to identify the ratio of the contributors in a DNA sample may substantially improve the ability to identify the individual contributors within a mixed DNA sample.
  • Although historically expert systems have been in use for this problem, they often fail to meet the needs of the community, and there is continued demand by forensic communities for reliable methods of automation for mixture interpretation. The present state-of-the-art in DNA mixture interpretation includes expert systems which often have limited use, primarily focusing on improving the timeliness of analysis performed by forensic analysts. These systems capture the computational aspects of mixture analysis without taking more subjective factors into account. Further, these systems are used for simple mixtures, typically of two individuals (and thus low complexity). Although more advanced systems capable of analyzing 3-4 individual mixtures exist, these systems are both time- and cost-prohibitive.
  • For example, current methods to estimate the ratio of contributors in a mixed DNA sample rely on those DNA markers with the maximum number of alleles (or the maximum number of alleles minus one) given the number of contributors. These DNA markers have inherent variability: ratios at several loci within a sample may differ due to the size of the allele, locus base pair size, amount of degradation present, stochastic effects, etc. These ratios are then used as a standard to help identify the individual components or contributors in a DNA profile at those loci where a ratio cannot be determined. The calculation of these ratios is typically performed manually using a standard scientific calculator.
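  • As an illustration of the kind of calculation that is typically done by hand, the sketch below computes contributor proportions at a single marker for the simplest case, where no allele is shared between contributors. The source does not give a formula; using summed peak heights as the measure of each contributor's signal is an assumption here, and the function name, allele labels, and height values are hypothetical.

```python
def contributor_proportions(scenario, peak_heights):
    """Share of total signal attributed to each contributor at one marker.

    Assumptions: peak height is proportional to the amount of DNA, and no
    allele is shared between contributors in the given scenario.
    """
    # set() counts a homozygous genotype's single peak once.
    totals = [sum(peak_heights[a] for a in set(genotype)) for genotype in scenario]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical peak heights for a four-allele marker in a two-person mixture.
heights = {"12": 900.0, "14": 300.0, "15": 1100.0, "17": 250.0}
scenario = (("12", "15"), ("14", "17"))
proportions = contributor_proportions(scenario, heights)
print(proportions)
```

  • Repeating this for every marker and every plausible scenario is what makes the manual approach impractical for complex mixtures.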
  • Accordingly, there is a need in the art for methods and systems that perform complicated DNA mixture interpretation, particularly with regard to more accurately determining the ratio of contributors within a DNA mixture.
  • SUMMARY OF THE INVENTION
  • The present disclosure is directed to methods and systems for determining the ratio of contributors within a DNA mixture. The ratio of contributors within a DNA mixture is one of the key metrics used to separate the individual contributors during mixture deconvolution. The ratio is typically calculated using simple mathematical operations and based on known biological phenomena such as genetic dosage. This method, although effective, is limited by the capacity of human computation and fails to utilize much of the information contained within the profile.
  • Accordingly, the methods and systems described herein combine statistical and biological approaches which are made feasible through a processor. According to an embodiment, the system includes a combinatorial algorithm to enumerate all potential DNA mixture scenarios within a single DNA marker. The system further includes an outlier removal algorithm, and a clustering algorithm to identify the most similar ratios among DNA markers.
  • According to one aspect is a method for determining a ratio of the proportion of DNA from each contributor within a mixed DNA sample, comprising the steps of: (i) characterizing a parameter of the DNA mixture; (ii) characterizing a plurality of markers within the DNA mixture; (iii) identifying which of the plurality of markers exhibits a maximum number of alleles, wherein at least one of the plurality of markers is identified; (iv) enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; (v) determining a mixture ratio for each enumerated scenario, wherein every allele found in a given marker must be represented in the scenario; (vi) identifying all possible clusters for the determined mixture ratios, wherein a cluster is a group of ratios comprising just one ratio from the at least one identified marker; (vii) removing any statistical outliers from each of the identified clusters; (viii) identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain user-specified threshold; and (ix) comparing all of the candidate clusters to all of the mixture ratios, wherein the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • According to an embodiment, the method further includes the step of characterizing a parameter of the DNA mixture.
  • According to an embodiment, the method further includes the step of characterizing the plurality of markers within the DNA mixture.
  • According to an embodiment, the method further includes the step of preparing the sample for analysis.
  • According to a second aspect is a system configured to characterize a ratio of contributors to a DNA mixture within a sample. The system includes: a sample preparation module configured to generate initial data about the DNA mixture within the sample; a processor comprising a ratio of contributors determination module, the ratio of contributors determination module configured to: (i) receive the generated initial data; (ii) analyze the generated initial data to determine the ratio of contributors to the DNA mixture within the sample; and an output device configured to receive the determined ratio of contributors from the processor, and further configured to output information about the received determined ratio of contributors.
  • According to an embodiment, the output device comprises a monitor.
  • According to an embodiment, the sample preparation module comprises amplification of DNA within the sample. According to an embodiment, the sample preparation module comprises amplification of one or more DNA markers within the sample.
  • According to an embodiment, analyzing the generated initial data to determine the ratio of contributors to the DNA mixture comprises the steps of: (i) identifying which of a plurality of markers within the DNA mixture exhibit a maximum number of alleles; (ii) enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; (iii) determining a mixture ratio for each enumerated scenario; (iv) identifying all possible clusters for the determined mixture ratios; (v) removing any statistical outliers from each of the identified clusters; (vi) identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain threshold; and (vii) comparing all of the candidate clusters to all of the mixture ratios, wherein the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • According to a second aspect is a system configured to characterize a ratio of contributors to a DNA mixture within a sample. The system includes a processor configured to receive data about the DNA within the sample, and further configured to perform the steps of: identifying, using the received data, which of a plurality of markers within the DNA mixture exhibit a maximum number of alleles, or the maximum minus one, wherein at least one of the plurality of markers is identified; enumerating, based on the identification, all possible scenarios for contributors to the DNA mixture; determining a mixture ratio for each enumerated scenario, wherein every allele for the at least one identified marker is represented in the mixture ratio; identifying all possible clusters for the determined mixture ratios, wherein a cluster is a group of ratios comprising just one ratio from the at least one identified marker; removing any statistical outliers from each of the identified clusters; identifying candidate clusters, wherein a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain threshold; and comparing all of the candidate clusters to all of the mixture ratios, wherein the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • These and other aspects of the invention will be apparent from the embodiments described below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention will be more fully understood and appreciated by reading the following Detailed Description in conjunction with the accompanying drawings, in which:
  • FIG. 1 is a flowchart of a method for DNA mixture analysis, in accordance with an embodiment.
  • FIG. 2 is a schematic representation of a system for DNA mixture analysis, in accordance with an embodiment.
  • FIG. 3 is a schematic representation of a system for DNA mixture analysis, in accordance with an embodiment.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • There is a continued need for methods and systems that perform DNA mixture interpretation in both a time-effective and cost-effective manner. Accordingly, the present disclosure is directed to methods and systems for determining the ratio of contributors within a DNA mixture, namely by combining statistical and biological approaches. According to an embodiment, the method and system enumerates all potential DNA mixture scenarios within a single DNA marker, removes outliers, and clusters the results to identify the most similar ratios among DNA markers.
  • Referring to FIG. 1, in one embodiment, is a flowchart of a method 10 for DNA mixture analysis. At step 20, a sample is provided. The sample can previously be known to include a mixture of DNA from two or more individuals, for example. Alternatively, the sample can be obtained from a location or source that is suspected of containing DNA from two or more individuals. As yet another alternative, the sample can be obtained from a location or source where it is merely possible that it could contain DNA from two or more individuals. The sample can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis. According to another embodiment, the sample contains a mixture of DNA from two or more species.
  • At step 30, a parameter of all or part of the DNA in the sample—if DNA is present in the sample—is characterized. For example, the sample may be processed, such as by a DNA extraction and/or separation or purification step, prior to analysis. Alternatively, the sample may be analyzed without a processing step. DNA present in the sample can be characterized by, for example, capillary electrophoresis based fragment analysis, sequencing using PCR analysis with species-specific and/or species-agnostic primers, SNP analysis, one or more loci from human Y-DNA, X-DNA, and/or atDNA, or any other of a wide variety of DNA characterization methods. According to a preferred embodiment, the DNA ratio characterization step results in one or more data files containing DNA sequence and/or loci information that can be utilized for identification of one or more sources of the DNA in the sample, either by species or individually within a species (such as a particular human being, etc.). According to advanced methods, other characteristics of the DNA may be analyzed, such as methylation patterns or other epigenetic modifications, among other characteristics.
  • At step 40 of the method, the system determines which DNA markers or loci exhibit the maximum number of alleles, or the maximum number of alleles minus 1, in the DNA mixture.
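  • The marker selection of step 40 can be illustrated with a minimal sketch. The representation of a profile as a dictionary of marker names mapping to allele peak heights, the function name `select_markers`, and the example peak heights are assumptions made for illustration only and do not come from the disclosure.

```python
def select_markers(profile, include_max_minus_one=True):
    """Return markers showing the maximum allele count (optionally max - 1).

    profile: dict mapping marker name -> dict of allele label -> peak height
             (hypothetical input format, assumed for this sketch).
    """
    counts = {marker: len(alleles) for marker, alleles in profile.items()}
    max_count = max(counts.values())
    floor = max_count - 1 if include_max_minus_one else max_count
    return [marker for marker, count in counts.items() if count >= floor]


# Hypothetical two-person mixture; peak heights are invented.
profile = {
    "D8S1179": {"12": 900, "13": 300, "14": 880, "15": 310},
    "TH01": {"6": 1200, "7": 410, "9.3": 400},
    "TPOX": {"8": 2000, "11": 600},
}
print(select_markers(profile))         # ['D8S1179', 'TH01']
print(select_markers(profile, False))  # ['D8S1179']
```

  • Markers with fewer alleles, such as the two-allele marker in this example, are set aside at this stage and only re-enter the analysis when candidate ratios are compared against all markers at step 100.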
  • At step 50 of the method, the system enumerates all possible scenarios based on the determined maximum number of alleles, where a scenario is a combination of possible allele pairs/contributors.
  • At step 60 of the method, the system determines a mixture ratio for each valid scenario, where every allele in the marker is represented. A scenario is considered valid if every allele appearing in a given marker appears at least once in said scenario.
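  • Steps 50 and 60 can be sketched as follows. The disclosure does not give the formula used to turn a scenario into a mixture ratio, so the apportionment rule below (each peak height is split equally among the allele copies that claim it) is a deliberately naive assumption, as are the function names and the example peak heights. Ratios are reported as proportions sorted from major to minor contributor, which is also an assumption.

```python
from collections import Counter
from itertools import combinations_with_replacement


def enumerate_scenarios(alleles, n_contributors):
    """Yield every unordered assignment of one genotype (allele pair) per contributor."""
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    yield from combinations_with_replacement(genotypes, n_contributors)


def is_valid(scenario, alleles):
    """A scenario is valid if every observed allele appears in it at least once."""
    return {a for genotype in scenario for a in genotype} == set(alleles)


def mixture_ratio(scenario, heights):
    """ASSUMPTION: split each peak equally among the allele copies that claim it."""
    copies = Counter(a for genotype in scenario for a in genotype)
    shares = [sum(heights[a] / copies[a] for a in genotype) for genotype in scenario]
    total = sum(shares)
    return tuple(sorted((s / total for s in shares), reverse=True))


# Hypothetical four-allele marker from a two-person mixture.
heights = {"12": 900, "13": 300, "14": 880, "15": 310}
valid = [s for s in enumerate_scenarios(heights, 2) if is_valid(s, heights)]
ratios = [mixture_ratio(s, heights) for s in valid]
for scenario, ratio in zip(valid, ratios):
    print(scenario, ratio)
```

  • For this four-allele, two-contributor example only three scenarios are valid, because both contributors must be heterozygous and share no allele. Markers with fewer alleles than twice the number of contributors admit many more valid scenarios, which is one plausible reason the method starts from the markers with the most alleles.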
  • At step 70 of the method, the system identifies all possible clusters, where a cluster is a group of ratios containing one, and only one, ratio from each of the identified markers (i.e., the DNA markers or loci exhibiting the maximum number of alleles, or the maximum number of alleles minus 1, in the DNA mixture).
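  • Because a cluster takes exactly one ratio from each identified marker, the set of all possible clusters is the Cartesian product of the per-marker ratio lists. The sketch below assumes the ratios are held in a dictionary keyed by marker name; the function name and the example values are hypothetical.

```python
from itertools import product


def enumerate_clusters(ratios_by_marker):
    """Yield every cluster: one ratio chosen from each identified marker."""
    markers = sorted(ratios_by_marker)
    for combo in product(*(ratios_by_marker[m] for m in markers)):
        yield dict(zip(markers, combo))


# Invented ratios for two identified markers.
ratios_by_marker = {
    "D8S1179": [(0.75, 0.25), (0.51, 0.49), (0.50, 0.50)],
    "TH01": [(0.74, 0.26), (0.60, 0.40)],
}
clusters = list(enumerate_clusters(ratios_by_marker))
print(len(clusters))  # 6
```

  • The number of clusters is the product of the number of ratios at each identified marker, so it grows quickly as markers are added.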
  • At step 80 of the method, statistical outliers are removed from each cluster. According to an embodiment, the statistical outliers are removed from each cluster using Chebyshev's Inequality, although many other methods are possible.
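  • Chebyshev's inequality states that, for any distribution with finite variance, at most 1/k² of the probability mass lies k or more standard deviations from the mean. One common way to turn this into an outlier filter, sketched below, is to pick a tolerated outlier probability p, set k = 1/√p, and discard values further than k standard deviations from the mean. How the disclosure applies the inequality is not specified, so the scalar input (for example, the major contributor's proportion from each ratio in a cluster), the parameter `p_outlier`, and the function name are assumptions.

```python
import math
from statistics import fmean, pstdev


def chebyshev_filter(values, p_outlier=0.25):
    """Keep values within k standard deviations of the mean, where k = 1/sqrt(p_outlier)."""
    k = 1.0 / math.sqrt(p_outlier)
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return list(values)  # all values identical: nothing to remove
    return [v for v in values if abs(v - mu) <= k * sigma]


# Nine consistent major-contributor proportions and one stray value (invented).
values = [0.75] * 9 + [0.50]
print(chebyshev_filter(values))
```

  • In a sample of n values, no value can lie more than √(n − 1) population standard deviations from the mean, so with few values and a small `p_outlier` this filter removes nothing.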
  • At step 90 of the method, sufficiently compact clusters are identified, where compactness is the variance of the distances of each component (mixture ratio) to the cluster's centroid. The centroid represents a candidate profile mixture ratio.
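  • The compactness measure of step 90 can be sketched directly from its definition: compute the centroid of the cluster, measure the Euclidean distance from each ratio to that centroid, and take the variance of those distances. The use of population variance, the function names, and the threshold value are assumptions made for this sketch.

```python
import math
from statistics import pvariance


def centroid(ratios):
    """Component-wise mean of a list of equal-length ratio tuples."""
    n = len(ratios)
    return tuple(sum(r[i] for r in ratios) / n for i in range(len(ratios[0])))


def compactness(ratios):
    """Variance of the distances from each ratio to the cluster centroid."""
    c = centroid(ratios)
    return pvariance([math.dist(r, c) for r in ratios])


def candidate_ratios(clusters, threshold):
    """Centroids of clusters whose compactness falls below the threshold."""
    return [centroid(c) for c in clusters if compactness(c) < threshold]


# Invented clusters: one tight, one spread out.
tight = [(0.75, 0.25), (0.74, 0.26), (0.76, 0.24)]
loose = [(0.75, 0.25), (0.50, 0.50), (0.90, 0.10)]
print(candidate_ratios([tight, loose], threshold=0.001))
```

  • Under this literal reading, a cluster of exactly two ratios always has zero variance, because both ratios are equally far from their midpoint. An implementation may therefore need an additional check on the distances themselves.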
  • At step 100 of the method, all candidate ratios are subsequently compared to all mixture ratios across all markers. The candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio. Similarity is defined as a measure of Euclidean distance below a dynamic, user-specified threshold.
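  • The final selection of step 100 amounts to counting, for each candidate ratio, the markers that contain at least one ratio within the similarity threshold, and then choosing the candidate with the highest count. The sketch below assumes the same dictionary-of-ratios layout as above; the function names `support` and `best_candidate`, the threshold value, and the data are hypothetical.

```python
import math


def support(candidate, ratios_by_marker, threshold):
    """Count markers holding at least one ratio within `threshold` of the candidate."""
    return sum(
        any(math.dist(candidate, r) < threshold for r in ratios)
        for ratios in ratios_by_marker.values()
    )


def best_candidate(candidates, ratios_by_marker, threshold):
    """Return the candidate supported by the most markers (first one wins a tie)."""
    return max(candidates, key=lambda c: support(c, ratios_by_marker, threshold))


# Invented ratios across all markers, not only the identified ones.
all_ratios = {
    "D8S1179": [(0.75, 0.25), (0.50, 0.50)],
    "TH01": [(0.74, 0.26), (0.51, 0.49)],
    "TPOX": [(0.76, 0.24), (0.90, 0.10)],
}
candidates = [(0.75, 0.25), (0.50, 0.50)]
print(best_candidate(candidates, all_ratios, threshold=0.05))  # (0.75, 0.25)
```

  • Here the first candidate is supported by all three markers and the second by only two. The disclosure does not state how ties are resolved; returning the first candidate is simply the behavior of Python's `max`.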
  • Referring to FIG. 2, in one embodiment, is a system 200 for characterizing the ratio of contributors within a DNA mixture of a sample 210, where sample 210 potentially contains DNA from one or more sources. Sample 210 can previously be known to include a mixture of DNA from two or more sources, or can be an uncharacterized sample. Sample 210 can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis.
  • According to an embodiment, system 200 can comprise a sample preparation module 220. Sample preparation module 220 can be, for example, a device, step, component, or system that prepares the obtained sample for analysis. For example, sample preparation module 220 may comprise DNA isolation, extraction, separation, and/or purification. According to an embodiment, sample preparation module 220 may include any modification of the sample to prepare that sample for analysis.
  • According to an embodiment, system 200 can optionally comprise a sample characterization module 230. For example, DNA present in the sample can be characterized by, for example, capillary electrophoresis based fragment analysis, sequencing using PCR analysis with species-specific and/or species-agnostic primers, SNP analysis, one or more loci from human Y-DNA, X-DNA, and/or atDNA, or any other of a wide variety of DNA characterization methods. According to advanced methods, other characteristics of the DNA may be analyzed, such as methylation patterns or other epigenetic modifications, among other characteristics. According to an embodiment, the DNA ratio characterization step results in one or more data files containing DNA sequence and/or loci information that can be utilized for identification of one or more sources of the DNA in the sample, either by species or individually within a species (such as a particular human being, etc.).
  • According to an embodiment, system 200 comprises a processor 240. Processor 240 can comprise, for example, a general purpose processor, an application specific processor, or any other processor suitable for carrying out the processing steps as described or otherwise envisioned herein. According to an embodiment, processor 240 may be a combination of two or more processors. Processor 240 may be local or remote from one or more of the other components of system 200. For example, processor 240 might be located within a lab, within a facility comprising multiple labs, or at a central location that services multiple facilities. According to another embodiment, processor 240 is offered via software as a service. One of ordinary skill will appreciate that the non-transitory storage medium may be implemented as multiple different storage media, which may all be local, may be remote (e.g., in the cloud), or some combination of the two.
  • According to an embodiment, processor 240 comprises or is in communication with a non-transitory storage medium 260. Storage medium 260 may be any storage medium suitable for storing program code for execution by processor 240 to carry out any one of the steps described or otherwise envisioned herein. The non-transitory storage medium may be comprised of primary memory, secondary memory, and/or a combination thereof. As described in greater detail herein, storage medium 260 may also comprise stored data to facilitate the analysis, characterization, and/or identification of the DNA in the sample 210.
  • According to an embodiment, processor 240 comprises a ratio determination algorithm or module 250. Ratio determination algorithm or module 250 may be configured to comprise, perform, or otherwise execute any of the functionality described or otherwise envisioned herein. According to an embodiment, ratio determination algorithm or module 250 receives data about the DNA within the sample 210, among other possible data, and utilizes that data to determine or estimate the ratio of contributors within the DNA of the sample, among other outcomes.
  • According to an embodiment, system 200 comprises an output device 270, which may be any device configured to or capable of generating and/or delivering output 280 to a user or another device. For example, output device 270 may be a monitor, printer, or any other output device. The output device 270 may be in wired and/or wireless communication with processor 240 and any other component of system 200. According to yet another embodiment, the output device 270 is a remote device connected to the system via a network. For example, output device 270 may be a smartphone, tablet, or any other portable or remote computing device. Processor 240 is optionally further configured to generate output deliverable to output device 270, and/or to drive output device 270 to generate and/or provide output 280. As described herein, output 280 may comprise information about the ratio of contributors to the DNA found in the sample, and/or any other received and/or derived information about the sample.
  • Referring to FIG. 3, in one embodiment, is a system 300 for characterizing the ratio of contributors within a DNA mixture of a sample, where the sample potentially contains DNA from one or more sources. The sample can previously be known to include a mixture of DNA from two or more sources, or can be an uncharacterized sample. The sample can be obtained directly in the field and then analyzed, or can be obtained at a distant location and/or time prior to analysis. Any sample that could possibly contain DNA therefore could be utilized in the analysis.
  • According to an embodiment, system 300 comprises a processor 310. Processor 310 can comprise, for example, a general purpose processor, an application specific processor, or any other processor suitable for carrying out the processing steps as described or otherwise envisioned herein. According to an embodiment, processor 310 may be a combination of two or more processors. Processor 310 may be local or remote from one or more of the other components of system 300. For example, processor 310 might be located within a lab, within a facility comprising multiple labs, or at a central location that services multiple facilities. According to another embodiment, processor 310 is offered via software as a service. One of ordinary skill will appreciate that the non-transitory storage medium may be implemented as multiple different storage media, which may all be local, may be remote (e.g., in the cloud), or some combination of the two.
  • According to an embodiment, processor 310 comprises a non-transitory storage medium 320. Storage medium 320 may be any storage medium suitable for storing program code for execution by processor 310 to carry out any one of the steps described or otherwise envisioned herein. The non-transitory storage medium may be comprised of primary memory, secondary memory, and/or a combination thereof. As described in greater detail herein, storage medium 320 may also comprise stored data to facilitate the analysis, characterization, and/or identification of the DNA in the sample.
  • According to an embodiment, processor 310 comprises a combinatorial module 330. Combinatorial module 330 enumerates all potential DNA mixture scenarios within a single DNA marker. According to an embodiment, one or more markers within a DNA mixture are characterized. The system determines which of the plurality of markers exhibits a maximum number of alleles, and then enumerates, based on that identified marker, all possible scenarios for contributors to the DNA mixture. The system then determines a mixture ratio for each enumerated scenario, where every allele found in a given marker must be represented in the scenario.
  • According to an embodiment, processor 310 comprises a clustering module 340. Clustering module 340 uses the plurality of mixture ratios generated by the combinatorial module 330 to identify all possible clusters for the determined mixture ratios, where a cluster is a group of ratios comprising just one ratio from the at least one identified marker.
  • According to an embodiment, processor 310 comprises an outlier removal module 350. Outlier removal module 350 removes any statistical outliers from each of the possible clusters generated by the clustering module 340.
  • The system then identifies candidate clusters, where a cluster is identified as a candidate if the variance of the distance from each mixture ratio to the cluster's centroid is below a certain user-specified threshold. Lastly, the system compares all of the candidate clusters to all the mixture ratios, and the candidate ratio with the highest number of markers containing at least one similar ratio at each marker is identified as the DNA profile mixture ratio.
  • According to one embodiment, the system can comprise a single unit with one or more modules, or may comprise multiple modules in more than one location that may be connected via a wired and/or wireless network connection. Alternatively, information may be moved by hand from one module to another. The system may be implemented by hardware and/or software, including but not limited to a processor, computer system, database, computer program, and others. The hardware and/or software can be implemented in different systems or can be implemented in a single system.
  • While various embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, embodiments may be practiced otherwise than as specifically described and claimed. Embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
  • A “module” or “component” as may be used herein, can include, among other things, the identification of specific functionality represented by specific computer software code of a software program. A software program may contain code representing one or more modules, and the code representing a particular module can be represented by consecutive or non-consecutive lines of code.
  • As will be appreciated by one skilled in the art, aspects of the present invention may be embodied/implemented as a computer system, method or computer program product. The computer program product can have a computer processor or neural network, for example, that carries out the instructions of a computer program. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, an entirely firmware embodiment, or an embodiment combining software/firmware and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” “system,” or an “engine.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction performance system, apparatus, or device.
  • The program code may perform entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • The flowcharts/block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts/block diagrams may represent a module, segment, or portion of code, which comprises instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be performed substantially concurrently, or the blocks may sometimes be performed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

What is claimed is:
1. A system configured to determine a ratio of any contributors to a forensic DNA sample, the system comprising:
a sample preparation module configured to generate DNA sequence data representing a plurality of markers from a forensic DNA sample;
a processor programmed to receive the DNA sequence data and to determine a ratio of contributors to the forensic DNA sample using the DNA sequence data, wherein the processor is programmed to determine the ratio of contributors by
identifying each of the plurality of markers within the forensic DNA sample mixture that exhibit a maximum number of alleles,
determining all possible scenarios for contributors to the forensic DNA sample mixture based on the determined maximum number of alleles,
evaluating whether each possible scenario is a valid scenario based on whether every allele appearing in one of the plurality of markers appears at least once in the possible scenario,
determining an initial mixture ratio for each valid scenario,
identifying any clusters formed by a group of initial mixture ratios containing only one ratio for each of the plurality of markers,
selecting any compact clusters having a distance between a centroid of the cluster to the initial mixture that is below a certain threshold, wherein the centroid of selected compact clusters represents a candidate profile mixture ratio,
comparing all of the candidate profile mixture ratios to all of the mixture ratios across all of the plurality of markers to identify the ratio of contributors to the forensic DNA sample based on which of the candidate profile mixture ratios has the highest number of markers containing at least one similar ratio at each of the plurality of markers,
an output device coupled to the processor and configured to receive the ratio of contributors to the forensic DNA sample and to output the ratio of contributors to the forensic DNA sample.
2. The system of claim 1, wherein the identification of each of the plurality of markers within the forensic DNA sample mixture that exhibit the maximum number of alleles includes the maximum number of alleles minus one.
3. The system of claim 2, wherein the processor is programmed to remove any statistical outliers from the clusters formed by a group of initial mixture ratios containing only one ratio for each of the plurality of markers.
4. The system of claim 3, wherein the processor is programmed to remove the statistical outliers using Chebyshev's Inequality.
5. The system of claim 4, wherein the at least one similar ratio is determined by a Euclidean distance that is below a predetermined threshold.
6. The system of claim 5, wherein the processor is configured to allow a user to set the predetermined threshold.
7. The system of claim 1, wherein the sample preparation module is configured to generate the DNA sequence data using capillary electrophoresis based fragment analysis.
8. The system of claim 1, wherein the sample preparation module is configured to generate the DNA sequence data using polymerase chain reaction sequencing.
9. The system of claim 1, wherein the forensic DNA sample includes DNA from more than two individuals.
10. The system of claim 1, wherein the forensic DNA sample includes DNA from more than one species.
US17/082,098 2015-12-03 2020-10-28 Methods and systems for prediction of a dna profile mixture ratio Pending US20210050071A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/082,098 US20210050071A1 (en) 2015-12-03 2020-10-28 Methods and systems for prediction of a dna profile mixture ratio

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562262610P 2015-12-03 2015-12-03
US15/367,814 US10854316B2 (en) 2015-12-03 2016-12-02 Methods and systems for prediction of a DNA profile mixture ratio
US17/082,098 US20210050071A1 (en) 2015-12-03 2020-10-28 Methods and systems for prediction of a dna profile mixture ratio

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/367,814 Continuation US10854316B2 (en) 2015-12-03 2016-12-02 Methods and systems for prediction of a DNA profile mixture ratio

Publications (1)

Publication Number Publication Date
US20210050071A1 true US20210050071A1 (en) 2021-02-18

Family

ID=58798388

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/367,814 Active 2038-01-18 US10854316B2 (en) 2015-12-03 2016-12-02 Methods and systems for prediction of a DNA profile mixture ratio
US17/082,098 Pending US20210050071A1 (en) 2015-12-03 2020-10-28 Methods and systems for prediction of a dna profile mixture ratio

Country Status (1)

Country Link
US (2) US10854316B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10854316B2 (en) * 2015-12-03 2020-12-01 Syracuse University Methods and systems for prediction of a DNA profile mixture ratio

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100086926A1 (en) * 2008-07-23 2010-04-08 David Craig Method of characterizing sequences from genetic material samples
US8898021B2 (en) * 2001-02-02 2014-11-25 Mark W. Perlin Method and system for DNA mixture analysis
US20180355347A1 (en) * 2015-12-03 2018-12-13 Syracuse University Methods and systems for determination of the number of contributors to a dna mixture
US10854316B2 (en) * 2015-12-03 2020-12-01 Syracuse University Methods and systems for prediction of a DNA profile mixture ratio
US10957421B2 (en) * 2014-12-03 2021-03-23 Syracuse University System and method for inter-species DNA mixture interpretation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9180964B2 (en) * 2013-03-15 2015-11-10 Bell Helicopter Textron Inc. Autorotative enhancement system

Also Published As

Publication number Publication date
US10854316B2 (en) 2020-12-01
US20170161431A1 (en) 2017-06-08

Similar Documents

Publication Publication Date Title
CN111341383B (en) Method, device and storage medium for detecting copy number variation
US9354236B2 (en) Method for identifying peptides and proteins from mass spectrometry data
CN108920899B (en) Single exon copy number variation prediction method based on target region sequencing
Awan et al. MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing
EP3658687A1 (en) Methods for detecting biallelic loss of function in next-generation sequencing genomic data
CN114088847B (en) Sample determination method and device based on chromatographic analysis, storage medium and server
KR20200107774A (en) How to align targeting nucleic acid sequencing data
CN108780047B (en) Method for detecting substance component, related device and computer-readable storage medium
US20210050071A1 (en) Methods and systems for prediction of a dna profile mixture ratio
US10957421B2 (en) System and method for inter-species DNA mixture interpretation
EP2926289A1 (en) Method and system for processing data for evaluating a quality level of a dataset
KR101839088B1 (en) Method for predicting absolute copy number variation based on single sample
US20180355347A1 (en) Methods and systems for determination of the number of contributors to a dna mixture
US11309062B2 (en) Hierarchical optimized detection of relatives
KR102397822B1 (en) Apparatus and method for analyzing cells using chromosome structure and state information
KR20190126930A (en) SIGNATURE-HASH FOR MULTI-SEQUENCE FILES
KR101841265B1 (en) Method for eliminating bias of targeted sequencing by using nmf
US11386340B2 (en) Method and apparatus for performing block retrieval on block to be processed of urine sediment image
CN110021342B (en) Method and system for accelerating identification of variant sites
CN110032933B (en) Image data acquisition method and device, terminal and storage medium
US20170206309A1 (en) Metagenome mapping
Hiranuma et al. CloudControl: Leveraging many public ChIP-seq control experiments to better remove background noise
US20200202982A1 (en) Methods and systems for assessing the presence of allelic dropout using machine learning algorithms
JP2008226095A (en) Gene expression variation analysis method, system and program
Rezaeian et al. A new algorithm for finding enriched regions in chip-seq data

Legal Events

Date Code Title Description
AS Assignment
Owner name: SYRACUSE UNIVERSITY, NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MARCIANO, MICHAEL; ADELMAN, JONATHAN D.; HAARER, LAURA C.; SIGNING DATES FROM 20190711 TO 20190730; REEL/FRAME: 054189/0547

STPP Information on status: patent application and granting procedure in general
Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general
Free format text: NON FINAL ACTION MAILED