WO2009083989A1 - Methods for dna authentication - Google Patents

Methods for DNA authentication

Info

Publication number
WO2009083989A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acids
loci
authentic
sample
test sample
Prior art date
Application number
PCT/IL2009/000009
Other languages
French (fr)
Inventor
Dan Frumkin
Original Assignee
Nucleix Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nucleix Ltd.
Publication of WO2009083989A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Definitions

  • the invention relates to methods for verifying the authenticity of a DNA sample.
  • the invention relates to methods for determining whether nucleic acids, specifically DNA in a biological sample were generated in vitro or in vivo.
  • DNA profiling uses a variety of techniques to distinguish between individuals of the same species using only samples of their DNA. Two humans will have the vast majority of their DNA sequence in common. DNA profiling exploits highly variable repeating sequences called short tandem repeats (STRs). Two unrelated humans will be unlikely to have the same numbers of tandem repeats at a given locus. In STR profiling, PCR is used to obtain enough DNA to detect the number of repeats at several loci. It is possible to establish a match that is extremely unlikely to have arisen by coincidence, except in the case of identical twins, who will have identical genetic profiles.
  • DNA profiling is used in forensic science to match suspects to samples of blood, hair, saliva, semen, etc. It has also led to several exonerations of formerly convicted suspects. It is also used in such applications as identifying human remains, paternity testing, matching organ donors, studying populations of wild animals, and establishing the provenance or composition of foods. It has also been used to generate hypotheses on the pattern of the human diaspora in prehistoric times.
  • Fig 1 demonstrates a general scheme of the DNA authentication procedure.
  • Fig 2A-C demonstrates DNA profiles of "real" and "fake" mock forensic samples.
  • Fig 2A (1-3) shows the DNA profile that was obtained from sample 1 (genuine blood sample of individual A on cotton).
  • Fig 2B (1-3) shows the DNA profile that was obtained from sample 2 (genuine blood sample of individual B on cotton).
  • Fig 2C (1-3) shows the DNA profile that was obtained from sample 3 (fake blood sample on cotton, composed of red blood cells of individual A mixed with in vitro generated copies of DNA from individual B).
  • Fig 3 demonstrates a specific implementation of the DNA authentication procedure, based on analysis of methylation of HpaII digested DNA.
  • Fig 4A demonstrates a joint DNA profiling and authentication scheme.
  • Fig 4B depicts a scheme of a joint DNA profiling and authentication procedure employing an HpaII based methylation assay.
  • the left portion of the output histogram contains authentication loci and the right portion of the output histogram contains profiling loci. Color-coded bars are depicted above each analyzed locus. Bars in the authentication region represent results that indicate that the DNA sample was generated in vivo.
  • Fig 5 depicts examples of DNA profiles combined with results of DNA authentication for the capillary electrophoresis histograms of samples 2 and 3.
  • Fig 6A-D demonstrates the calculation of the representation bias based on a linear regression of capillary electrophoresis histogram peaks.
  • Figs 6A and 6B represent in vivo generated DNA.
  • Figs 6C and 6D represent in vitro generated DNA.
  • In vitro generated DNA can be produced such that, upon DNA profiling, it will produce a DNA profile that is indistinguishable by current methods from the profile of native DNA.
  • in vitro generated DNA can be produced such that it will reproduce any specific, desired DNA profile. Producing such in vitro generated DNA requires only basic lab equipment and standard lab techniques, and can be performed very quickly and at little financial expense. It should also be noted that producing in vitro generated DNA does not necessitate obtaining a real source for the duplicated DNA. For example, alleles can be amplified or cloned from other sources and assembled to create any desired profile.
  • Such in vitro generated DNA can be planted in crime scenes and thus incriminate any person with a known DNA profile. Planting such "fake" DNA in crime scenes can be performed easily, and the DNA can be incorporated into genuine human tissues by mixing in vitro generated DNA with tissues (e.g. blood, sperm, saliva, etc.) from any person. If such a planted tissue is not treated for destruction of the native DNA (e.g. by UV irradiation), or if the quantity of in vitro DNA is much larger than that of the native DNA, the DNA profile that will be extracted by existing methods from such a tissue will appear as a homogeneous sample consisting of the in vitro generated DNA only.
  • the present invention provides a method for verifying the authenticity of nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
  • the present invention further provides a method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
  • nucleic acids obtained from a biological sample ;
  • the present invention provides a method for verifying the authenticity of nucleic acids employed in nucleic-acid based analysis procedures, the method comprising:
  • the present invention also provides a method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
  • the authenticity of said nucleic acids or said sample is determined by subjecting the nucleic acid molecules of a test sample to at least one procedure selected from the group consisting of:
  • screening for the presence of RNA in said nucleic acids, wherein said presence of RNA is indicative that said nucleic acids are authentic, and wherein the absence of RNA in said nucleic acids is indicative that said nucleic acids are not authentic.
  • the present invention provides use of at least one procedure selected from the group consisting of:
  • RNA in the biological sample for verifying the authenticity of nucleic acid molecules or a biological sample containing nucleic acids.
  • the authenticity of said nucleic acids or said sample is determined by amplifying a set of loci from said nucleic acids, wherein said amplifying step is carried out using PCR or Restriction and Circularization-Aided Rolling Circle Amplification.
  • the PCR is performed using both CODIS STR primers and non-CODIS STR primers and accordingly concurrent presence of CODIS STR PCR products and absence of non-CODIS STR PCR products in the sample is indicative that said sample is not authentic.
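The decision rule in this embodiment can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the non-CODIS locus labels are hypothetical and do not come from the specification, and a real assay would first need a criterion for calling a PCR product "present".

```python
def codis_only_signal(codis_products: dict[str, bool],
                      non_codis_products: dict[str, bool]) -> bool:
    """Return True when the sample looks NOT authentic under the rule above:
    CODIS STR products are present while non-CODIS STR products are absent."""
    codis_present = any(codis_products.values())
    non_codis_absent = not any(non_codis_products.values())
    return codis_present and non_codis_absent


# Hypothetical calls: True means a PCR product was detected at that locus.
suspicious = codis_only_signal(
    {"TH01": True, "TPOX": True},
    {"non_codis_locus_1": False, "non_codis_locus_2": False},
)
genuine_like = codis_only_signal(
    {"TH01": True, "TPOX": True},
    {"non_codis_locus_1": True, "non_codis_locus_2": True},
)
```

The intuition is that DNA fabricated only to reproduce a CODIS profile need not contain the rest of the genome, so loci outside the CODIS set would fail to amplify.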
  • the authenticity of said nucleic acids or said sample is determined by calculating the representation bias, said method comprising: a. defining a set of genomic loci; b. calculating the Relative Copy Number (RCN) of each locus and/or allele in the set; c. calculating the Representation Bias Value (RBV) of the test sample; and d.
  • the calculation of the representation bias comprises: a. defining a set of genomic loci; b. calculating the Relative Copy Number (RCN) of each locus and/or allele in the set for a test sample and for a reference sample; c. calculating the Representation Bias Value (RBV) of the test sample; and d.
  • the authenticity of said nucleic acids or said sample is determined by calculating the amount of PCR stutter, wherein said method comprises: a. subjecting the test sample to PCR analysis using primers specific to selected genetic loci; b. analyzing the PCR amplification products using capillary electrophoresis; c. processing the capillary electrophoresis data for detection of alleles and stutter peaks; d. determining the size and/or area of the -1 and/or +1 stutter fraction; e. calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample; f. calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the joint likelihood value obtained in step f is smaller than a predefined threshold, this is indicative that the nucleic acids from the test sample are not authentic, and when the joint likelihood value obtained in step f is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
  • the calculation of the amount of PCR stutter comprises: a. subjecting the test sample and a reference sample obtained from in vivo generated DNA to PCR analysis using primers specific to selected genetic loci; b. analyzing the PCR amplification products using capillary electrophoresis; c. processing the capillary electrophoresis data for detection of alleles and stutter peaks; d. determining the size and/or area of the -1 and/or +1 stutter fraction; e. calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample; f. calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the ratio between the value of the joint likelihood parameter obtained from the test sample in step f and the value of the joint likelihood parameter obtained from the reference sample is smaller than a predefined value, this is indicative that the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
  • the likelihood parameter is calculated by comparison to a database or calculated by comparison to a normal distribution of corresponding values.
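A minimal sketch of the stutter-based test follows, under stated assumptions: each per-locus likelihood parameter is taken to be the normal density of the observed stutter fraction, the joint likelihood is the product of the per-locus values (computed as a sum of logarithms, which assumes the loci are independent), and the reference means, standard deviations, and threshold are invented for illustration. None of these choices is prescribed by the text.

```python
import math


def log_normal_density(x: float, mean: float, sd: float) -> float:
    """Log of the normal probability density, computed directly to avoid underflow."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def joint_log_likelihood(stutter_fractions: dict[str, float],
                         reference: dict[str, tuple[float, float]]) -> float:
    """Sum per-locus log-likelihoods; reference maps locus -> (mean, sd) for in vivo DNA."""
    total = 0.0
    for locus, fraction in stutter_fractions.items():
        mean, sd = reference[locus]
        total += log_normal_density(fraction, mean, sd)
    return total


def is_authentic(stutter_fractions: dict[str, float],
                 reference: dict[str, tuple[float, float]],
                 log_threshold: float) -> bool:
    """Authentic when the joint likelihood is equal to or larger than the threshold."""
    return joint_log_likelihood(stutter_fractions, reference) >= log_threshold


# Hypothetical reference values (mean, sd) of the -1 stutter fraction per locus.
reference = {"TH01": (0.05, 0.01), "FGA": (0.08, 0.02)}
typical_sample = {"TH01": 0.05, "FGA": 0.08}
atypical_sample = {"TH01": 0.15, "FGA": 0.20}
```

With these invented numbers, `is_authentic(typical_sample, reference, 0.0)` holds and `is_authentic(atypical_sample, reference, 0.0)` does not, because the atypical stutter fractions lie many standard deviations from the reference means.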
  • the authenticity of said nucleic acids or said sample is verified by determination of the methylation pattern, wherein said determination is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci; c. comparing the ratio obtained in step b to a predefined threshold value, wherein a ratio lower than said threshold value is indicative that said nucleic acids are not authentic, and wherein a ratio equal to or larger than said threshold value is indicative that said nucleic acids are authentic.
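The ratio-and-threshold comparison described above can be sketched as follows. The locus labels and the threshold of 0.7 are hypothetical; the specification leaves the threshold as a predefined value.

```python
def methylated_ratio(status_by_locus: dict[str, bool]) -> float:
    """Ratio between methylated CG loci and total CG loci in the set."""
    if not status_by_locus:
        raise ValueError("at least one CG locus is required")
    return sum(status_by_locus.values()) / len(status_by_locus)


def authentic_by_methylation(status_by_locus: dict[str, bool],
                             threshold: float) -> bool:
    """Authentic when the ratio is equal to or larger than the threshold."""
    return methylated_ratio(status_by_locus) >= threshold


# Hypothetical loci that are constitutively methylated in in vivo generated DNA;
# True means the locus was found methylated in the test sample.
in_vivo_like = {"locus_a": True, "locus_b": True, "locus_c": False, "locus_d": True}
in_vitro_like = {"locus_a": False, "locus_b": False, "locus_c": False, "locus_d": False}
```

Here `in_vivo_like` gives a ratio of 0.75 and passes a threshold of 0.7, while `in_vitro_like` gives 0.0 and fails, in line with the expectation that in vitro generated DNA lacks methylation at these loci.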
  • the determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci; c. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, wherein a significantly larger ratio obtained from the test sample in comparison to the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are authentic, and wherein a ratio obtained from the test sample that is not significantly larger than the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are not authentic.
  • the determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci; c. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vivo generated reference sample, wherein comparable ratios of the test sample and the reference sample are indicative that the nucleic acids obtained from the test sample are authentic, and wherein non-comparable ratios of the test sample and the reference sample are indicative that the nucleic acids obtained from the test sample are not authentic.
  • the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci, c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, d.
  • the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci; c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci; d. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, and comparing the ratio obtained in step c to a predefined threshold value, wherein if the ratio obtained in step b is significantly greater than the corresponding ratio obtained from the in vitro generated reference sample, and the ratio obtained in step c is greater than a predefined threshold value, this is indicative that said nucleic acids are authentic, and wherein if the ratio obtained in step b is not significantly greater than the corresponding ratio obtained from the in vitro generated reference sample, and/or the ratio obtained in step c is not greater than a predefined threshold value, this is indicative that said nucleic acids are not authentic.
  • the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci, c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, d.
  • the determination of the methylation pattern is performed using bisulfite sequencing.
  • the determination of the methylation pattern is performed using methylation specific PCR.
  • the determination of the methylation pattern is performed using methylation-sensitive endonuclease digestion.
  • said CG loci are amplified using loci specific primers.
  • said loci specific primers are selected from the group consisting of SEQ ID NO. 1-15 (depicted in Tables 1-3).
  • the authenticity of said nucleic acids or said sample is verified by screening for non-genomic sequences in said nucleic acids, wherein detection of either primer dimers, plasmid sequences, non-genomic sequences ligated to ends of genomic sequences, or non-genomic sequences originating from degenerate primers used in in vitro generation of the nucleic acid sample, is indicative that said nucleic acids are not authentic and wherein absence of primer dimers, plasmid sequences, non-genomic sequences ligated to ends of genomic sequences, and non-genomic sequences originating from degenerate primers used in in vitro generation of the nucleic acid sample is indicative that said nucleic acids are authentic.
  • the presence of said non-genomic sequences is detected by a method comprising:
  • the authenticity of said nucleic acids or said sample is verified by determining the distribution of nucleic acid fragment lengths in said nucleic acids, wherein said method comprises:
  • the analysis of the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
  • step (d) determining the probability that both distributions represent random samplings from the same source; wherein, when said probability determined in step (d) is less than about 0.05, this is indicative that the nucleic acids from the test sample are not authentic, and wherein when said probability determined in step (d) is equal to or greater than about 0.05, this is indicative that the nucleic acids from the test sample are authentic.
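The text does not name the statistical test used to obtain the probability in step (d). As one possible illustration only, the sketch below compares two samples of fragment lengths with a permutation test on the two-sample Kolmogorov-Smirnov statistic, using only the Python standard library. The choice of test, the function names, and the fragment lengths are all assumptions.

```python
import random
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Largest gap between the empirical cumulative distributions of a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


def permutation_p_value(test: list[float], ref: list[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Estimate the probability that both samples come from the same source."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = ks_statistic(test, ref)
    pooled = list(test) + list(ref)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(test)], pooled[len(test):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def authentic_by_fragment_lengths(test: list[float], ref: list[float]) -> bool:
    """Authentic when the probability is equal to or greater than about 0.05."""
    return permutation_p_value(test, ref) >= 0.05


# Hypothetical fragment lengths in base pairs.
reference_lengths = [float(n) for n in range(500, 520)]
similar_lengths = list(reference_lengths)
short_lengths = [float(n) for n in range(100, 120)]
```

With these invented inputs, `similar_lengths` is judged authentic and `short_lengths` is not, since its length distribution does not overlap the reference at all.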
  • the authenticity of said nucleic acids or said sample which is verified by detecting RNA in said nucleic acids is performed by RT-PCR on one or more specific loci, wherein the absence of RT-PCR amplification products indicates that the nucleic acids are not authentic, and wherein the presence of RT-PCR amplification products indicates that the nucleic acids are authentic.
  • the biological sample is selected from a group consisting of: blood, saliva, hair, semen, urine, feces, skin, epidermal cell, buccal cell, and bone sample.
  • the methods of verification of authenticity, in accordance with the present invention are carried out for forensic uses.
  • said nucleic acids are from a human source.
  • said nucleic acids are genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, fragmented nucleic acids, or nucleic acids obtained from subcellular organelles.
  • said nucleic acids are DNA.
  • the present invention provides a kit for verifying the authenticity of nucleic acids or a biological sample containing nucleic acids, wherein the kit comprises: i. reagents for carrying out at least one procedure selected from the group consisting of:
  • Methods for DNA fingerprinting include restriction fragment length polymorphism (RFLP) analysis, amplified fragment length polymorphism (AFLP) analysis, and short tandem repeat (STR) analysis.
  • CODIS: Combined DNA Index System
  • The DNA profile obtained from such in vitro generated DNA is indistinguishable from the profile of native DNA by methods known in the art. In vitro generated DNA can therefore be produced to reproduce any specific DNA profile, planted in crime scenes, and thus used to incriminate any person with a known DNA profile. Since DNA profiles from crime scenes are used as evidence in courts of law for indictment, there is a need to develop methods for distinguishing in vitro generated DNA from in vivo generated DNA.
  • STR analysis is the most prevalent method of DNA fingerprinting used today. This method uses highly polymorphic regions that have short repeated sequences of DNA (the most common is a 4-base repeat, but other lengths are in use, including 5 bases). Because different people have different numbers of repeat units, these regions of DNA can be used to discriminate between individuals. These STR loci (genomic locations) are targeted with sequence-specific primers and are amplified using PCR. The resulting DNA fragments are then separated and detected using electrophoresis. There are two common methods of separation and detection: Capillary Electrophoresis (CE) and gel electrophoresis.
  • the polymorphisms displayed at each STR region are by themselves very common; typically, each polymorphism is shared by around 5-20% of individuals. When looking at multiple loci, it is the unique combination of these polymorphisms in an individual that makes this method discriminating as an identification tool. The more STR regions that are tested in an individual, the more discriminating the test becomes.
  • STR based DNA profiling systems are in use in different countries. In North America, systems which amplify the CODIS 13 core loci are almost always used, while in the UK the SGM+ system, which is compatible with the National DNA Database, is used. Whichever system is used, many of the STR regions under test are the same. These DNA profiling systems are based around multiplex reactions, whereby many STR regions are tested simultaneously.
  • Capillary electrophoresis is performed by electro-kinetically injecting the DNA fragments into a capillary, filled with polymer.
  • the DNA is pulled through the tube by the application of an electric field, separating the fragments such that the smaller fragments travel faster through the capillary.
  • the fragments are then detected using fluorescent dyes that were attached to the primers used in PCR. This allows multiple fragments to be amplified and run simultaneously, also known as multiplexing. Sizes are assigned using labeled DNA size standards that are added to each sample, and the number of repeats is determined by comparing the size to an allelic ladder, a sample that contains all of the common possible repeat sizes. Although this method is expensive, larger capacity machines with higher throughput are being used to lower the cost per sample and reduce backlogs that exist in many government crime facilities.
  • Gel electrophoresis acts using similar principles as CE, but instead of using a capillary, a large polyacrylamide gel is used to separate the DNA fragments. An electric field is applied, as in CE, but instead of detection being performed at a single location in the capillary, the entire gel is scanned into a computer, and all fragments are detected simultaneously. This produces an image showing all of the bands corresponding to different repeat sizes and the allelic ladder. This approach does not require the use of size standards, since the allelic ladder is run alongside the samples and serves this purpose. Visualization can either be through the use of fluorescently tagged dyes in the primers or by silver staining the gel prior to scanning.
  • CODIS is the FBI-funded computer system that solves crimes by searching DNA profiles developed by federal, state, and local crime laboratories.
  • A record in the CODIS database, known as a CODIS profile, consists of a sample identifier, an identifier for the laboratory responsible for the profile, and the results of the DNA analysis (known as the DNA profile). Other than the DNA profile, CODIS does not contain any personal identity information: the system does not store names, dates of birth, social security numbers, etc.
  • In its original form, CODIS consisted of two indexes: the Convicted Offender Index and the Forensic Index.
  • the Convicted Offender Index contains profiles of individuals convicted of crimes; state law governs which specific crimes are eligible for CODIS.
  • the Forensic Index contains profiles developed from biological material found at crime-scenes.
  • CODIS has added several other indexes, including: an Arrestee Index, a Missing or Unidentified Persons Index, and a Missing Persons Reference Index.
  • CODIS has a matching algorithm that searches the various indexes against one another according to strict rules that protect personal privacy. For identifying suspects in rape and homicide cases, CODIS searches the Forensic Index against itself and against the Offender Index. A Forensic to Forensic match provides an investigative lead that connects two or more previously unlinked cases. A Forensic to Offender match actually provides a suspect for an otherwise unsolved case. It is important to note that the CODIS matching algorithm only produces a list of candidate matches. Each candidate match is confirmed or refuted by a Qualified DNA Analyst.
  • the Convicted Offender Index requires all 13 CODIS STRs to be present for a profile upload. Forensic profiles only require 10 of the STRs to be present for an upload.
  • the CODIS profile is created by genotyping 13 STR loci, plus two additional genomic loci located on chromosomes X and Y, for determination of sex.
  • the CODIS profile consists of a vector of 26 numbers (representing the allelic values of the maternal and paternal alleles of the 13 STR loci), and the letters XX or XY (representing female or male, respectively). Each profile has an associated "frequency", which represents the chance for a randomly picked person to have that profile. The frequency of the profile is the product of all the individual allelic frequencies.
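The frequency rule stated above can be illustrated with a short calculation. The sketch follows the simplified rule as written (a plain product of the 26 allelic frequencies); the allele frequencies are invented, and genotype-level corrections that casework calculations may apply are not shown.

```python
from math import prod


def profile_frequency(allele_frequencies: list[float]) -> float:
    """Product of the individual allelic frequencies of a profile."""
    if not allele_frequencies:
        raise ValueError("at least one allele frequency is required")
    return prod(allele_frequencies)


# Hypothetical profile: 13 loci x 2 alleles, each allele with frequency 0.1.
example_frequency = profile_frequency([0.1] * 26)
```

With every allele at a frequency of 0.1, the product is about 1e-26, which shows why a full 13-locus match is extremely unlikely to arise by coincidence.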
  • compositions, methods, or embodiments discussed are intended to be only illustrative of the invention disclosed by this specification. Variations on these compositions, methods, or embodiments are readily apparent to a person of skill in the art based upon the teachings of this specification and are therefore intended to be included as part of the inventions disclosed herein.
  • forensics or "forensic science" as used herein refers to the application of a broad spectrum of methods aimed to answer questions of identity being of interest to the legal system. For example, the identification of potential suspects whose DNA may match evidence left at crime scenes, the exoneration of persons wrongly accused of crimes, identification of crime and catastrophe victims, or establishment of paternity and other family relationships.
  • nucleic acid refers to, but is not limited to, genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, fragmented nucleic acid, and nucleic acid obtained from subcellular organelles such as mitochondria.
  • nucleic acids include, but are not limited to, synthetic or in vitro transcription products.
  • nucleic-acid based analysis procedures refers to any identification procedure which is based on the analysis of nucleic acids, e.g. DNA profiling.
  • in vitro generated nucleic acid refers to, but is not limited to a nucleic acid, which is an artificial assembly ("fake DNA"), achieved by various methods. Such in vitro generated nucleic acid may be implanted in a biological sample. Some non-limiting examples of such methods are described herein below:
  • Plasmid allele-containing inserts only, generated for example, but not only, by endonuclease cleavage of the plasmids and gel purification of the inserts.
  • PCR-based WGA methods include degenerate oligonucleotide-primed (DOP) PCR [1], primer extension pre-amplification (PEP) [2], and ligation-mediated PCR [3].
  • WGA: whole genome amplification
  • MDA: multiple displacement amplification
  • biological sample refers to, but is not limited to, any biological sample derived from an animal, preferably a human, and preferably a sample which contains nucleic acids.
  • samples are not directly retrieved from the subject to be identified, but are collected from the environment, e.g. a crime scene or a rape victim.
  • samples include fluids, tissues, cell samples, organs, biopsies, etc.
  • Most preferred samples are blood, plasma, saliva, urine, sperm, hair, etc.
  • the biological sample can also be any of the following - blood drops, dried blood stains, dried saliva stains, dried underwear stains (e.g.
  • Genomic DNA can be extracted from such biological samples.
  • the biological sample may be treated prior to its use, e.g. in order to render nucleic acids available. Techniques of cell or protein lysis, concentration or dilution of nucleic acids that may be used in the context of the present invention are known in the art.
  • allele is intended to be a genetic variation associated with a segment of DNA, i.e., one of two or more alternate forms of a DNA sequence occupying the same locus.
  • locus refers to a position on a chromosome of a gene or other chromosome marker. Locus may also mean the DNA at that position. A variant of the DNA sequence at a given locus is called an allele as denoted herein. Alleles of a locus are located at identical sites on homologous chromosomes.
  • PCR: polymerase chain reaction
  • RCA-RCA: Restriction and Circularization-Aided Rolling Circle Amplification
  • STR primers refers to any commercially available or made-in-the-lab nucleotide primers that can be used to amplify a target nucleic acid sequence from a biological sample by PCR.
  • There are ~1.5 million non-CODIS STR loci. Non-limiting examples of the above are presented at the following website, http://www.cstl.nist.gov/biotech/strbase/str_ref.htm, which currently contains 3156 references for STRs employed in science, forensics and beyond.
  • STR primers may be obtained from commercial kits for amplification of hundreds of STR loci (for example, ABI Prism Linkage Mapping Set MD10, Applied Biosystems), and for amplification of thousands of SNP loci (for example, Illumina BeadArray linkage mapping panel).
  • CODIS STR primers refers to STR primers that are designed to amplify any of the thirteen core STR loci designated by the FBI's "Combined DNA Index System", specifically, the repeated sequences of TH01, TPOX, CSF1PO, VWA, FGA, D3S1358, D5S818, D7S820, D13S317, D16S539, D8S1179, D18S51, and D21S11.
  • -1 stutter refers to a stutter byproduct that is one repeat unit smaller than its associated allele.
  • +1 stutter refers to a stutter byproduct that is one repeat unit larger than its associated allele.
  • -1 stutter fraction refers to the height (or area) of the -1 stutter peak divided by the height (or area) of the allele peak.
  • +1 stutter fraction refers to the height (or area) of the +1 stutter peak divided by the height (or area) of the allele peak.
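The two stutter-fraction definitions can be expressed directly in code. In this sketch, peaks are keyed by whole repeat number and valued by peak height (or area); that data layout and the example heights are assumptions for illustration, and alleles with partial repeats are not handled.

```python
def stutter_fractions(peaks: dict[int, float], allele: int) -> tuple[float, float]:
    """Return the (-1 stutter fraction, +1 stutter fraction) for one allele.

    peaks maps repeat number -> peak height (or area); a missing stutter
    peak is treated as a height of zero.
    """
    allele_peak = peaks[allele]
    if allele_peak <= 0:
        raise ValueError("the allele peak must be positive")
    minus_one = peaks.get(allele - 1, 0.0) / allele_peak
    plus_one = peaks.get(allele + 1, 0.0) / allele_peak
    return minus_one, plus_one


# Hypothetical electropherogram: allele with 12 repeats and its stutter peaks.
example_peaks = {11: 80.0, 12: 1000.0, 13: 10.0}
minus_fraction, plus_fraction = stutter_fractions(example_peaks, 12)
```

For the example peaks, the -1 stutter fraction is 80/1000 = 0.08 and the +1 stutter fraction is 10/1000 = 0.01.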
  • capillary electrophoresis histogram refers to a histogram obtained from capillary electrophoresis of PCR products wherein said products were amplified from genomic loci.
  • representation bias refers to differences in copy number between different genomic loci in the nucleic acid sample in question.
  • 'CG locus' refers to a genomic sequence that contains one or more CG dinucleotides.
  • constitutively-unmethylated means unmethylated in DNA of most cells of a specific tissue type.
  • the method of the present invention is illustrated in a general scheme depicted in Fig. 1.
  • the input to the DNA authentication scheme in accordance with the present invention is a DNA sample isolated from a biological sample.
  • the DNA undergoes a biochemical procedure followed by signal detection and signal analysis.
  • the authentication methods described herein may also use as input the raw data obtained in the standard DNA profiling procedure.
  • the method of the present invention concerns the authentication of nucleic acids which were isolated from a biological sample.
  • a biological sample For example, a blood sample found at a crime scene.
  • the isolation of nucleic acids (e.g. DNA) from a biological sample may be achieved by various methods known in the art (e.g. see Sambrook et al, [10]), for example, by performing the following steps:
  • RNA isolation from a biological sample may be achieved by any method known in the art, e.g. as described in [10].
  • the determination whether the nucleic acids in a biological sample were generated in vitro or in vivo may be accomplished using various methods, including those described herein. a. Determining the methylation pattern of a nucleic acid
  • Methylation in the human genome occurs in the form of 5-methyl cytosine and is confined to cytosine residues that are part of the sequence CG (cytosine residues that are part of other sequences are not methylated).
  • CG dinucleotides in the human genome are methylated, and others are not.
  • methylation is cell and tissue specific, such that a specific CG dinucleotide can be methylated in a certain cell and at the same time unmethylated in a different cell, or methylated in a certain tissue and at the same time unmethylated in different tissues. Since methylation at a specific locus can vary from cell to cell, when analyzing the methylation status of DNA extracted from a plurality of cells (e.g. from a forensic sample), the signal can be mixed, showing both the methylated and unmethylated signals in varying ratios. Therefore, when referring to the methylation status of a specific locus in DNA extracted from a plurality of cells, it should be understood that the status refers to the strongest signal, which corresponds to the methylation status of the majority of cells in the sample.
  • genomic loci The methylation status of different genomic loci has been investigated and published (for example, see ref. 9). Some genomic regions have been shown to be mostly methylated, some have been shown to be mostly unmethylated, and some regions have been shown to be mostly methylated in certain tissues but mostly unmethylated in other tissues.
  • Non-limiting examples of methylated loci and corresponding primers for their detection are provided in Table 1.
  • Non-limiting examples of unmethylated loci and corresponding primers for their detection are provided in Table 2.
  • the herein described methods for determining the methylation pattern of nucleic acids i.e. bisulfite sequencing, methylation specific PCR, methylation-sensitive endonuclease digestion
  • Version 1 based on analysis of one set of loci:
  • ratio obtained from test sample in step 3 > corresponding ratio obtained from in vivo control DNA - 0.3
  • ratio obtained from test sample in step 4 > corresponding ratio obtained from in vivo control DNA - 0.3
  • the initial steps involve determining the methylation status of DNA at each CG locus in the set.
  • exemplary methods for determining the methylation pattern of nucleic acids include, but are not limited to the following methods:
  • Bisulfite sequencing is the sequencing of bisulfite treated-DNA to determine its pattern of methylation. The method is based on the fact that treatment of DNA with sodium bisulfite results in conversion of non-methylated cytosine residues to uracil, while leaving the methylated cytosine residues unaffected. Following conversion by sodium bisulfite, specific regions of the DNA are amplified by PCR, and the PCR products are sequenced.
  • uracil residues are amplified as if they were thymine residues, unmethylated cytosine residues in the original DNA appear as thymine residues in the sequenced PCR product, whereas methylated cytosine residues in the original DNA appear as cytosine residues in the sequenced PCR product.
  • each CG locus contains one CG dinucleotide, and the methylation status of each CG dinucleotide is determined by:
  • If the sequence obtained in step 3 at the CG locus is CG, conclude that the CG locus was methylated. Otherwise, if the sequence obtained in step 3 at the CG locus is TG, conclude that the CG locus was unmethylated. It should be understood in the context of the present invention that when sequencing from the complementary strand, the unmethylated CGs in the original sequence will appear as CA.
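The bisulfite conversion and the CG/TG reading rule described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it models a single strand, treats conversion as complete, and the function names `bisulfite_convert` and `call_cg_status` are hypothetical helpers introduced for illustration, not part of the described method.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated cytosines are converted to uracil and read as T after PCR;
    methylated cytosines (indices listed in methylated_positions) stay C.
    Assumes complete conversion, which is an idealization.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cg_status(original, sequenced, cg_index):
    """Call the methylation status of the CG dinucleotide starting at cg_index."""
    if original[cg_index:cg_index + 2].upper() != "CG":
        raise ValueError("no CG dinucleotide at the given index")
    read = sequenced[cg_index:cg_index + 2].upper()
    if read == "CG":
        return "methylated"
    if read == "TG":
        return "unmethylated"
    return "undetermined"


if __name__ == "__main__":
    original = "ATCGGA"  # the CG dinucleotide starts at index 2
    print(call_cg_status(original, bisulfite_convert(original, {2}), 2))
    print(call_cg_status(original, bisulfite_convert(original, set()), 2))
```

Reads taken from the complementary strand would need the CA rule mentioned above instead of TG; that case is not handled in this sketch.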
  • Methylation specific PCR is a method of methylation analysis that, like bisulfite sequencing, is also performed on bisulfite-treated DNA, but avoids the need to sequence the genomic region of interest. Instead, the selected region in the bisulfite-treated DNA is amplified by PCR using two sets of primers that are designed to anneal to the same genomic targets.
  • the primer pairs are designed to be "methylated-specific" by including sequences complementing only unconverted 5-methylcytosines, or conversely "unmethylated-specific", complementing thymines converted from unmethylated cytosines. Methylation is determined by the relative efficiency of the different primer pairs in achieving amplification.
  • each CG locus is comprised of one or more CG dinucleotides in the primer sequences. CG dinucleotides that are found in the amplified genomic region, but which are not in the primer sequences (i.e. in the region between the primers) are not part of the CG locus.
  • the methylation status of each CG locus can be determined by:
  • step 3 Detecting the presence, absence, and/or quantity of amplification products from step 2 (e.g. by gel/capillary electrophoresis or real time PCR. If detection is based on capillary electrophoresis, fluorescent primers should be used in the PCR in step 2. If detection is based on real time PCR, a fluorescent DNA binding dye or a specific fluorescent DNA probe may need to be used along with the primers in the PCR in step 2). 4. Determining the methylation status of the CG locus by comparing the results obtained in step 3 for the two sets of primers used for amplification.
  • if the primers that were designed to preferentially amplify the methylated version of the DNA produce a larger quantity of PCR product than the primers that were designed to preferentially amplify the unmethylated version of the DNA, conclude that the CG locus was methylated. Otherwise, conclude that the CG locus was unmethylated.
  • methylation specific PCR determines the methylation status of CG dinucleotides in the primer sequences only, and not in the entire genomic region that is amplified by PCR. Therefore, CG dinucleotides that are found in the amplified sequence but are not in the primer sequences are not part of the CG locus.
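The comparison between the two primer sets can be written as a one-line rule. The sketch below assumes that the product quantity for each primer set has already been reduced to a single non-negative number (for example a peak height); the function name `msp_call` is hypothetical.

```python
def msp_call(methylated_primer_signal, unmethylated_primer_signal):
    """Call a CG locus from methylation specific PCR product quantities.

    Both arguments are product quantities (e.g. peak heights) obtained with
    the methylated-specific and unmethylated-specific primer sets.
    """
    if methylated_primer_signal < 0 or unmethylated_primer_signal < 0:
        raise ValueError("signals must be non-negative")
    if methylated_primer_signal > unmethylated_primer_signal:
        return "methylated"
    # Ties and weaker methylated-specific signals fall through to 'unmethylated',
    # following the 'otherwise' branch of the rule in the text.
    return "unmethylated"
```

If real time PCR cycle thresholds were used instead of end-point quantities, the comparison would have to be inverted, since a smaller cycle threshold indicates more template.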
  • Digestion of DNA with methylation-sensitive endonucleases represents a method for methylation analysis that can be applied directly to genomic DNA without the need to perform bisulfite conversion.
  • the method is based on the fact that methylation-sensitive endonucleases digest only un-methylated DNA, while leaving methylated DNA intact. Following digestion, the DNA can be analyzed for methylation status by a variety of methods, including gel electrophoresis, and PCR amplification of specific loci.
  • each CG locus is comprised of one or more CG dinucleotides that are part of recognition sequence(s) of the methylation-sensitive restriction endonuclease(s) that are used in step 1 of the procedure.
  • CG dinucleotides that are found in the amplified genomic region, but are not in the recognition sequence(s) of the endonuclease(s) are not part of the CG locus.
  • the methylation status of each CG locus is determined by:
  • methylation-sensitive endonucleases e.g. HpaII, HhaI.
  • Amplifying (e.g. by PCR) a genomic region that contains the CG locus and a reference locus from the digested DNA. The reference locus must not contain any of the recognition sequences of the endonucleases used in step 1.
  • step 3 Detecting the presence, absence, and/or quantity of amplification products from step 2 (e.g. by gel/capillary electrophoresis or real time PCR. If detection is based on capillary electrophoresis, fluorescent primers should be used in the PCR in step 2. If detection is based on real time PCR, a fluorescent DNA binding dye or a specific fluorescent DNA probe may need to be used along with the primers in the PCR in step 2).
  • step 4 Determining the methylation status of the CG locus from the results obtained in step 3 by one of the following methods: a. Compare the signal obtained from the amplification of the CG locus to a predetermined threshold. In gel electrophoresis, if a band corresponding to the CG locus is detectable, conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated. In capillary electrophoresis, if the signal corresponding to the CG locus is greater than a pre-determined threshold (e.g. 50 relative fluorescence units) conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated.
  • CT cycle threshold
  • a pre-determined threshold e.g. 30
  • the signals of both loci are comparable.
  • a pre-determined threshold ratio e.g. 50%
  • the signals of both loci are comparable.
  • the difference between the cycle thresholds of the CG locus and the cycle threshold of the reference locus is not greater than 2, the signals of both loci are comparable.
  • Methylation-sensitive endonuclease digestion determines the methylation status of CG dinucleotides in the recognition sequences of the endonucleases that are used only, and not in the entire genomic region that is amplified by PCR. Therefore, CG dinucleotides that are found in the amplified sequence but are not in the recognition sequences of the endonucleases are not part of the CG locus.
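Two of the decision rules mentioned above for digested DNA can be sketched as follows. The 50 RFU threshold and the cycle threshold difference of 2 are the example values given in the text. Treating "comparable to the reference locus" as meaning "methylated" is an interpretation made here (an undigested locus amplifies like the reference), and the function names are hypothetical.

```python
def methylated_by_signal(cg_signal_rfu, threshold_rfu=50.0):
    """Capillary electrophoresis rule: a signal above the threshold means the
    locus survived digestion, so it is called methylated."""
    return cg_signal_rfu > threshold_rfu


def methylated_by_ct(ct_cg, ct_reference, max_delta=2.0):
    """Real time PCR rule: the CG locus is called methylated when its cycle
    threshold is not more than max_delta cycles above that of the reference.

    Assumption: only a delayed CG locus (larger Ct) indicates digestion, so
    the signed difference is used rather than the absolute difference.
    """
    return (ct_cg - ct_reference) <= max_delta


if __name__ == "__main__":
    print(methylated_by_signal(430.0))   # strong peak after digestion
    print(methylated_by_ct(31.5, 25.0))  # CG locus delayed by 6.5 cycles
```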
  • the determination whether a biological sample containing nucleic acids was generated in vivo or in vitro can be performed by analysis of a set of genomic loci in the sample. Any genomic locus may be used for this purpose, other than those loci that are traditionally used for DNA profiling, (e.g. CODIS loci). If the in vitro generated DNA sample consists only of CODIS loci, then all other genomic loci will be absent from the sample. Therefore, the attempt to amplify any non-CODIS locus will fail in such in vitro generated DNA samples, but not in in vivo generated DNA samples. Accordingly, the absence of non-CODIS loci from the test sample indicates that the DNA was synthetically constructed and does not originate from a specific individual.
  • any non-CODIS loci will be appropriate for the authentication purpose. If, however, the set of additional loci is meant not only for DNA authentication but also for DNA profiling, then the usual guidelines for selection of profiling loci (e.g. polymorphic in the human population, having relatively low mutation rates, neutral, non-phenotypic, each locus present on a separate chromosome) may be employed.
  • the presence or absence of a set of genomic loci is determined, for example, by one of the following methods: a. Amplifying each locus in the set of loci by PCR and detecting the presence of amplification products by gel or capillary electrophoresis; b. Amplifying the locus by real-time PCR and detecting the presence of amplification products.
  • the real-time software compares the fluorescence of the sample to that of the reference sample(s) and determines at each cycle whether each PCR amplicon is present, and if so, its amount. If at the end of 40 cycles of real-time PCR no presence is detected, it is concluded that the locus is absent.
  • the ratio of present loci / total analyzed loci for the entire set of analyzed loci is calculated.
  • at least one CODIS STR locus is amplified using the same method used for the amplification of the set of analyzed loci.
  • a predetermined threshold level e.g. 1
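The presence ratio and its comparison to a threshold can be sketched as below. The 40-cycle limit and the threshold of 1 are the example values from the text. The input format (a dictionary mapping locus names to cycle thresholds, with `None` for no amplification) and the rule "ratio below threshold means in vitro" are assumptions made for this illustration.

```python
def locus_present(ct, max_cycles=40):
    """A locus counts as present if amplification was detected within max_cycles."""
    return ct is not None and ct <= max_cycles


def presence_ratio(ct_by_locus, max_cycles=40):
    """Ratio of present loci to total analyzed loci."""
    if not ct_by_locus:
        raise ValueError("at least one locus is required")
    present = sum(locus_present(ct, max_cycles) for ct in ct_by_locus.values())
    return present / len(ct_by_locus)


def classify_by_presence(ratio, threshold=1.0):
    """Assumed decision rule: a ratio below the threshold suggests in vitro DNA."""
    return "in vitro" if ratio < threshold else "in vivo"


if __name__ == "__main__":
    # Hypothetical locus names and cycle thresholds, for illustration only.
    sample = {"locus_A": None, "locus_B": None, "locus_C": None, "locus_D": 27.0}
    print(classify_by_presence(presence_ratio(sample)))
```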
  • amplification methods can be used to amplify DNA loci, including PCR [5], transcription based amplification [7] and strand displacement amplification (SDA) [8].
  • the nucleic acid sample is subjected to PCR amplification using primer pairs specific to each locus in the set.
  • the following PCR amplification method can be used to amplify the DNA loci: i. providing a nucleic acid template (e.g.
  • a PCR reaction mixture comprising one or more primers, polymerase such as Taq polymerase or another DNA polymerase with a temperature optimum at around 70°C, Deoxynucleotide triphosphates (dNTPs), and a buffer solution, providing a suitable chemical environment for stability of the DNA polymerase, ii. performing an initialization step iii. performing a denaturation step iv. performing an annealing step v. performing an elongation step vi. repeating steps iii to v 20 to 40 times, preferably 30 to 35 times, vii. performing a final elongation step viii. running the PCR product on an electrophoresis gel ix. analyzing the signal obtained from said PCR product.
  • In vivo generated DNA generally has a smaller representation bias in relation to in vitro generated DNA.
  • each genomic locus In the native DNA that is found in the cells of organisms each genomic locus is represented exactly once per haploid genome.
  • the strict control of copy numbers of genomic loci is achieved by enzymatic mechanisms that monitor the fidelity of the DNA replication process. These mechanisms are not present in in vitro generated DNA, leading to preferential amplification of some loci, resulting in a significantly larger representation bias.
  • analysis of the representation bias can be used for determining whether a nucleic acid in a biological sample containing nucleic acids was generated in vitro or in vivo. For example by the following method:
  • RCN Relative Copy Number
  • This may be performed by, but is not limited to, any of the following methods: a. Real-time PCR; b. PCR followed by quantification of PCR products by gel electrophoresis or by capillary electrophoresis. If capillary electrophoresis is used, either PCR product peak heights and/or peak areas may be used for quantification. c. Hybridization to sequences complementary to the tested loci (e.g. using a DNA microarray).
  • RBV Representation Bias Value
  • a. RBV ratio between the maximal and minimal RCN values obtained in step 2
  • b. RBV ratio between the standard deviation and the mean of all the RCN values obtained in step 2
  • d If the analysis method used in step 2 is able to differentiate between the relative copy numbers of both alleles of a single heterozygous locus (e.g.
  • the linear regression may be calculated for example using the Least Squares method [13] Calculating the linear regression allows for correction of the "ski-slope" effect which is seen in some capillary electrophoresis histograms as a result of sample overload, DNA degradation and other factors, and which causes the smaller amplicons to be amplified preferentially over larger amplicons. Since different fluorescent dyes have different intensities, the linear regression may be calculated separately for each dye.
  • the likelihood parameter may be calculated by one of the following non-limiting options:
  • the likelihood parameter is equal to the maximum of the following two values: (1) the fraction of database elements with RBV equal to or greater than the value obtained for the test sample in step 3, and (2) 1/n, where n is the number of database elements
  • the likelihood parameter is equal to the probability of a random sampling from the normal distribution having a value that is equal to or greater than the value of the test sample, obtained in step 3.
  • This likelihood is equal to the value of the complementary cumulative distribution function of the normal distribution, and can be calculated by the following formula: $p = \frac{1}{2}\operatorname{erfc}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)$, where x is the value obtained for the test sample, μ and σ are the mean and standard deviation (respectively) of the normal distribution, and p is the obtained likelihood value;
  • step 5 Determining whether the test sample was generated in vitro or in vivo by either of the following: a. If the likelihood parameter obtained in step 4 is smaller than a predetermined threshold (e.g. 0.05) then conclude that the test sample was generated in vitro, otherwise conclude that it was generated in vivo. b. Perform steps 1-4 on a reference sample (e.g. from a suspect with a similar profile), calculate the ratio between the likelihood parameter of the test sample and the likelihood parameter of the reference sample. If this ratio is smaller than a predefined threshold (e.g. 0.5), conclude that the test sample was generated in vitro, otherwise conclude that the test sample was generated in vivo.
  • the likelihood parameter may be much smaller than the threshold indicated above, e.g. under 0.01, or under 0.005.
  • this method can be performed on capillary electrophoresis histograms obtained by standard profiling kits (e.g. Identifiler). In such cases, the above method should start in step 3.
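The representation bias calculation described above can be sketched end to end: two of the RBV definitions, the two likelihood options, and the threshold decision. This is a minimal sketch, assuming the lower bound on the empirical likelihood is 1/n and using the sample standard deviation (the text does not say which estimator is meant). All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rbv_max_min(rcn_values):
    """RBV as the ratio between the maximal and minimal relative copy numbers."""
    return max(rcn_values) / min(rcn_values)


def rbv_cv(rcn_values):
    """RBV as the ratio between the standard deviation and the mean."""
    return stdev(rcn_values) / mean(rcn_values)


def empirical_likelihood(test_rbv, database_rbvs):
    """Fraction of database RBVs >= the test value, bounded below by 1/n."""
    n = len(database_rbvs)
    fraction = sum(1 for v in database_rbvs if v >= test_rbv) / n
    return max(fraction, 1.0 / n)


def normal_likelihood(x, mu, sigma):
    """Upper tail probability of a normal distribution with mean mu, sd sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def classify(likelihood, threshold=0.05):
    """Example threshold of 0.05 taken from the text."""
    return "in vitro" if likelihood < threshold else "in vivo"


if __name__ == "__main__":
    test_rbv = rbv_max_min([0.4, 1.0, 2.8])  # made-up relative copy numbers
    print(classify(normal_likelihood(test_rbv, mu=1.8, sigma=0.4)))
```

The empirical option never returns a value below 1/n, so a database smaller than 1/threshold elements cannot yield an in vitro call with this rule.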
  • the loci used for representation bias analysis may be chosen as follows:
  • the analysis may be performed on a set of STR loci used for DNA profiling, such as the SGM+ or Identifiler loci. In accordance with the above, analysis is performed on the same capillary electrophoresis histogram that is used for profiling.
  • the set can include loci that are under-represented in Multiple Displacement Amplification [MDA]-based WGA, e.g. in telomere or centromere regions of chromosomes, and other normal/over-represented loci.
  • MDA Multiple Displacement Amplification
  • the set can include the vWA locus (over-represented in WGA).
  • loci should be selected such that they are well separated, preferably residing on separate chromosomes
  • PCR stutter is an artifact produced during a PCR reaction
  • profiling of DNA that was generated in vitro by PCR, or by a PCR-based WGA method, will have increased stutter in relation to profiling of in vivo generated DNA. This is because in the former case two PCR reactions (one for the in vitro generation of DNA and one for the DNA profiling) are involved, while in the latter case there is only one (the DNA profiling) PCR reaction.
  • the determination whether nucleic acids in a biological sample were generated in vitro or in vivo can be performed based on analysis of PCR stutter, for example, as follows:
  • test sample Subjecting the test sample to PCR analysis using primers specific to selected genetic loci;
  • the capillary electrophoresis machine records the raw data in the form of pairs of numbers. Each pair contains an X coordinate, which records the time point, and hence is correlated to the length of the DNA, and a Y coordinate, which records the intensity of fluorescence, and hence is correlated to the quantity of DNA.
  • the raw data is processed for detection of alleles and stutter peaks by either: i. Standard capillary electrophoresis analysis software (e.g.
  • a local maximum is a point (Xi, Yi) in which the Y value is greater than the Y value of both the previous (i-1) data pair and the next (i+1) data pair (optionally use a smoothing method in order to reduce the number of maxima).
  • the peak height as the Y value of the peak.
  • the peak size as the X value of the peak.
  • the maximum expected stutter value represents the highest fraction of a stutter band that can be expected in in vivo generated DNA.
  • the maximum expected stutter value is determined empirically based on multiple capillary electrophoresis runs of different samples and is different for each locus. (For example, for the D3S1358 locus, the maximum allowed stutter value in the GeneMapper software is 0.11).
  • stutter fractions Calculate the size of the -1 stutter fraction, defined as the height of the -1 stutter peak divided by the height of its associated allele peak. Alternatively, the stutter fraction is defined as the area of the -1 stutter peak divided by the area of its associated allele peak.
  • the likelihood parameter is equal to the maximum of the following two values: (1) the fraction of said database elements (corresponding to the same allele) with -1 stutter fraction values equal to or greater than the value obtained for the test sample in step 6, and (2) 1/n, where n is the number of said database elements (corresponding to the analyzed allele) b.
  • the likelihood parameter is equal to the probability of a random sampling from the said normal distribution having a value that is equal to or greater than the value obtained for the test sample in step 6.
  • This likelihood is equal to the value of the complementary cumulative distribution function of the normal distribution, and can be calculated by the following formula: $p = \frac{1}{2}\operatorname{erfc}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)$, where x is the value obtained for the test sample, μ and σ are the mean and standard deviation (respectively) of the normal distribution, and p is the obtained likelihood parameter value;
  • step 7 For the entire set of likelihood parameters obtained in step 7, calculating the "joint likelihood value" of the test sample, which is correlated to the likelihood that the DNA in the test sample was generated in vivo.
  • a non-limiting example of how to calculate this value is by the Fisher's combined probability test, which combines the results from a variety of independent tests into one test statistic ($X^2$) having a chi-square distribution using the formula: $X^2 = -2\sum_{i=1}^{k}\ln(p_i)$
  • where k is the number of likelihood parameters, and $p_i$ are the likelihood parameters obtained in step 7.
  • the p-value for $X^2$ itself can be interpolated from the chi-square table using 2k degrees of freedom. Such a table is available for example in [12].
  • the computed p value is the joint likelihood value. 9. Determining whether the test sample was generated in vitro or in vivo by either of the following: i. If the joint likelihood value obtained in step 8 is smaller than a predetermined threshold (e.g. 0.05), conclude that the DNA from the test sample was generated in vitro, otherwise conclude that it was generated in vivo. ii. Perform steps 1-8 on a reference sample (e.g.
  • the method can be performed using the +1 stutter instead of the -1 stutter.
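The peak detection and stutter fraction steps described above can be sketched as follows. The sketch assumes the X coordinates have already been converted from time points to fragment sizes in base pairs, that the repeat unit is 4 bp, and that peaks are matched within a 0.5 bp tolerance; none of these values come from the text. The `offset` parameter switches between the -1 and +1 stutter. All names are hypothetical.

```python
def find_peaks(points, min_height=0.0):
    """Return (x, y) pairs whose y exceeds both neighbours (local maxima)."""
    peaks = []
    for i in range(1, len(points) - 1):
        x, y = points[i]
        if y > points[i - 1][1] and y > points[i + 1][1] and y >= min_height:
            peaks.append((x, y))
    return peaks


def _tallest_near(peaks, size, tol):
    """Tallest peak within tol of the requested size, or None."""
    candidates = [p for p in peaks if abs(p[0] - size) <= tol]
    return max(candidates, key=lambda p: p[1]) if candidates else None


def stutter_fraction(peaks, allele_size, repeat_len=4.0, offset=-1, tol=0.5):
    """Height of the stutter peak divided by the height of its allele peak.

    offset=-1 selects the -1 stutter, offset=+1 the +1 stutter.
    Returns 0.0 when no stutter peak is found.
    """
    allele = _tallest_near(peaks, allele_size, tol)
    if allele is None:
        raise ValueError("no allele peak found at the given size")
    stutter = _tallest_near(peaks, allele_size + offset * repeat_len, tol)
    return 0.0 if stutter is None else stutter[1] / allele[1]


if __name__ == "__main__":
    # Made-up trace: (size in bp, fluorescence) pairs around a 124 bp allele.
    trace = [(119.5, 5), (120.0, 90), (120.5, 4), (123.5, 10), (124.0, 1000),
             (124.5, 12), (127.5, 3), (128.0, 20), (128.5, 2)]
    peaks = find_peaks(trace)
    print(stutter_fraction(peaks, 124.0))            # -1 stutter fraction
    print(stutter_fraction(peaks, 124.0, offset=1))  # +1 stutter fraction
```

Peak areas could be used instead of heights, as the definition allows, by replacing the Y value with an integrated area per peak.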
  • the joint likelihood value may be much smaller than the threshold indicated above, e.g. under 0.01, or under 0.005.
  • this method can be performed on capillary electrophoresis histograms obtained by standard profiling kits (e.g. Identifiler). In such cases, the above method should start in step 3.
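The joint likelihood value from Fisher's combined probability test can be computed without a chi-square table, because the upper tail of a chi-square distribution with an even number of degrees of freedom (here 2k) has a closed form. The sketch below uses the standard Fisher statistic, minus two times the sum of the natural logarithms of the per-allele likelihood parameters; the function names are hypothetical.

```python
import math


def fisher_statistic(p_values):
    """Fisher's statistic: -2 times the sum of ln(p_i)."""
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("likelihood parameters must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_upper_tail_even_dof(x, k):
    """P(X > x) for a chi-square variable with 2k degrees of freedom.

    Closed form: exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def joint_likelihood(p_values):
    """Combine k likelihood parameters into one joint likelihood value."""
    return chi2_upper_tail_even_dof(fisher_statistic(p_values), len(p_values))


if __name__ == "__main__":
    value = joint_likelihood([0.04, 0.10, 0.02, 0.30])  # made-up parameters
    print("in vitro" if value < 0.05 else "in vivo")
```

Fisher's method assumes the combined tests are independent; stutter fractions measured in the same PCR may be correlated, so the combined value is best read as an indicator rather than an exact probability.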
  • In vitro generated DNA can be detected by the presence of non-genomic sequences obtained from the biological sample.
  • the non-genomic sequences may include primer dimers (in DNA generated by PCR-based methods), plasmid sequences (in DNA generated by cloning methods), non-genomic sequences ligated to ends of genomic sequences (e.g. in ligation-mediated PCR).
  • the presence of such non-genomic sequences can be detected by assays which are well-known in the art, for example, by cloning of the nucleic acids from the test sample into bacteria, and sequencing the cloned molecules.
  • Non-degraded, in vivo generated DNA that is extracted from biological samples by standard procedures consists of a distribution of fragments of varying lengths, from about 500 base pairs (bps) up to more than 10,000 bps.
  • DNA generated in vitro may consist of either small fragments only (e.g. DNA generated by PCR), or fragments with a relatively uniform size distribution (e.g. cloned DNA).
  • the distribution of fragment lengths may be determined by the following method:
  • determining the distribution of fragment lengths i.e. amount of DNA as a function of fragment size. This can be performed by a variety of commercial software programs (e.g. TotalLab of BioSystematica).
  • a If the DNA in the test sample does not contain fragments larger than 10 kilobases, conclude that the DNA of the test sample was generated in vitro, otherwise conclude that the DNA of the test sample was generated in vivo.
  • b Comparing both distributions obtained in step 3 using a statistical test which determines whether both distributions represent two random samplings from the same source distributions (e.g. by performing the Kolmogorov-Smirnov two sample goodness-of-fit hypothesis test [14]). If the analysis shows that the probability that both distributions represent random samplings from the same source distributions is less than a predefined threshold (e.g. 0.05), conclude that the DNA of the test sample was generated in vitro, otherwise conclude that the DNA was generated in vivo.
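Both fragment length criteria can be sketched in a few lines. The sketch assumes fragment lengths are available as lists of sizes in base pairs, which is a simplification of a gel-derived distribution. The p-value uses the common asymptotic approximation of the Kolmogorov distribution, which is less accurate for very small samples; a statistics library would normally be preferred. All names are hypothetical.

```python
import math


def has_large_fragments(lengths_bp, cutoff_bp=10_000):
    """Option (a): does the sample contain any fragment above the cutoff?"""
    return any(length > cutoff_bp for length in lengths_bp)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max distance between ECDFs)."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb, terms=100):
    """Asymptotic two-sample p-value (approximation, see lead-in)."""
    ne = na * nb / (na + nb)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    total = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, total))


if __name__ == "__main__":
    test = list(range(100, 120))           # made-up short fragments only
    reference = list(range(10_000, 10_020))
    d = ks_statistic(test, reference)
    print(has_large_fragments(test), ks_pvalue(d, len(test), len(reference)) < 0.05)
```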
  • g. Detection of RNA in the biological sample
  • RNA transcripts of highly transcribed housekeeping genes (e.g. SDHA) are likely to be found in the biological sample even if it is partially degraded.
  • If RNA is detected, it can be concluded that the DNA in the sample was generated in vivo; if RNA is not detected, it can be concluded that the DNA in the sample was generated in vitro.
  • RNA in the sample may be detected by assays which are well known in the art, for example by RT-PCR (reverse-transcriptase PCR) on a specific locus.
  • RNA will most likely not be compatible with the in vitro generated DNA that is found in the sample. This incompatibility can be detected by genotyping a set of transcribed STRs (e.g. RT-PCR followed by capillary electrophoresis).
  • a 'fake' blood sample is a blood sample in which the nucleic acids were generated in vitro.
  • Example 1 Demonstration of a CODIS profile obtained from a fake biological sample
  • Sample 1 A dry blood stain on a cotton fabric, prepared from 10 µl of venous blood from individual (A) that was dispensed on the fabric. This sample contains "real", in vivo generated, DNA (Fig 2A).
  • Sample 2 A dry blood stain on a cotton fabric, prepared from 10 µl of venous blood from individual (B) that was dispensed on the fabric. This sample contains "real", in vivo generated, DNA (Fig 2B).
  • Sample 3 A dry blood stain on cotton composed of red blood cells from individual (A) mixed with in vitro generated DNA that was amplified from the DNA of individual (B). This sample contains only "fake", in vitro generated, DNA, because red blood cells are not nucleated and therefore contain no genomic DNA (Fig 2C).
  • Sample 3 was prepared as follows:
  • Red blood cells were isolated from the bottom phase of the fractionated blood from individual (A), following centrifugation at 1500 g for 10 minutes.
  • Genomic DNA from individual (B) was extracted from a saliva stain on tissue paper by organic extraction according to a published protocol [10]. Ten nanograms of the extracted DNA were used as template for in vitro multiple displacement amplification with the Repli-G kit (Qiagen), yielding 10 µg of in vitro generated DNA. The generated DNA includes copies of all genomic loci.
  • DNA was extracted from all bloodstain samples by organic extraction according to a published protocol [10] and quantified in real time PCR using the Quantifiler kit (Applied Biosystems).
  • Profiling was performed on 1 ng of DNA extracted from each sample. Multiplex PCR of CODIS loci was performed in 50 µl total reaction volume in a GeneAmp PCR system 9700-GOLD (Applied Biosystems) using the ProfilerPlus kit (Applied Biosystems). Amplified products were separated on an ABI PRISM 310 Genetic Analyzer capillary electrophoresis machine, and analyzed using the GeneMapperID-X 1.1 software (Applied Biosystems).
  • the profiles of all samples are depicted in figure 2.
  • the profile of sample 3 (the "fake" sample; Fig 2C) is identical to the profile of sample 2 (Fig 2B), and does not contain any additional alleles that are found in sample 1 (Fig 2A, which corresponds to the human origin of the red blood cells used in sample 3).
  • the software also verifies for all alleles that the peak heights are within the limits of reasonable minimum and maximum values.
  • the software outputs its analysis in the form of a colored bar above each locus, whereby a green bar indicates a "perfect" score, and yellow and red bars indicate scores that are "imperfect" to various degrees.
  • the software also outputs a similar color coded score for the entire profile.
  • the profile of sample 3 is "perfect". This demonstrates that "perfect" profiles can be obtained from biological samples that were forged using simple techniques.
  • Example 2 Demonstration of a procedure for DNA authentication based on analysis of methylation in HpaII digested DNA
  • sample 2 ("real" sample from individual B)
  • sample 3 ("fake" sample containing red blood cells of individual A and in vitro generated DNA copied from the DNA of individual B).
  • Figure 3 depicts a DNA authentication procedure based on analysis of methylation in HpaII digested DNA, as exemplified below.
  • HpaII is a methylation-sensitive restriction endonuclease that specifically recognizes and cleaves the sequence CCGG only if it is unmethylated.
  • the digestion reaction was performed in 20 µl total reaction volume, including 10 ng of DNA template, 10 units of HpaII (New England Biolabs), and 2 µl of 10X buffer 4 (New England Biolabs). Digestion was performed at 37 °C for one hour, followed by heat inactivation of the enzyme by incubation at 65 °C for 20 minutes.
  • each sample was divided into 5 aliquots and amplified by PCR (one PCR performed for each aliquot) at 5 genomic loci - CM1, CM2 (constitutively methylated loci, primer sequences are in Table 1), CU1, CU2 (constitutively unmethylated loci, primer sequences are in Table 2), and REF1 (reference locus, primer sequences are in Table 3).
  • PCR was performed in the GeneAmp PCR system 9700-GOLD (Applied Biosystems) machine in a total reaction volume of 50 µl. The PCR program consisted of 28 cycles, and all forward primers were labeled with a fluorescent dye (NED).
  • Example 3 Demonstration of a procedure for DNA authentication based on capillary electrophoresis
  • the profile of a DNA sample is obtained by performing the following steps: (i) performing multiplex PCR (with fluorescent primers), (ii) running the amplified PCR products on a capillary electrophoresis machine, and (iii) analyzing the obtained capillary electrophoresis histogram.
  • Various DNA profiling kits are currently available, including SGM+, PowerPlex16, ProfilerPlus, CoFiler, and others.
  • DNA authentication may also be performed based on analysis of a capillary electrophoresis histogram.
  • a single histogram that contains the authentication and profiling data is contained in a single computer file. According to this procedure, DNA authentication and profiling can be performed simultaneously.
  • the PCR for DNA profiling and the PCR for DNA authentication are performed separately, but their amplified products are joined together into a single capillary electrophoresis run. This option was employed in Example 2.
  • DNA from a biological sample is divided into two aliquots.
  • One aliquot is used for the biochemical step of the standard DNA profiling procedure (multiplex PCR on CODIS loci).
  • the other aliquot is used for the biochemical step of the DNA authentication procedure.
  • the products of both biochemical steps are combined into a single tube and run on a capillary electrophoresis machine.
  • the resulting histogram is analyzed by a signal analysis software which performs both profiling and authentication.
  • the DNA profiling and DNA authentication are performed in a single multiplex PCR reaction and in a single capillary electrophoresis run.
  • STR loci that are found in kits such as CoFiler, ProfilerPlus, Identifiler, SGM+ and PowerPlex16 do not contain an HpaII site; therefore, a joint PCR reaction that amplifies STR loci from one of the above kits together with different STR loci for DNA authentication will succeed in amplifying all profiling and authentication loci.
  • test sample e.g. cigarette butt, blood-stain, saliva
  • DNA is extracted from the test sample (e.g. using organic extraction or Chelex)
  • DNA obtained in step 2 is quantified (e.g. using real-time PCR)
  • Another 0.5-2 ng of the DNA sample is used for the DNA authentication procedure, in one of the following options: a. Analysis of DNA methylation based on methylation-sensitive endonuclease digestion: i. Subjecting the DNA from the test sample to digestion with one or more methylation-sensitive endonucleases (e.g. HpaII, HhaI); ii. Performing multiplex PCR on a set of loci including one or more restriction sites corresponding to the endonucleases used in (i). b. Analysis of genomic loci that are not part of DNA profiling: i. Performing multiplex PCR on a set of loci that are not part of DNA profiling.
  • The amplified PCR products obtained in steps 4 and 5 are combined and run in a single capillary electrophoresis reaction.
  • The capillary electrophoresis histogram obtained in step 6 is conceptually divided into two sections, one corresponding to authentication data, and the other corresponding to profiling data.
  • the capillary electrophoresis histogram section corresponding to authentication data is analyzed.
  • test sample e.g. cigarette butt, blood-stain, saliva
  • DNA is extracted from the test sample (e.g. using organic extraction or Chelex).
  • DNA obtained in step 2 is quantified (e.g. using real-time PCR).
  • The amplified PCR products obtained in step 5 are run in a capillary electrophoresis reaction.
  • The capillary electrophoresis histogram obtained in step 6 is conceptually divided into two sections, one corresponding to authentication data, and the other corresponding to profiling data.
  • the capillary electrophoresis histogram section corresponding to authentication data is analyzed
  • This example illustrates calculation of representation bias based on a linear regression of capillary electrophoresis histogram peaks.
  • linear regressions dashed lines
  • 6A, 6B linear regressions
  • 6C, 6D in vitro-generated DNA samples
  • Bar plots show the degree of deviation of each peak.
  • the deviation of peak #3 in the in vitro generated DNA sample is 64%, as can be seen in the corresponding bar (see arrow).
  • the representation bias of a sample is the mean of all deviations. In vivo generated DNA samples are expected to have significantly lower representation bias values than in vitro generated DNA samples.
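
The regression-based calculation summarized in the excerpts above can be sketched roughly as follows. This is a minimal illustration, not the procedure of the example itself: it assumes each peak is given as a fragment size and a peak height, that the regression is an ordinary least-squares fit of height on size, and that the deviation of a peak is its absolute distance from the fitted line divided by the fitted value. The function names and input values are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def peak_deviations(sizes, heights):
    """Relative deviation of each peak from the regression line.

    Assumption: deviation = |observed - fitted| / fitted, with fitted > 0.
    """
    slope, intercept = linear_fit(sizes, heights)
    fitted = [slope * s + intercept for s in sizes]
    return [abs(h - f) / f for h, f in zip(heights, fitted)]


def representation_bias(sizes, heights):
    """Representation bias of a sample: the mean of all peak deviations."""
    devs = peak_deviations(sizes, heights)
    return sum(devs) / len(devs)


# Hypothetical peak data (fragment size in bp, peak height in RFU).
sizes = [100, 150, 200, 250, 300]
even_heights = [1000, 950, 900, 850, 800]     # peaks lie on a straight line
uneven_heights = [1000, 300, 1500, 200, 900]  # strongly scattered peaks

print(representation_bias(sizes, even_heights))
print(representation_bias(sizes, uneven_heights))
```

Under these assumptions, a sample whose peaks follow the regression line closely gets a bias near zero, while scattered peaks give a larger value, which matches the expectation stated above for in vivo versus in vitro generated DNA.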

Abstract

The present invention provides methods for verifying the authenticity of biological samples containing nucleic acid molecules. The methods make it possible to distinguish between in vitro generated DNA and in vivo generated DNA and can be used in forensics to ensure that DNA profiles produced from crime scene samples are genuine. The methods employ an array of nucleic acid based procedures for verifying the authenticity of a DNA sample such as polymerase chain reaction, sodium bisulfite treatment, and methylation-sensitive endonuclease digestion. The invention further provides kits for verifying the authenticity of biological samples containing nucleic acids employing the methods and reagents described in the invention.

Description

METHODS FOR DNA AUTHENTICATION
FIELD OF THE INVENTION
The invention relates to methods for verifying the authenticity of a DNA sample. In particular, the invention relates to methods for determining whether nucleic acids, specifically DNA, in a biological sample were generated in vitro or in vivo.
REFERENCES:
[1] Telenius H, Carter NP, Bebb CE, Nordenskjold M, Ponder BA, Tunnacliffe A. Genomics 1992, 13(3):718-725.
[2] Zhang L, Cui X, Schmitt K, Hubert R, Navidi W, Arnheim N. Proc Natl Acad Sci USA 1992, 89(13):5847-5851.
[3] Saunders RD, Glover DM, Ashburner M, Siden-Kiamos I, Louis C, Monastirioti M, Savakis C, Kafatos F. Nucleic Acids Res 1989, 17(22):9027-9037.
[4] Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, Bray-Ward P, Sun Z, Zong Q, Du Y, Du J et al. Proc Natl Acad Sci USA 2002, 99(8):5261-5266.
[5] Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N. Science 1985, 230:1350-1354.
[6] Olejniczak M, Krzyzosiak WJ. Electrophoresis 2006 Oct; 27(19):3724-34.
[7] Kwoh DY, Kwoh TJ. Am Biotechnol Lab 1990, 8(13):14-25.
[8] Walker GT, Little MC, Nadeau JG, Shank DD. Proc Natl Acad Sci USA 1992; 89(1):392-6.
[9] Eckhardt F et al: DNA methylation profiling of human chromosomes 6, 20 and 22. Nature Genetics 2006, 38:1359-1360.
[10] Sambrook J, Fritsch EF, Maniatis T. (1989) Molecular cloning: a laboratory manual, 2nd edn. Cold Spring Harbor, New York.
[11] Wang G, Maher E, Brennan C, Chin L, Leo C, Kaur M, Zhu P, Rook M, Wolfe JL, Makrigiorgos GM. 2004. DNA amplification method tolerant to sample degradation. Genome Res. 14:2357-2366.
[12] "Primer of biostatistics" by Stanton A. Glantz, McGraw-Hill Medical; 6 edition (April 15, 2005); Table 5-7, pages 156-7.
[13] "Linear Regression (Lecture Notes in Statistics)" (Vol 175) section 2.2, pages 36-47 by Jürgen Groß, Springer, 1 edition (September 10, 2003).
[14] Chakravarti, Laha, and Roy (1967). Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons, pp. 392-394.
BACKGROUND OF THE INVENTION
DNA profiling uses a variety of techniques to distinguish between individuals of the same species using only samples of their DNA. Two humans will have the vast majority of their DNA sequence in common. DNA profiling exploits highly variable repeating sequences called short tandem repeats (STRs). Two unrelated humans will be unlikely to have the same numbers of tandem repeats at a given locus. In STR profiling, PCR is used to amplify the DNA at several loci so that the number of repeats at each locus can be determined. It is possible to establish a match that is extremely unlikely to have arisen by coincidence, except in the case of identical twins, who will have identical genetic profiles.
DNA profiling is used in forensic science to match suspects to samples of blood, hair, saliva, semen, etc. It has also led to several exonerations of formerly convicted suspects. It is also used in such applications as identifying human remains, paternity testing, matching organ donors, studying populations of wild animals, and establishing the provenance or composition of foods. It has also been used to generate hypotheses on the pattern of the human diaspora in prehistoric times.
Testing is subject to the legal code of the jurisdiction in which it is performed. Usually the testing is voluntary, but it can be made compulsory by such instruments as a search warrant or court order. Several jurisdictions have also begun to assemble databases containing DNA information of convicts. The United States maintains the largest DNA database in the world: The Combined DNA Index System (CODIS), with over 4.5 million records as of 2007. The United Kingdom maintains the National DNA Database (NDNAD), which is of similar size. The size of this database, and its rate of growth, are causing concern among civil liberties groups in the UK, where police have wide-ranging powers to take samples and retain them even in the event of acquittal.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to understand the invention and to see how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying figures. In the figures, identical and similar structures, elements or parts thereof that appear in more than one figure are generally labeled with the same or similar references in the figures in which they appear. Dimensions of components and features shown in the figures are chosen primarily for convenience and clarity of presentation and are not necessarily to scale. The attached figures are:
Fig 1 demonstrates a general scheme of the DNA authentication procedure.
Fig 2A-C demonstrates DNA profiles of "real" and "fake" mock forensic samples. Fig 2A (1-3) shows the DNA profile that was obtained from sample 1 (genuine blood sample of individual A on cotton). Fig 2B (1-3) shows the DNA profile that was obtained from sample 2 (genuine blood sample of individual B on cotton). Fig 2C (1-3) shows the DNA profile that was obtained from sample 3 (fake blood sample on cotton, composed of red blood cells of individual A mixed with in vitro generated copies of DNA from individual B).
Fig 3 demonstrates a specific implementation of the DNA authentication procedure, based on analysis of methylation of HpaII digested DNA.
Fig 4A demonstrates a joint DNA profiling and authentication scheme.
Fig 4B depicts a scheme of a joint DNA profiling and authentication procedure employing an HpaII based methylation assay. The left portion of the output histogram contains authentication loci and the right portion of the output histogram contains profiling loci. Color-coded bars are depicted above each analyzed locus. Bars in the authentication region represent results that indicate that the DNA sample was generated in vivo.
Fig 5 depicts examples of DNA profiles combined with results of DNA authentication for the capillary electrophoresis histograms of samples 2 and 3.
Fig 6A-D demonstrates the calculation of the representation bias based on a linear regression of capillary electrophoresis histogram peaks. Figs 6A and 6B represent in vivo generated DNA, and Figs 6C and 6D represent in vitro generated DNA.
SUMMARY OF THE INVENTION
Current techniques used by forensic laboratories cannot distinguish between "real" biological samples containing in vivo generated DNA, and "fake" biological samples containing in vitro generated DNA, which can be produced very easily with basic lab techniques at little financial expense.
Therefore, there is a need to develop methods for distinguishing in vitro generated DNA from in vivo generated DNA in order to thwart attempts to forge DNA-based identification procedures. For example, DNA profiles from crime scenes are used as evidence in a court of law for indictment; the assurance that such profiles are genuine is therefore of utmost importance. The inventors of the present application have developed a method for distinguishing a "fake", in vitro generated DNA sample from a genuine, native DNA sample.
There are several different methods for generating DNA in vitro, such that this DNA will produce a desired DNA profile upon profiling with currently existing procedures. In vitro generated DNA can be produced such that, upon DNA profiling, it yields a DNA profile that is indistinguishable by current methods from the profile of native DNA. Furthermore, in vitro generated DNA can be produced such that it will reproduce any specific, desired DNA profile. Producing such in vitro generated DNA requires only basic lab equipment and standard lab techniques, and can be performed very quickly and at little financial expense. It should also be noted that producing in vitro generated DNA does not necessitate obtaining a real source for the duplicated DNA. For example, alleles can be amplified or cloned from other sources and assembled to create any desired profile.
Such in vitro generated DNA can be planted in crime scenes and thus incriminate any person with a known DNA profile. Planting such "fake" DNA in crime scenes can be performed easily, and the DNA can be incorporated into genuine human tissues by mixing it with tissues (e.g. blood, sperm, saliva, etc.) from any person. If such a planted tissue is not treated for destruction of the native DNA (e.g. by UV irradiation), or if the quantity of in vitro DNA is much larger than that of the native DNA, the DNA profile that will be extracted by existing methods from such a tissue will appear as a homogeneous sample consisting of the in vitro generated DNA only.
Accordingly, by a first of its aspects the present invention provides a method for verifying the authenticity of nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining nucleic acids; and
(b) conducting an analysis on said nucleic acids in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said nucleic acids are not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
The present invention further provides a method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining nucleic acids, wherein said nucleic acids were obtained from a biological sample ; and
(b) conducting an analysis on said nucleic acids in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said sample is not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
By yet another aspect, the present invention provides a method for verifying the authenticity of nucleic acids employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining a capillary electrophoresis histogram of amplified nucleic acids; and
(b) conducting an analysis on said histogram in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said nucleic acids are not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
The present invention also provides a method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining a capillary electrophoresis histogram of amplified nucleic acids isolated from said biological sample; and
(b) conducting an analysis on said histogram in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said sample is not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
In certain embodiments, the authenticity of said nucleic acids or said sample is determined by subjecting the nucleic acid molecules of a test sample to at least one procedure selected from the group consisting of:
(a) analyzing the methylation pattern of said nucleic acids and determining whether the methylation pattern of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic.
(b) amplifying a set of loci from said nucleic acids and determining whether the amplification pattern of said loci is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic.
(c) calculating the representation bias in said nucleic acids and determining whether the representation bias of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic.
(d) calculating the amount of PCR stutter of said nucleic acids and determining whether the pattern of PCR stutter of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic.
(e) screening for the presence of non-genomic sequences in said nucleic acids, wherein the absence of non-genomic sequences in said nucleic acids is indicative that said nucleic acids are authentic, and wherein the presence of non-genomic sequences in said nucleic acids is indicative that said nucleic acids are not authentic.
(f) analyzing the distribution of nucleic acid fragment lengths in said nucleic acids; and determining whether said distribution is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein said consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic.
(g) screening for presence of RNA in said nucleic acids, wherein said presence of RNA is indicative that said nucleic acids are authentic, and wherein the absence of RNA of said nucleic acids is indicative that said nucleic acids are not authentic.
In another aspect, the present invention provides use of at least one procedure selected from the group consisting of:
(a) determining the methylation pattern of said nucleic acids,
(b) amplifying a set of loci from said nucleic acids,
(c) calculating the representation bias in said nucleic acids,
(d) calculating the amount of PCR stutter of said nucleic acids,
(e) screening for non-genomic sequences in said nucleic acids,
(f) determining the distribution of nucleic acid fragment lengths in said nucleic acids, and
(g) detecting RNA in the biological sample, for verifying the authenticity of nucleic acid molecules or a biological sample containing nucleic acids.
In one embodiment, the authenticity of said nucleic acids or said sample is determined by amplifying a set of loci from said nucleic acids, wherein said amplifying step is carried out using PCR or Restriction and Circularization-Aided Rolling Circle Amplification.
In one specific embodiment, the PCR is performed using both CODIS STR primers and non-CODIS STR primers and accordingly concurrent presence of CODIS STR PCR products and absence of non-CODIS STR PCR products in the sample is indicative that said sample is not authentic.
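
The decision rule of this embodiment can be sketched as below. This is a minimal illustration under stated assumptions: amplification results are supplied as booleans per locus, and the function name and the non-CODIS locus labels are hypothetical. The text only states when a sample is flagged as not authentic, so the sketch returns a flag and says nothing about other outcomes.

```python
def flagged_not_authentic(codis_products, non_codis_products):
    """Return True when CODIS STR products are present while
    non-CODIS STR products are absent.

    Each argument maps a locus name to True if a PCR product was detected.
    """
    codis_present = any(codis_products.values())
    non_codis_absent = not any(non_codis_products.values())
    return codis_present and non_codis_absent


# Hypothetical results; "nonCODIS_1" and "nonCODIS_2" are placeholder labels.
genuine = flagged_not_authentic(
    {"D3S1358": True, "vWA": True, "FGA": True},
    {"nonCODIS_1": True, "nonCODIS_2": True},
)
suspect = flagged_not_authentic(
    {"D3S1358": True, "vWA": True, "FGA": True},
    {"nonCODIS_1": False, "nonCODIS_2": False},
)
print(genuine, suspect)
```

The rationale, as the summary suggests, is that DNA synthesized only at the profiled loci would amplify with CODIS primers but leave other genomic STR loci without product.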
In another embodiment, the authenticity of said nucleic acids or said sample is determined by calculating the representation bias, said method comprising: a. defining a set of genomic loci; b. Calculating the Relative Copy Number (RCN) of each locus and/or allele in the set; c. calculating the Representation Bias Value (RBV) of the test sample; and d. calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample; wherein when the value of the likelihood parameter obtained in step (d) is smaller than a predefined threshold this is indicative that the nucleic acids from the test sample are not authentic, and wherein when the value of the likelihood parameter obtained in step (d) is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
In yet another embodiment, the calculation of the representation bias comprises: a. defining a set of genomic loci; b. Calculating the Relative Copy Number (RCN) of each locus and/or allele in the set for a test sample and for a reference sample; c. calculating the Representation Bias Value (RBV) of the test sample; and d. calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample; wherein when the ratio between the value of the likelihood parameter obtained from the test sample and the value of the likelihood parameter obtained from the reference sample is smaller than a predefined value this is indicative that the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
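
The likelihood and threshold steps (c and d) of the first representation-bias embodiment can be sketched as follows, assuming the RBV of the test sample has already been computed and that the likelihood is estimated empirically from a database of RBVs measured on in vivo generated samples. The database values, the threshold of 0.01, and the function names are illustrative assumptions, not values from the text.

```python
def likelihood_from_database(test_rbv, in_vivo_rbvs):
    """Fraction of in vivo database samples whose RBV is equal to or
    greater than the RBV of the test sample."""
    hits = sum(1 for value in in_vivo_rbvs if value >= test_rbv)
    return hits / len(in_vivo_rbvs)


def classify_by_rbv(test_rbv, in_vivo_rbvs, threshold=0.01):
    """Authentic if the likelihood is at or above the predefined threshold."""
    likelihood = likelihood_from_database(test_rbv, in_vivo_rbvs)
    return "authentic" if likelihood >= threshold else "not authentic"


# Hypothetical database of RBVs from in vivo generated samples.
database = [0.05, 0.08, 0.10, 0.12, 0.07, 0.09, 0.11, 0.06, 0.10, 0.13]

print(likelihood_from_database(0.09, database), classify_by_rbv(0.09, database))
print(likelihood_from_database(0.50, database), classify_by_rbv(0.50, database))
```

A real database would need far more entries than this toy list for a small threshold to be meaningful; the ten values here only keep the example readable.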
In another embodiment, the authenticity of said nucleic acids or said sample is determined by calculating the amount of PCR stutter, wherein said method comprises: a. Subjecting the test sample to PCR analysis using primers specific to selected genetic loci; b. Analyzing the PCR amplification products using capillary electrophoresis; c. processing the capillary electrophoresis data for detection of alleles and stutter peaks; d. determining the size and/or area of the -1 and/or +1 stutter fraction; e. calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample; f. calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the joint likelihood value obtained in step (f) is smaller than a predefined threshold, this is indicative that the nucleic acids from the test sample are not authentic, and when the joint likelihood value obtained in step (f) is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
In yet another embodiment, the calculation of the amount of PCR stutter comprises: a. Subjecting the test sample and a reference sample obtained from in vivo generated DNA to PCR analysis using primers specific to selected genetic loci; b. Analyzing the PCR amplification products using capillary electrophoresis; c. processing the capillary electrophoresis data for detection of alleles and stutter peaks; d. determining the size and/or area of the -1 and/or +1 stutter fraction; e. calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample; f. calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the ratio between the value of the joint likelihood parameter obtained from the test sample in step f and the value of the joint likelihood parameter obtained from the reference sample is smaller than a predefined value, this is indicative that the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, this is indicative that the nucleic acids from the test sample are authentic.
In certain embodiments, the likelihood parameter is calculated by comparison to a database or calculated by comparison to a normal distribution of corresponding values.
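
For the normal-distribution option, the per-locus likelihoods and the joint likelihood of the stutter embodiments might be computed along these lines. The sketch assumes a two-sided tail probability under a normal distribution with a known mean and standard deviation per locus, and assumes the loci are independent so that the joint likelihood is a simple product. The locus labels, reference statistics, and stutter fractions are hypothetical.

```python
import math


def normal_tail_likelihood(value, mean, sd):
    """Two-sided probability of a value at least this far from the mean
    under a normal distribution."""
    z = abs(value - mean) / sd
    return math.erfc(z / math.sqrt(2))


def joint_likelihood(stutter_fractions, reference_stats):
    """Product of per-locus likelihoods (assumes independent loci)."""
    joint = 1.0
    for locus, value in stutter_fractions.items():
        mean, sd = reference_stats[locus]
        joint *= normal_tail_likelihood(value, mean, sd)
    return joint


# Hypothetical in vivo reference statistics: locus -> (mean, sd) of the
# -1 stutter fraction.
reference = {"locus_1": (0.08, 0.01), "locus_2": (0.06, 0.01)}

typical = joint_likelihood({"locus_1": 0.08, "locus_2": 0.06}, reference)
atypical = joint_likelihood({"locus_1": 0.13, "locus_2": 0.06}, reference)
print(typical, atypical)
```

The resulting joint value would then be compared with a predefined threshold, or with the corresponding value from a reference sample, as described in the two stutter embodiments.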
In another embodiment, the authenticity of said nucleic acids or said sample is verified by determination of the methylation pattern, wherein said determination is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci, c. comparing the ratio obtained in step b to a predefined threshold value, wherein a ratio lower than said threshold value is indicative that said nucleic acids are not authentic, and wherein a ratio equal to or larger than said threshold value is indicative that said nucleic acids are authentic.
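
Steps a to c of this embodiment reduce to a ratio and a comparison, sketched below. The methylation calls are assumed to be available as booleans per locus; the threshold of 0.8 and the function names are illustrative assumptions, and the locus names are borrowed from the constitutively methylated loci of Example 2.

```python
def methylated_ratio(methylation_status):
    """Ratio between methylated CG loci and total CG loci in the set."""
    return sum(methylation_status.values()) / len(methylation_status)


def authentic_by_methylation(methylation_status, threshold=0.8):
    """Authentic if the ratio is equal to or larger than the threshold."""
    return methylated_ratio(methylation_status) >= threshold


# Loci expected to be methylated in in vivo generated DNA.
in_vivo_like = {"CM1": True, "CM2": True}
in_vitro_like = {"CM1": False, "CM2": False}

print(authentic_by_methylation(in_vivo_like))
print(authentic_by_methylation(in_vitro_like))
```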
In another embodiment the determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci, c. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, wherein a significantly larger ratio obtained from the test sample in comparison to the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are authentic, and wherein the ratio obtained from the test sample is not significantly larger than the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are not authentic.
In yet another embodiment the determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said set of CG loci, c. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vivo generated reference sample, wherein comparable ratios of the test sample and the reference sample are indicative that the nucleic acids obtained from the test sample are authentic, and wherein non-comparable ratios of the test sample and the reference sample are indicative that the nucleic acids obtained from the test sample are not authentic.
In another embodiment the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci, c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, d. comparing the ratios obtained in steps b and c to predefined threshold values, wherein when both said ratios are greater than said predefined ratios, this is indicative that said nucleic acids are authentic, and wherein when at least one of said ratios is not greater than its corresponding predefined ratio, this is indicative that said nucleic acids are not authentic.
In another embodiment the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci, c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, d. comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, and comparing the ratio obtained in step c to a predefined threshold value wherein if the ratio obtained in step b is significantly greater than the corresponding ratio obtained from the in vitro generated reference sample, and the ratio obtained in step c is greater than a predefined threshold value, this is indicative that said nucleic acids are authentic, and wherein if the ratio obtained in step b is not significantly greater than the corresponding ratio obtained from the in vitro generated reference sample, and/or the ratio obtained in step c is not greater than a predefined threshold value, this is indicative that said nucleic acids are not authentic.
In yet another embodiment the determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising: a. determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA; b. determining the ratio between methylated CG loci and total CG loci in said first set of CG loci, c. determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, d. comparing the ratios obtained in steps b and c to corresponding ratios obtained from an in vivo generated reference sample, wherein if both ratios are comparable, this is indicative that the nucleic acids from the test sample are authentic, and wherein if at least one of the ratios is not comparable, this is indicative that the nucleic acids from the test sample are not authentic.
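
The two-set variant that uses predefined thresholds can be sketched as follows. As a simplifying assumption, methylation status is inferred from amplification after HpaII digestion: since HpaII cleaves CCGG only when it is unmethylated, a locus that still amplifies after digestion is treated as methylated. The thresholds of 0.5, the function names, and the input dictionaries are hypothetical; the locus names follow Example 2.

```python
def fraction_true(flags):
    """Share of True values in a list of booleans."""
    return sum(flags) / len(flags)


def authentic_two_sets(amplified, methylated_set, unmethylated_set,
                       meth_threshold=0.5, unmeth_threshold=0.5):
    """Authentic only if both ratios exceed their predefined thresholds.

    amplified maps locus -> True if a PCR product was detected after
    HpaII digestion (assumed to mean the locus is methylated).
    """
    meth_ratio = fraction_true([amplified[l] for l in methylated_set])
    unmeth_ratio = fraction_true([not amplified[l] for l in unmethylated_set])
    return meth_ratio > meth_threshold and unmeth_ratio > unmeth_threshold


methylated_loci = ["CM1", "CM2"]
unmethylated_loci = ["CU1", "CU2"]

# In vivo pattern: methylated loci survive digestion, unmethylated loci are cut.
in_vivo = {"CM1": True, "CM2": True, "CU1": False, "CU2": False}
# Fully unmethylated DNA: every locus is cut and none amplifies.
in_vitro = {"CM1": False, "CM2": False, "CU1": False, "CU2": False}

print(authentic_two_sets(in_vivo, methylated_loci, unmethylated_loci))
print(authentic_two_sets(in_vitro, methylated_loci, unmethylated_loci))
```

Checking the unmethylated set as well guards against a sample that was methylated indiscriminately in vitro, which would pass a test based on the methylated set alone.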
According to one specific embodiment, the determination of the methylation pattern is performed using bisulfite sequencing.
According to another embodiment, the determination of the methylation pattern is performed using methylation specific PCR.
According to another embodiment, the determination of the methylation pattern is performed using methylation-sensitive endonuclease digestion.
According to another embodiment, said CG loci are amplified using loci specific primers.
According to a specific embodiment, said loci specific primers are selected from the group consisting of SEQ ID NO. 1-15 (depicted in Tables 1-3).
In another embodiment, the authenticity of said nucleic acids or said sample is verified by screening for non-genomic sequences in said nucleic acids, wherein detection of either primer dimers, plasmid sequences, non-genomic sequences ligated to ends of genomic sequences, or non-genomic sequences originating from degenerate primers used in in vitro generation of the nucleic acid sample, is indicative that said nucleic acids are not authentic and wherein absence of primer dimers, plasmid sequences, non-genomic sequences ligated to ends of genomic sequences, and non-genomic sequences originating from degenerate primers used in in vitro generation of the nucleic acid sample is indicative that said nucleic acids are authentic.
In one embodiment, the presence of said non-genomic sequences is detected by a method comprising:
(a) cloning of the nucleic acids from the test sample, and
(b) sequencing the cloned molecules.
In another embodiment, the authenticity of said nucleic acids or said sample is verified by determining the distribution of nucleic acid fragment lengths in said nucleic acids, wherein said method comprises:
(a) Subjecting nucleic acids from a test sample to size fractionation; and
(b) determining the distribution of fragment lengths for said nucleic acids; wherein the absence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are not authentic, and wherein the presence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are authentic.
In another embodiment, the analysis of the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
(a) Subjecting nucleic acids from a test sample and from an in vivo generated reference sample to size fractionation;
(b) determining the distribution of fragment lengths for said nucleic acids;
(c) comparing both distributions obtained in step (b); and
(d) determining the probability that both distributions represent random samplings from the same source; wherein, when said probability determined in step (d) is less than about 0.05, this is indicative that the nucleic acids from the test sample are not authentic, and wherein when said probability determined in step (d) is equal to or greater than about 0.05, this is indicative that the nucleic acids from the test sample are authentic.
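The two fragment-length criteria above can be sketched as follows. This is a minimal illustration only: the text does not name a statistical test for step (d), so the permutation test on a Kolmogorov-Smirnov statistic used here is an assumption, as are all function names and the fragment lengths (in base pairs), which are hypothetical.

```python
import random

LARGE_FRAGMENT_BP = 10_000  # "about 10 kilobases" in the text
ALPHA = 0.05                # probability cutoff from step (d)


def has_large_fragments(lengths_bp, cutoff_bp=LARGE_FRAGMENT_BP):
    """True if at least one fragment is longer than the cutoff."""
    return any(length > cutoff_bp for length in lengths_bp)


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    points = sorted(set(a) | set(b))
    return max(
        abs(sum(x <= p for x in a) / len(a) - sum(x <= p for x in b) / len(b))
        for p in points
    )


def same_source_probability(test, reference, n_perm=1000, seed=0):
    """Permutation p-value for the KS statistic (assumed choice of test)."""
    rng = random.Random(seed)
    observed = ks_statistic(test, reference)
    pooled = list(test) + list(reference)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(test)], pooled[len(test):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_authentic_by_distribution(test, reference, alpha=ALPHA):
    """Authentic when the same-source probability is at least alpha."""
    return same_source_probability(test, reference) >= alpha


# Hypothetical fragment lengths in base pairs.
reference = [500, 1200, 3000, 8000, 12000, 15000, 20000, 23000, 30000, 48000]
short_only = [100, 150, 200, 250, 300, 320, 350, 400, 420, 450]
```

With these hypothetical values, `short_only` contains no fragment above 10 kilobases and its distribution differs sharply from the reference, so both criteria would flag it as not authentic.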
In another embodiment, the authenticity of said nucleic acids or said sample is verified by detecting RNA in said nucleic acids, wherein the detection is performed by RT-PCR on one or more specific loci, wherein the absence of RT-PCR amplification products indicates that the nucleic acids are not authentic, and wherein the presence of RT-PCR amplification products indicates that the nucleic acids are authentic.
In accordance with the present invention, the biological sample is selected from a group consisting of: blood, saliva, hair, semen, urine, feces, skin, epidermal cell, buccal cell, and bone sample.
In one specific embodiment, the methods of verification of authenticity, in accordance with the present invention, are carried out for forensic uses.
In one specific embodiment said nucleic acids are from a human source.
In another specific embodiment, said nucleic acids are genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, fragmented nucleic acids, or nucleic acids obtained from sub cellular organelles.
In one specific embodiment, said nucleic acids are DNA.
In another aspect, the present invention provides a kit for verifying the authenticity of nucleic acids or a biological sample containing nucleic acids, wherein the kit comprises: i. reagents for carrying out at least one procedure selected from the group consisting of:
(a) determining the methylation pattern of said nucleic acids,
(b) amplifying a set of loci from said nucleic acids,
(c) calculating the representation bias in said nucleic acids,
(d) calculating the amount of PCR stutter of said nucleic acids,
(e) screening for non-genomic sequences in said nucleic acids,
(f) determining the distribution of nucleic acid fragment lengths in said nucleic acids,
(g) detecting RNA in the biological sample, and ii. instructions for using the kit for verifying the authenticity of said nucleic acids and/or biological sample.
Detailed Description
Methods for DNA fingerprinting include restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), and short tandem repeat (STR) analysis. To date, most STR based platforms used for genotyping DNA samples obtained from hair, semen etc., found in crime scenes, employ a panel of STR markers, such as CODIS (Combined DNA Index System), for DNA profiling of a subject. These methods, although robust and relatively mistake-proof, cannot differentiate between an in vivo DNA sample found at the scene of the crime and an in vitro generated DNA sample that was produced, for example, using PCR, cloning, or Whole Genome Amplification (WGA), and implanted at the crime scene. The possibility to generate DNA in vitro by several different techniques enables a rather straightforward and easy "implanting" of DNA at a crime scene by criminals who wish to incriminate an individual or exonerate themselves. Moreover, the DNA profile obtained from such in vitro generated DNA is indistinguishable from the profile of native DNA by methods known in the art, so that in vitro generated DNA can be produced such that it will reproduce any specific DNA profile to be implanted in crime scenes and thus incriminate any person with a known DNA profile. Since DNA profiles from crime scenes are used as evidence in a court of law for indictment, there is a need to develop methods for distinguishing in vitro generated DNA from in vivo generated DNA.
STR analysis is the most prevalent method of DNA fingerprinting used today. This method uses highly polymorphic regions that have short repeated sequences of DNA (the most common is a 4 bases repeat, but there are other lengths in use, including 5 bases). Because different people have different numbers of repeat units, these regions of DNA can be used to discriminate between individuals. These STR loci (genomic locations) are targeted with sequence-specific primers and are amplified using PCR. The DNA fragments that result are then separated and detected using electrophoresis. There are two common methods of separation and detection - Capillary Electrophoresis (CE) and gel electrophoresis.
The polymorphisms displayed at each STR region are by themselves very common; typically each polymorphism is shared by around 5 - 20% of individuals. When looking at multiple loci, it is the unique combination of these polymorphisms in an individual that makes this method discriminating as an identification tool. The more STR regions that are tested in an individual, the more discriminating the test becomes.
Different STR based DNA profiling systems are in use in different countries. In North America, systems which amplify the CODIS 13 core loci are almost always used, while in the UK the SGM+ system, which is compatible with The National DNA Database, is used. Whichever system is used, many of the STR regions under test are the same. These DNA profiling systems are based around multiplex reactions, whereby many STR regions are tested simultaneously.
Capillary electrophoresis is performed by electro-kinetically injecting the DNA fragments into a capillary, filled with polymer. The DNA is pulled through the tube by the application of an electric field, separating the fragments such that the smaller fragments travel faster through the capillary. The fragments are then detected using fluorescent dyes that were attached to the primers used in PCR. This allows multiple fragments to be amplified and run simultaneously, also known as multiplexing. Sizes are assigned using labeled DNA size standards that are added to each sample, and the number of repeats are determined by comparing the size to an allelic ladder, a sample that contains all of the common possible repeat sizes. Although this method is expensive, larger capacity machines with higher throughput are being used to lower the cost/sample and reduce backlogs that exist in many government crime facilities.
Gel electrophoresis acts using similar principles as CE, but instead of using a capillary, a large polyacrylamide gel is used to separate the DNA fragments. An electric field is applied, as in CE, but instead of detection being performed at a single location in the capillary, the entire gel is scanned into a computer, and all fragments are detected simultaneously. This produces an image showing all of the bands corresponding to different repeat sizes and the allelic ladder. This approach does not require the use of size standards, since the allelic ladder is run alongside the samples and serves this purpose. Visualization can either be through the use of fluorescently tagged dyes in the primers or by silver staining the gel prior to scanning.
In the U.S.A., there are 13 core loci that are currently used for discrimination in CODIS. Because these loci are independently assorted (having a certain number of repeats at one locus doesn't change the likelihood of having any number of repeats at any other locus), the product rule for probabilities can be applied. This has resulted in the ability to generate match probabilities of one in a quintillion or more.
The CODIS is the FBI-funded computer system that solves crimes by searching DNA profiles developed by federal, state, and local crime laboratories.
A record in the CODIS database, known as a CODIS profile, consists of a sample identifier, an identifier for the laboratory responsible for the profile, and the results of the DNA analysis (known as the DNA profile). Other than the DNA profile, CODIS does not contain any personal identity information - the system does not store names, dates of birth, social security numbers, etc.
In its original form, CODIS consisted of two indexes: the Convicted Offender Index and the Forensic Index. The Convicted Offender Index contains profiles of individuals convicted of crimes; state law governs which specific crimes are eligible for CODIS. The Forensic Index contains profiles developed from biological material found at crime-scenes.
In the past several years, CODIS has added several other indexes, including: an Arrestee Index, a Missing or Unidentified Persons Index, and a Missing Persons Reference Index.
CODIS has a matching algorithm that searches the various indexes against one another according to strict rules that protect personal privacy. For identifying suspects in rape and homicide cases, CODIS searches the Forensic Index against itself and against the Offender Index. A Forensic to Forensic match provides an investigative lead that connects two or more previously unlinked cases. A Forensic to Offender match actually provides a suspect for an otherwise unsolved case. It is important to note that the CODIS matching algorithm only produces a list of candidate matches. Each candidate match is confirmed or refuted by a Qualified DNA Analyst.
CODIS databases exist at the local, state, and national levels. This tiered architecture allows crime laboratories to control their own data - each laboratory decides which profiles it will share with the rest of the country. As of 2006, approximately 180 laboratories in all 50 states in the US participate in CODIS. The national level, the National DNA Index System (NDIS), is operated by the FBI at an undisclosed location.
As of May 2007, 177,870 forensic profiles and 4,582,516 offender profiles have been accumulated, making it the largest DNA databank in the world, surpassing the United Kingdom's National DNA Database, which consisted of an estimated 3,976,090 profiles as of June 2007. As of the same date, CODIS has produced over 49,400 matches to requests, assisting in more than 50,343 investigations.
The growing public approval of DNA databases has seen the creation and expansion of many states' own DNA databanks. California currently maintains the third largest DNA databank in the world. Political measures such as California Proposition 69 (2004), which increased the scope of the databank, have already met with a significant increase in numbers of investigations aided.
In order to decrease the number of irrelevant matches at NDIS, the Convicted Offender Index requires all 13 CODIS STRs to be present for a profile upload. Forensic profiles only require 10 of the STRs to be present for an upload.
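The upload rule above can be expressed as a short check. This is a sketch under stated assumptions: the profile layout (a mapping from locus name to a pair of allele values) and the function name are hypothetical and do not describe the actual CODIS software; only the thresholds of 13 and 10 loci come from the text.

```python
# The thirteen CODIS core STR loci.
CODIS_CORE_LOCI = (
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA", "D3S1358", "D5S818",
    "D7S820", "D8S1179", "D13S317", "D16S539", "D18S51", "D21S11",
)

# Minimum number of typed core loci per index, as stated in the text.
MIN_TYPED_LOCI = {"offender": 13, "forensic": 10}


def eligible_for_upload(profile, index):
    """Check whether a profile has enough typed core loci for the index.

    `profile` is a hypothetical dict: locus name -> (allele, allele),
    with missing or empty entries meaning the locus was not typed.
    """
    typed = sum(1 for locus in CODIS_CORE_LOCI if profile.get(locus))
    return typed >= MIN_TYPED_LOCI[index]
```

For instance, a profile with ten typed core loci would pass the check for the Forensic Index but not for the Convicted Offender Index.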
The CODIS profile is created by genotyping 13 STR loci, plus two additional genomic loci located on chromosomes X and Y, for determination of sex.
The CODIS profile consists of a vector of 26 numbers (representing the allelic values of the maternal and paternal alleles of the 13 STR loci), and the letters XX or XY (representing female or male, respectively). Each profile has an associated "frequency", which represents the chance for a randomly picked person to have that profile. The frequency of the profile is the product of all the individual allelic frequencies.
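As a minimal sketch of the product rule described above, the profile frequency can be computed by multiplying the individual allelic frequencies. The function name and the allele frequencies are hypothetical. The calculation follows the simplified description in the text; forensic practice typically works with genotype frequencies per locus (for example, a factor of 2 for heterozygous loci), which is not modeled here.

```python
from math import prod


def profile_frequency(allele_frequencies):
    """Multiply the individual allelic frequencies, as the text describes."""
    return prod(allele_frequencies)


# Hypothetical profile: 13 loci x 2 alleles, each allele found in 10% of people.
example_frequencies = [0.1] * 26
example_profile_frequency = profile_frequency(example_frequencies)  # about 1e-26
```

Because the loci are treated as independent, each additional locus multiplies the frequency by further small factors, which is why a 13-locus profile can reach the very low match probabilities mentioned above.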
Specific compositions, methods, or embodiments discussed are intended to be only illustrative of the invention disclosed by this specification. Variations on these compositions, methods, or embodiments are readily apparent to a person of skill in the art based upon the teachings of this specification and are therefore intended to be included as part of the inventions disclosed herein.
Definitions
The term "forensics" or "forensic science" as used herein refers to the application of a broad spectrum of methods aimed to answer questions of identity being of interest to the legal system. For example, the identification of potential suspects whose DNA may match evidence left at crime scenes, the exoneration of persons wrongly accused of crimes, identification of crime and catastrophe victims, or establishment of paternity and other family relationships.
The term "authenticity" as used herein refers to the truthfulness of the origin of a nucleic acid. Accordingly, in the context of the present invention an authentic nucleic acid is one that was generated in vivo. The term "nucleic acid" as used herein refers to, but is not limited to, genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, fragmented nucleic acid, and nucleic acid obtained from sub cellular organelles such as mitochondria. In addition, nucleic acids include, but are not limited to, synthetic or in vitro transcription products.
The term "nucleic-acid based analysis procedures" as used herein refers to any identification procedure which is based on the analysis of nucleic acids, e.g. DNA profiling.
The term "in vitro generated nucleic acid" as used herein refers to, but is not limited to a nucleic acid, which is an artificial assembly ("fake DNA"), achieved by various methods. Such in vitro generated nucleic acid may be implanted in a biological sample. Some non-limiting examples of such methods are described herein below:
• Individual plasmids that contain cloned human CODIS alleles. A plasmid library consisting of several hundred plasmids contains all possible alleles of all CODIS loci.
• Nucleic acid fragments which consist of fragments amplified by PCR from plasmids or directly from human genomic DNA.
• Plasmid allele-containing inserts only, generated for example, but not only, by endonuclease cleavage of the plasmids and gel purification of the inserts.
• Whole genome amplification (WGA) of a template nucleic acid sample using a PCR-based WGA method. PCR-based WGA methods include degenerate oligonucleotide-primed (DOP) PCR [1], primer extension pre-amplification (PEP) [2], and ligation-mediated PCR [3].
• Whole genome amplification (WGA) of a template nucleic acid sample using an isothermal WGA method such as multiple displacement amplification (MDA) [4].
The term "biological sample" ("test sample") as used herein, refers to, but is not limited to, any biological sample derived from an animal, preferably a human, and preferably a sample which contains nucleic acids. In the context of the present invention, such samples are not directly retrieved from the subject to be identified, but are collected from the environment, e.g. a crime scene or a rape victim. Examples of such samples include fluids, tissues, cell samples, organs, biopsies, etc. Most preferred samples are blood, plasma, saliva, urine, sperm, hair, etc. The biological sample can also be any of the following - blood drops, dried blood stains, dried saliva stains, dried underwear stains (e.g. stains on underwear, pads, tampons, diapers), clothing, dental floss, ear wax, electric razor clippings, gum, hair, licked envelope, nails, paraffin embedded tissue, post mortem tissue, razors, teeth, toothbrush, toothpick, dried umbilical cord. Genomic DNA can be extracted from such biological samples. The biological sample may be treated prior to its use, e.g. in order to render nucleic acids available. Techniques of cell or protein lysis, concentration or dilution of nucleic acids that may be used in the context of the present invention are known in the art.
As used herein, the term "allele" is intended to be a genetic variation associated with a segment of DNA, i.e., one of two or more alternate forms of a DNA sequence occupying the same locus.
The term "locus" (plural: loci) refers to a position on a chromosome of a gene or other chromosome marker. Locus may also mean the DNA at that position. A variant of the DNA sequence at a given locus is called an allele as denoted herein. Alleles of a locus are located at identical sites on homologous chromosomes.
The term "polymerase chain reaction (PCR)" and its associated terms - "reaction mix", "initialization" , "denaturation", "annealing" and "elongation" are known to any person of ordinary skill in the art. A non-limiting reference of the PCR method can be found in [5].
The term "Restriction and Circularization-Aided Rolling Circle Amplification (RCA-RCA)" refers to a whole genome amplification procedure (as described in 11) which retains the allelic differences among degraded amplified genomes while achieving almost complete genome coverage. RCA-RCA utilizes restriction digestion and whole genome circularization to generate genomic sequences amenable to rolling circle amplification.
The term "STR primers" as used herein refers to any commercially available or made-in-the-lab nucleotide primers that can be used to amplify a target nucleic acid sequence from a biological sample by PCR. There are ~1.5 million non-CODIS STR loci. Non-limiting examples of the above are presented in the following website http://www.cstl.nist.gov/biotech/strbase/str ref.htm that currently contains 3156 references for STRs employed in science, forensics and beyond. In addition to published primer sequences, STR primers may be obtained from commercial kits for amplification of hundreds of STR loci (for example, ABI Prism Linkage Mapping Set MD10, Applied Biosystems), and for amplification of thousands of SNP loci (for example, Illumina BeadArray linkage mapping panel).
The term "CODIS STR primers" as used herein refers to STR primers that are designed to amplify any of the thirteen core STR loci designated by the FBI's "Combined DNA Index System", specifically, the repeated sequences of TH01, TPOX, CSF1PO, VWA, FGA, D3S1358, D5S818, D7S820, D13S317, D16S539, D8S1179, D18S51, and D21S11.
The term "polymerase chain reaction (PCR) stutter" as used herein refers to PCR byproducts, obtained along with the main PCR product. These "stutter" byproducts are usually shorter than the main product by multiples of the repeat unit, and are produced in the course of PCR amplification of the STR sequences. The mechanism by which these artifacts are formed is understood, but it represents an intrinsic limitation of the PCR technology and therefore no effective remedy has been found to eliminate these spurious products [6].
The term "-1 stutter" as used herein refers to a stutter byproduct that is one repeat unit smaller than its associated allele. Similarly, "+1 stutter" refers to a stutter byproduct that is one repeat unit larger than its associated allele.
The term "-1 stutter fraction" refers to the height (or area) of the -1 stutter peak divided by the height (or area) of the allele peak. Similarly, "+1 stutter fraction" refers to the height (or area) of the +1 stutter peak divided by the height (or area) of the allele peak.
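The stutter fraction definitions above amount to a simple ratio of peak measurements. The following sketch illustrates the calculation; the function name and the peak heights (in arbitrary fluorescence units) are hypothetical and are not taken from any measurement in this specification.

```python
def stutter_fraction(stutter_peak, allele_peak):
    """Height (or area) of a stutter peak divided by that of its allele peak."""
    if allele_peak <= 0:
        raise ValueError("allele peak height or area must be positive")
    return stutter_peak / allele_peak


# Hypothetical peak heights in arbitrary fluorescence units.
allele_height = 1500
minus_one_fraction = stutter_fraction(120, allele_height)  # -1 stutter fraction
plus_one_fraction = stutter_fraction(15, allele_height)    # +1 stutter fraction
```

The same function serves for both the -1 and the +1 stutter fraction, since only the stutter peak being measured differs.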
The term "capillary electrophoresis histogram" as used herein refers to a histogram obtained from capillary electrophoresis of PCR products wherein said products were amplified from genomic loci.
The term "representation bias" as used herein refers to differences in copy number between different genomic loci in the nucleic acid sample in question.
The term 'CG locus' refers to a genomic sequence that contains one or more CG dinucleotides.
The term "constitutively-methylated" as used herein means methylated in DNA of most cells of a specific tissue type.
The term "constitutively-unmethylated" as used herein means unmethylated in DNA of most cells of a specific tissue type.
The method of the present invention is illustrated in a general scheme depicted in Fig. 1.
As illustrated in Fig. 1, the input to the DNA authentication scheme in accordance with the present invention is a DNA sample isolated from a biological sample. The DNA undergoes a biochemical procedure followed by signal detection and signal analysis. The signal analysis determines whether the input DNA was generated in vivo (= Authentic) or in vitro (=Not authentic).
The authentication methods described herein may also use as input the raw data obtained in the standard DNA profiling procedure.
The method of the present invention concerns the authentication of nucleic acids which were isolated from a biological sample, for example, a blood sample found at a crime scene. The isolation of nucleic acids (e.g. DNA) from a biological sample may be achieved by various methods known in the art (e.g. see Sambrook et al, [10]), for example, by performing the following steps:
• Chelating divalent cations such as Mg2+ and Ca2+ to stop the activity of DNase enzymes which degrade the DNA.
• Breaking open cells by grinding or sonication, and removing membrane lipids by adding a detergent.
• Removing cellular and histone proteins bound to the DNA, by adding a protease, by precipitation with sodium or ammonium acetate, or by using a phenol-chloroform extraction step.
• Precipitating DNA in cold ethanol or isopropanol. DNA is insoluble in alcohol and clings together; this step also removes salt.
• Washing the resulting DNA pellet with alcohol.
• Solubilizing the DNA in a slightly alkaline buffer.
The isolation of RNA from a biological sample may be achieved by any method known in the art, e.g. as described in [10].
The determination whether the nucleic acids in a biological sample were generated in vitro or in vivo may be accomplished using various methods, including those described herein.
a. Determining the methylation pattern of a nucleic acid
Methylation in the human genome occurs in the form of 5-methyl cytosine and is confined to cytosine residues that are part of the sequence CG (cytosine residues that are part of other sequences are not methylated).
Some CG dinucleotides in the human genome are methylated, and others are not. In addition, methylation is cell and tissue specific, such that a specific CG dinucleotide can be methylated in a certain cell and at the same time unmethylated in a different cell, or methylated in a certain tissue and at the same time unmethylated in different tissues. Since methylation at a specific locus can vary from cell to cell, when analyzing the methylation status of DNA extracted from a plurality of cells (e.g. from a forensic sample), the signal can be mixed, showing both the methylated and unmethylated signals in varying ratios. Therefore, when referring to the methylation status of a specific locus in DNA extracted from a plurality of cells, it should be understood that the status refers to the strongest signal, which corresponds to the methylation status of the majority of cells in the sample.
The methylation status of different genomic loci has been investigated and published (for example, see ref. 9). Some genomic regions have been shown to be mostly methylated, some have been shown to be mostly unmethylated, and some regions have been shown to be mostly methylated in certain tissues but mostly unmethylated in other tissues.
There are several different methods for determining the methylation status of genomic loci. Examples of methods that are commonly used are Bisulfite sequencing, Methylation-specific PCR, and Methylation-sensitive endonuclease digestion (10-13).
Further, various data sources are available for retrieving or storing DNA methylation data and making these data readily available to the public, for example MethDB (http://www.methdb.nef).
Non-limiting examples of methylated loci and corresponding primers for their detection are provided in Table 1. Non-limiting examples of unmethylated loci and corresponding primers for their detection are provided in Table 2.
The herein described methods for determining the methylation pattern of nucleic acids (i.e. bisulfite sequencing, methylation specific PCR, methylation-sensitive endonuclease digestion) can be employed in the DNA authentication procedure according to the present invention, using at least one of the two versions specified below:
Version 1: based on analysis of one set of loci:
1. Determine the methylation status of DNA from a test sample for each CG locus in a set of CG loci that are constitutively methylated in in vivo generated DNA
2. For the entire set of analyzed CG loci, determine the ratio of methylated CG loci / total CG loci
3. Perform one or more of the following procedures:
a. Compare the result from step 2 to a predetermined threshold level. If the ratio obtained in step 2 is greater than a predetermined threshold level (e.g. ratio obtained from test sample >= 0.2), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
b. Compare the result from step 2 to the corresponding result obtained from an in vitro generated control DNA sample. If the ratio obtained in step 2 is significantly greater than the ratio obtained from an in vitro generated control DNA (e.g. ratio obtained from test sample >= ratio obtained from in vitro generated control DNA + 0.2), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
c. Compare the result from step 2 to the corresponding result obtained from an in vivo generated control DNA sample. If the ratio obtained in step 2 is comparable to the ratio obtained from an in vivo generated control DNA (e.g. ratio obtained from test sample >= ratio obtained from in vivo generated control DNA - 0.3), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
Version 2: based on analysis of two sets of loci:
1. Determine the methylation status of DNA from a test sample for each CG locus in a first set of CG loci that are constitutively methylated in in vivo generated DNA
2. Determine the methylation status of DNA from a test sample for each CG locus in a second set of CG loci that are constitutively unmethylated in in vivo generated DNA
3. For the entire first set of analyzed CG loci, determine the ratio of methylated CG loci / total CG loci
4. For the entire second set of analyzed CG loci, determine the ratio of unmethylated CG loci / total CG loci
5. Perform one or more of the following procedures:
a. Compare the results from steps 3 and 4 to predetermined threshold levels. If the ratios obtained in steps 3 and 4 are greater than predetermined threshold levels (e.g. ratio obtained from test sample in step 3 >= 0.2 AND ratio obtained from test sample in step 4 >= 0.2), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
b. Compare the results from steps 3 and 4 to the corresponding results obtained from an in vitro generated control DNA sample. If the ratio obtained in step 3 is significantly greater than the corresponding ratio obtained from an in vitro generated control DNA sample AND the ratio obtained in step 4 is greater than a predetermined threshold level (e.g. ratio obtained from the test sample in step 3 >= corresponding ratio obtained from the in vitro generated control DNA + 0.2 AND ratio obtained from test sample in step 4 >= 0.2), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
c. Compare the results from steps 3 and 4 to the corresponding results obtained from an in vivo generated control DNA sample. If the ratios obtained in steps 3 and 4 are comparable to the corresponding ratios obtained from an in vivo generated control DNA (e.g. ratio obtained from test sample in step 3 >= corresponding ratio obtained from in vivo control DNA - 0.3 AND ratio obtained from test sample in step 4 >= corresponding ratio obtained from in vivo control DNA - 0.3), conclude that the DNA from the test sample was generated in vivo. Otherwise, conclude that the DNA from the test sample was generated in vitro.
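The decision rules of Version 1 and Version 2 can be sketched as follows. This is an illustration only: the function names, the encoding of methylation calls as "M" (methylated) and "U" (unmethylated), and the example calls are assumptions, while the numeric values 0.2 and 0.3 are the exemplary values given above.

```python
def ratio(statuses, expected):
    """Fraction of loci in a set whose call equals the expected status."""
    return sum(status == expected for status in statuses) / len(statuses)


def version1_threshold(methylated_set, threshold=0.2):
    """Version 1, procedure a: methylated / total >= threshold."""
    return ratio(methylated_set, "M") >= threshold


def version2_threshold(methylated_set, unmethylated_set,
                       threshold_m=0.2, threshold_u=0.2):
    """Version 2, procedure a: both ratios must reach their thresholds."""
    return (ratio(methylated_set, "M") >= threshold_m
            and ratio(unmethylated_set, "U") >= threshold_u)


def version2_vs_in_vivo_control(test_m, test_u, control_m, control_u,
                                margin=0.3):
    """Version 2, procedure c: both ratios within `margin` of the control."""
    return (ratio(test_m, "M") >= ratio(control_m, "M") - margin
            and ratio(test_u, "U") >= ratio(control_u, "U") - margin)


# Hypothetical calls: first set is expected methylated in vivo, second unmethylated.
in_vivo_like = (["M", "M", "M", "U"], ["U", "U", "U", "U"])
all_unmethylated = (["U", "U", "U", "U"], ["U", "U", "U", "U"])
```

Each function returns True when the rule concludes that the DNA was generated in vivo. In this sketch, a sample in which every locus is called unmethylated passes the second-set criterion but fails the first-set criterion, so both versions classify it as generated in vitro.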
In both versions of the procedure, the initial steps (step 1 in Version 1 and steps 1 and 2 in Version 2) involve determining the methylation status of DNA at each CG locus in the set.
As noted herein, exemplary methods for determining the methylation pattern of nucleic acids include, but are not limited to, the following methods:
Bisulfite sequencing
Bisulfite sequencing is the sequencing of bisulfite treated-DNA to determine its pattern of methylation. The method is based on the fact that treatment of DNA with sodium bisulfite results in conversion of non-methylated cytosine residues to uracil, while leaving the methylated cytosine residues unaffected. Following conversion by sodium bisulfite, specific regions of the DNA are amplified by PCR, and the PCR products are sequenced. Since in the polymerase chain reaction uracil residues are amplified as if they were thymine residues, unmethylated cytosine residues in the original DNA appear as thymine residues in the sequenced PCR product, whereas methylated cytosine residues in the original DNA appear as cytosine residues in the sequenced PCR product.
In the procedure for DNA authentication based on bisulfite sequencing, each CG locus contains one CG dinucleotide, and the methylation status of each CG dinucleotide is determined by:
1. Subjecting DNA from a test sample to sodium bisulfite treatment.
2. Amplifying (e.g. by PCR) a genomic region that contains the CG locus from the bisulfite-treated DNA.
3. Sequencing the amplified product from step 2.
4. If the sequence obtained in step 3 at the CG locus is CG, conclude that the CG locus was methylated. Otherwise, if the sequence obtained in step 3 at the CG locus is TG, conclude that the CG locus was unmethylated. It should be understood in the context of the present invention that when sequencing from the complementary strand, the unmethylated CGs in the original sequence will appear as CA.
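The read-out logic of the steps above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: only the original (top) strand is modeled, conversion is taken to be complete, and the sequence, positions, and function names are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Sequence as read after bisulfite treatment and PCR.

    Unmethylated C is read as T; methylated C (index listed in
    `methylated_positions`) stays C. Other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(sequence)
    )


def call_cg_status(converted_sequence, cg_index):
    """Apply step 4: CG means methylated, TG means unmethylated."""
    dinucleotide = converted_sequence[cg_index:cg_index + 2]
    if dinucleotide == "CG":
        return "methylated"
    if dinucleotide == "TG":
        return "unmethylated"
    return "undetermined"


# Hypothetical region with CG dinucleotides at indices 2 and 7.
original = "ATCGTACCGA"
converted = bisulfite_convert(original, methylated_positions={2})
```

In this example the CG at index 2 is methylated and is read as CG, while the CG at index 7 is unmethylated and is read as TG; the non-CG cytosine at index 6 is also read as T.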
Methylation specific PCR
Methylation specific PCR is a method of methylation analysis that, like bisulfite sequencing, is also performed on bisulfite-treated DNA, but avoids the need to sequence the genomic region of interest. Instead, the selected region in the bisulfite-treated DNA is amplified by PCR using two sets of primers that are designed to anneal to the same genomic targets. The primer pairs are designed to be "methylated-specific" by including sequences complementing only unconverted 5-methylcytosines, or conversely "unmethylated-specific", complementing thymines converted from unmethylated cytosines. Methylation is determined by the relative efficiency of the different primer pairs in achieving amplification.
In the procedure for DNA authentication based on Methylation specific PCR, each CG locus is comprised of one or more CG dinucleotides in the primer sequences. CG dinucleotides that are found in the amplified genomic region, but which are not in the primer sequences (i.e. in the region between the primers) are not part of the CG locus. The methylation status of each CG locus can be determined by:
1. Subjecting DNA from a test sample to sodium bisulfite treatment.
2. Amplifying by PCR a genomic region that contains the CG locus from the bisulfite-treated DNA. For amplification, two pairs of primers are used. One pair is designed to preferentially amplify the methylated version of the bisulfite-treated DNA, and the other pair is designed to preferentially amplify the unmethylated version of the same bisulfite-treated DNA.
3. Detecting the presence, absence, and/or quantity of amplification products from step 2, e.g. by gel/capillary electrophoresis or real time PCR. If detection is based on capillary electrophoresis, fluorescent primers should be used in the PCR in step 2. If detection is based on real time PCR, a fluorescent DNA binding dye or a specific fluorescent DNA probe may need to be used along with the primers in the PCR in step 2.
4. Determining the methylation status of the CG locus by comparing the results obtained in step 3 for the two sets of primers used for amplification. If the primers that were designed to preferentially amplify the methylated version of the DNA produce a larger quantity of PCR product than the primers that were designed to preferentially amplify the unmethylated version of the DNA, conclude that the CG locus was methylated. Otherwise, conclude that the CG locus was unmethylated.
It should be understood in the context of the present invention that methylation specific PCR determines the methylation status of CG dinucleotides in the primer sequences only, and not in the entire genomic region that is amplified by PCR. Therefore, CG dinucleotides that are found in the amplified sequence but are not in the primer sequences are not part of the CG locus.
Methylation-sensitive endonuclease digestion
Digestion of DNA with methylation-sensitive endonucleases represents a method for methylation analysis that can be applied directly to genomic DNA without the need to perform bisulfite conversion. The method is based on the fact that methylation-sensitive endonucleases digest only un-methylated DNA, while leaving methylated DNA intact. Following digestion, the DNA can be analyzed for methylation status by a variety of methods, including gel electrophoresis, and PCR amplification of specific loci.
In the procedure for DNA authentication based on Methylation-sensitive endonuclease digestion, each CG locus is comprised of one or more CG dinucleotides that are part of recognition sequence(s) of the methylation-sensitive restriction endonuclease(s) that are used in step 1 of the procedure. CG dinucleotides that are found in the amplified genomic region, but are not in the recognition sequence(s) of the endonuclease(s) are not part of the CG locus. The methylation status of each CG locus is determined by:
1. Subjecting DNA from a test sample to digestion with one or more methylation-sensitive endonucleases (e.g. HpaII, HhaI).
2. Amplifying (e.g. by PCR) a genomic region that contains the CG locus and a reference locus from the digested DNA. The reference locus must not contain any of the recognition sequences of the endonucleases used in step 1.
3. Detecting the presence, absence, and/or quantity of amplification products from step 2 (e.g. by gel/capillary electrophoresis or real time PCR). If detection is based on capillary electrophoresis, fluorescent primers should be used in the PCR in step 2. If detection is based on real time PCR, a fluorescent DNA binding dye or a specific fluorescent DNA probe may need to be used along with the primers in the PCR in step 2.
4. Determining the methylation status of the CG locus from the results obtained in step 3 by one of the following methods:
a. Compare the signal obtained from the amplification of the CG locus to a predetermined threshold. In gel electrophoresis, if a band corresponding to the CG locus is detectable, conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated. In capillary electrophoresis, if the signal corresponding to the CG locus is greater than a pre-determined threshold (e.g. 50 relative fluorescence units) conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated. In real time PCR, if the cycle threshold ("CT") of the signal corresponding to the CG locus is less than a pre-determined threshold (e.g. 30), conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated.
b. Compare the signal obtained from the amplification of the CG locus to the signal obtained from the amplification of the reference locus. If the signal of the CG locus is comparable to the signal of the reference locus, conclude that the CG locus was methylated, otherwise conclude that the CG locus was unmethylated. Definitions of comparable signals are as follows: In gel electrophoresis, if the intensity of the band corresponding to the CG locus is greater than a pre-determined threshold ratio (e.g. 50%) of the intensity of the band corresponding to the reference locus, the signals of both loci are comparable. In capillary electrophoresis, if the height and/or area of the peak corresponding to the CG locus is greater than a pre-determined threshold ratio (e.g. 50%) of the height and/or area of the peak corresponding to the reference locus, the signals of both loci are comparable. In real time PCR, if the difference between the cycle threshold of the CG locus and the cycle threshold of the reference locus is not greater than 2, the signals of both loci are comparable.
It should be understood in the context of the present invention that Methylation-sensitive endonuclease digestion determines the methylation status of CG dinucleotides in the recognition sequences of the endonucleases that are used only, and not in the entire genomic region that is amplified by PCR. Therefore, CG dinucleotides that are found in the amplified sequence but are not in the recognition sequences of the endonucleases are not part of the CG locus.
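The two decision rules of step 4 (comparison to a fixed threshold, and comparison to the reference locus) can be sketched as follows. The function names, the string labels for the detection method and the representation of a gel band as a positive intensity are illustrative assumptions. The example thresholds (50 RFU, CT of 30, 50% ratio, CT difference of 2) are the ones given in the text. The sign convention for the CT difference (CG locus minus reference locus) is also an assumption.

```python
def methylated_by_threshold(method, signal, rfu_threshold=50.0, ct_threshold=30.0):
    """Option (a): compare the CG locus signal to a pre-determined threshold."""
    if method == "gel":
        return signal > 0.0              # any detectable band (assumed encoding)
    if method == "capillary":
        return signal > rfu_threshold    # signal in relative fluorescence units
    if method == "real_time_pcr":
        return signal < ct_threshold     # signal is a cycle threshold; lower means more template
    raise ValueError(f"unknown detection method: {method}")


def methylated_by_reference(method, locus_signal, reference_signal,
                            ratio_threshold=0.5, max_ct_difference=2.0):
    """Option (b): the CG locus is methylated if its signal is comparable to the reference."""
    if method in ("gel", "capillary"):
        return locus_signal > ratio_threshold * reference_signal
    if method == "real_time_pcr":
        # Assumed convention: difference = CT(CG locus) - CT(reference locus).
        return (locus_signal - reference_signal) <= max_ct_difference
    raise ValueError(f"unknown detection method: {method}")
```

For instance, a capillary electrophoresis peak of 600 RFU against a reference peak of 1000 RFU exceeds the 50% ratio and would be called methylated under option (b).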
b. Amplifying a set of loci
The determination whether a biological sample containing nucleic acids was generated in vivo or in vitro, can be performed by analysis of a set of genomic loci in the sample. Any genomic locus may be used for this purpose, other than those loci that are traditionally used for DNA profiling, (e.g. CODIS loci). If the in vitro generated DNA sample consists only of CODIS loci, then all other genomic loci will be absent from the sample. Therefore, the attempt to amplify any non-CODIS locus will fail in such in vitro generated DNA samples, but not in in vivo generated DNA samples. Accordingly, the absence of non-CODIS loci from the test sample indicates that the DNA was synthetically constructed and does not originate from a specific individual. A person skilled in the art needs no special guidelines for selection of these loci, as any non-CODIS loci will be appropriate for the authentication purpose. If, however, the set of additional loci is meant not only for DNA authentication but also for DNA profiling, then the usual guidelines for selection of profiling loci (e.g. polymorphic in the human population, having relatively low mutation rates, neutral, non-phenotypic, each locus present on a separate chromosome) may be employed.
Therefore, in accordance with the present invention, the presence or absence of a set of genomic loci is determined, for example, by one of the following methods: a. Amplifying each locus in the set of loci by PCR and detecting the presence of amplification products by gel or capillary electrophoresis; b. Amplifying the locus by real-time PCR and detecting the presence of amplification products. The real-time software compares the fluorescence of the sample to that of the reference sample(s) and determines at each cycle whether each PCR amplicon is present, and if so, its amount. If no amplicon is detected by the end of 40 cycles of real-time PCR, it is concluded that the locus is absent. c. Hybridizing the test DNA to sequences complementary to the selected loci and detecting hybridization above a pre-determined threshold level (e.g. by hybridizing to a DNA microarray).
After the presence of each locus is determined, the ratio of present loci / total analyzed loci for the entire set of analyzed loci is calculated. In addition at least one CODIS STR locus is amplified using the same method used for the amplification of the set of analyzed loci.
If the calculated ratio is equal to or greater than a predetermined threshold level (e.g. 1), it is concluded that the DNA from the test sample was generated in vivo. Otherwise it is concluded that the DNA from the test sample was generated in vitro.
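The ratio test described above can be sketched in a few lines. The function name and the mapping from locus name to a presence flag are illustrative assumptions. The default threshold of 1 is the example value given in the text.

```python
from typing import Mapping


def authenticate_by_locus_presence(locus_present: Mapping[str, bool],
                                   threshold: float = 1.0) -> str:
    """Return 'in vivo' if the fraction of detected loci reaches the threshold."""
    if not locus_present:
        raise ValueError("at least one locus must be analyzed")
    # Ratio of present loci to total analyzed loci.
    ratio = sum(1 for present in locus_present.values() if present) / len(locus_present)
    return "in vivo" if ratio >= threshold else "in vitro"
```

With the default threshold of 1, a single missing non-CODIS locus is enough to classify the sample as generated in vitro.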
Various amplification methods can be used to amplify DNA loci, including PCR [5], transcription based amplification [7] and strand displacement amplification (SDA) [8]. Preferably, the nucleic acid sample is subjected to PCR amplification using primer pairs specific to each locus in the set. For example, the following PCR amplification method can be used to amplify the DNA loci: i. providing a nucleic acid template (e.g. the DNA from the test sample) and a PCR reaction mixture comprising one or more primers, a polymerase such as Taq polymerase or another DNA polymerase with a temperature optimum at around 70°C, deoxynucleotide triphosphates (dNTPs), and a buffer solution providing a suitable chemical environment for stability of the DNA polymerase, ii. performing an initialization step iii. performing a denaturation step iv. performing an annealing step v. performing an elongation step vi. repeating steps iii to v 20 to 40 times, preferably 30 to 35 times, vii. performing a final elongation step viii. running the PCR product on an electrophoresis gel ix. analyzing the signal obtained from said PCR product.
c. Calculating the Representation bias of nucleic acids
In vivo generated DNA generally has a smaller representation bias in relation to in vitro generated DNA. In the native DNA that is found in the cells of organisms each genomic locus is represented exactly once per haploid genome. In vivo, the strict control of copy numbers of genomic loci is achieved by enzymatic mechanisms that monitor the fidelity of the DNA replication process. These mechanisms are not present in in vitro generated DNA, leading to preferential amplification of some loci, resulting in a significantly larger representation bias. Thus, analysis of the representation bias can be used for determining whether a nucleic acid in a biological sample containing nucleic acids was generated in vitro or in vivo, for example by the following method:
1. Defining a set of genomic loci;
2. For a test sample, calculating the Relative Copy Number (RCN) of each locus and/or allele in the set defined in step 1. This may be performed by, but is not limited to, any of the following methods: a. Real-time PCR; b. PCR followed by quantification of PCR products by gel electrophoresis or by capillary electrophoresis. If capillary electrophoresis is used, either PCR product peak heights and/or peak areas may be used for quantification. c. Hybridization to sequences complementary to the tested loci (e.g. using a DNA microarray).
3. Calculating the Representation Bias Value (RBV) of the test sample, which is a numerical value representing the degree of representation bias, in one of the methods as follows: a. RBV = ratio between the maximal and minimal RCN values obtained in step 2; b. RBV = ratio between the standard deviation and the mean of all the RCN values obtained in step 2; c. Calculating the mean of the squared differences between each RCN value obtained in step 2 and the mean of all the RCN values obtained in step 2, using the following formula:
$$\mathrm{RBV} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{RCN}_i - \overline{\mathrm{RCN}}\right)^2$$
where n is the number of loci in the set, RCN_i is the RCN value of the i-th locus, and $\overline{\mathrm{RCN}}$ is the mean of all the RCN values.
d. If the analysis method used in step 2 is able to differentiate between the relative copy numbers of both alleles of a single heterozygous locus (e.g. if DNA was size fractionated by electrophoresis): i. Obtaining for each heterozygous locus the ratio between the RCN of the allele with the smaller copy number and the RCN of the allele with the larger copy number (in capillary electrophoresis, this ratio is often defined as the PHR = Peak Height Ratio between alleles) ii. Calculating the mean of the ratios obtained in step (i) iii. RBV = 1 / value obtained in (ii)
e. If RCN values were obtained by capillary electrophoresis: Calculating the RBV from the mean deviation of genotyped peak heights of the capillary electrophoresis histogram based on a linear regression of the genotyped peaks. The linear regression may be calculated for example using the Least Squares method [13]. Calculating the linear regression allows for correction of the "ski-slope" effect which is seen in some capillary electrophoresis histograms as a result of sample overload, DNA degradation and other factors, and which causes the smaller amplicons to be amplified preferentially over larger amplicons. Since different fluorescent dyes have different intensities, the linear regression may be calculated separately for each dye.
The calculation is performed as follows: i. For each fluorescent dye color (e.g. NED) of the capillary electrophoresis histogram:
(a) Separate superimposed alleles at homozygous loci: for each homozygous locus, convert the single genotyped peak that corresponds to both alleles into two identical peaks with the same size as the original peak, and with a height equal to half the height of the original peak.
(b) Calculating a linear regression of all peaks corresponding to alleles
(c) For each peak corresponding to an allele, calculating the normalized degree of deviation of the peak from the linear regression obtained in (b). This may be performed, for example, by the following non-limiting option:
1. Obtaining the y-value of the linear regression obtained in (b) at x, where x is the size of the peak
2. Calculating the normalized deviation of the peak height from the linear regression, equal to |peak height - value from (c)1| / (value from (c)1);
3. Alternatively, calculating |peak height - value from (c)1|^2 / (value from (c)1)
ii. Define RBV as equal to the mean of the values obtained in (i).
4. Calculating a likelihood parameter that is correlated to the likelihood that the DNA in the test sample was generated in vivo. The likelihood parameter may be calculated by one of the following non-limiting options:
a. Calculating the likelihood parameter based on a database of RBVs obtained from analyses of in vivo generated DNA samples. The likelihood parameter is equal to the maximum of the following two values: (1) the fraction of database elements with RBV equal to or greater than the value obtained for the test sample in step 3, and (2) 1/n, where n is the number of database elements.
b. Calculating the likelihood parameter based on a pre-determined normal distribution of RBVs. The likelihood parameter is equal to the probability of a random sampling from the normal distribution having a value that is equal to or greater than the value of the test sample, obtained in step 3. This likelihood is equal to the value of the complementary cumulative distribution of the normal function, and can be calculated by the following formula:
$$p = \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$
where x is the value obtained for the test sample, μ and σ are the mean and standard deviation (respectively) of the normal distribution, and p is the obtained likelihood value;
5. Determining whether the test sample was generated in vitro or in vivo by either of the following: a. If the likelihood parameter obtained in step 4 is smaller than a predetermined threshold (e.g. 0.05), conclude that the DNA from the test sample was generated in vitro, otherwise conclude that it was generated in vivo. b. Perform steps 1-4 on a reference sample (e.g. from a suspect with a similar profile), calculate the ratio between the likelihood parameter of the test sample and the likelihood parameter of the reference sample. If this ratio is smaller than a predefined threshold (e.g. 0.5), conclude that the test sample was generated in vitro, otherwise conclude that the test sample was generated in vivo.
It should be appreciated that in cases that the DNA in the sample was generated in vitro, the likelihood parameter may be much smaller than the threshold indicated above, e.g. under 0.01, or under 0.005.
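Steps 3 to 5 of the method above can be sketched as follows, covering RBV options (a) to (d), both likelihood options and the threshold decision of step 5(a). All function names are illustrative assumptions. The use of the population standard deviation in option (b) is also an assumption, since the text does not say which estimator is meant.

```python
import math
from statistics import mean, pstdev


def rbv_max_min(rcn):
    """Option (a): ratio between the maximal and minimal RCN values."""
    return max(rcn) / min(rcn)


def rbv_std_over_mean(rcn):
    """Option (b): standard deviation over mean (population SD assumed)."""
    return pstdev(rcn) / mean(rcn)


def rbv_mean_squared_difference(rcn):
    """Option (c): mean of the squared differences from the mean RCN."""
    centre = mean(rcn)
    return sum((value - centre) ** 2 for value in rcn) / len(rcn)


def rbv_allele_balance(allele_pairs):
    """Option (d): 1 / mean of (smaller allele RCN / larger allele RCN) per heterozygous locus."""
    ratios = [min(a, b) / max(a, b) for a, b in allele_pairs]
    return 1.0 / mean(ratios)


def likelihood_from_database(rbv, in_vivo_rbvs):
    """Step 4(a): fraction of database RBVs >= the test RBV, floored at 1/n."""
    n = len(in_vivo_rbvs)
    fraction = sum(1 for value in in_vivo_rbvs if value >= rbv) / n
    return max(fraction, 1.0 / n)


def likelihood_from_normal(rbv, mu, sigma):
    """Step 4(b): complementary cumulative normal distribution at the test RBV."""
    return 0.5 * math.erfc((rbv - mu) / (sigma * math.sqrt(2.0)))


def classify(likelihood, threshold=0.05):
    """Step 5(a): below the threshold means in vitro, otherwise in vivo."""
    return "in vitro" if likelihood < threshold else "in vivo"
```

The 1/n floor in the database option keeps the likelihood parameter from becoming zero when the test RBV exceeds every database entry, so a ratio against a reference sample (step 5(b)) remains defined.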
Optionally, this method can be performed on capillary electrophoresis histograms obtained by standard profiling kits (e.g. Identifiler). In such cases, the above method should start in step 3. The loci used for representation bias analysis may be chosen as follows:
The analysis may be performed on a set of STR loci used for DNA profiling, such as the SGM+ or Identifiler loci. In accordance with the above, analysis is performed on the same capillary electrophoresis histogram that is used for profiling.
In addition, loci that are known or expected to have a high representation bias in WGA may be selected. As a non-limiting example, the set can include loci that are under-represented in Multiple Displacement Amplification [MDA]-based WGA, e.g. in telomere or centromere regions of chromosomes, and other normal/over-represented loci. In a specific example, the set can include the vWA locus (over-represented in WGA).
In addition, the loci should be selected such that they are well separated, preferably residing on separate chromosomes.
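The regression-based RBV (option e of step 3 above) can be sketched as follows. The input layout (a mapping from dye name to a list of `(size, height, is_homozygous)` tuples), the function names and the pooling of deviations across dyes before averaging are illustrative assumptions. The regression is an ordinary least-squares fit of peak height against peak size.

```python
from statistics import mean


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept) for (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def regression_rbv(peaks_by_dye):
    """Mean normalized deviation of allele peak heights from a per-dye regression line."""
    deviations = []
    for peaks in peaks_by_dye.values():
        alleles = []
        for size, height, is_homozygous in peaks:
            if is_homozygous:
                # (a) Split the superimposed peak into two identical half-height peaks.
                alleles.extend([(size, height / 2.0)] * 2)
            else:
                alleles.append((size, height))
        # (b) Regression of height on size, separately for this dye.
        slope, intercept = fit_line(alleles)
        # (c) Normalized deviation |height - expected| / expected for each allele peak.
        for size, height in alleles:
            expected = slope * size + intercept
            deviations.append(abs(height - expected) / expected)
    return mean(deviations)
```

A profile whose allele peaks all lie on the regression line yields an RBV of zero. Each dye needs allele peaks at two or more distinct sizes for the fit to be defined.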
d. Calculating the amount of PCR stutter of nucleic acids
Since PCR stutter is an artifact produced during a PCR reaction, profiling of DNA that was generated in vitro by PCR, or by a PCR-based WGA method, will have increased stutter in relation to profiling of in vivo generated DNA. This is because in the former case two PCR reactions (one for the in vitro generation of the DNA and one for the DNA profiling) are involved, while in the latter case there is only one PCR reaction (the DNA profiling).
Therefore, in accordance with the present invention, the determination whether nucleic acids in a biological sample were generated in vitro or in vivo can be performed based on analysis of PCR stutter, for example, as follows:
1. Subjecting the test sample to PCR analysis using primers specific to selected genetic loci;
2. Detecting the PCR amplification products using capillary electrophoresis. (The capillary electrophoresis machine records the raw data in the form of pairs of numbers. Each pair contains an X coordinate, which records the time point, and hence is correlated to the length of the DNA, and a Y coordinate, which records the intensity of fluorescence, and hence is correlated to the quantity of DNA.)
3. Processing the raw data for detection of alleles and stutter peaks by either:
i. Standard capillary electrophoresis analysis software (e.g. GeneMapper).
ii. The following algorithm:
1. From the raw data, find all local maxima and term them "peaks". A local maximum is a point (X_i, Y_i) in which the Y value is greater than the Y value of both the previous (i-1) data pair and the next (i+1) data pair (optionally use a smoothing method in order to reduce the number of maxima). Define the peak height as the Y value of the peak. Define the peak size as the X value of the peak.
2. Term all peaks that have Y values greater than a predetermined threshold "Putative alleles" (e.g. a threshold of 50 relative fluorescence units)
3. For each putative allele, obtain the "Maximum expected stutter value". The maximum expected stutter value represents the highest fraction of a stutter band that can be expected in in vivo generated DNA. The maximum expected stutter value is determined empirically based on multiple capillary electrophoresis runs of different samples and is different for each locus. (For example, for the D3S1358 locus, the maximum allowed stutter value in the GeneMapper software is 0.11).
4. Determine which putative alleles are true alleles. Examine all putative alleles, starting from the smallest size. For each examined putative allele, determine whether a putative allele exists at a predefined interval that is approximately one repeat unit larger than the putative allele that is examined (e.g. at [+3.25 bases, +4.75 bases]). If no putative allele is found at the designated region, term the examined putative allele "Allele". Otherwise: term the putative allele that is found in the designated region "The associated putative allele of the examined putative allele". Calculate the ratio of the height of the examined putative allele to the height of the associated putative allele of the examined putative allele. If this ratio is greater than the maximum expected stutter value of the examined putative allele, term the examined putative allele "Allele".
5. Determine stutter peaks. For each allele, inspect a predefined interval that is approximately one repeat unit smaller than the examined allele (e.g. [-4.75 bases, -3.25 bases]). Identify the highest peak in the interval. If the highest peak in the interval is not termed as "Allele", term the said peak "-1 stutter associated with the examined allele".
6. Calculating stutter fractions. Calculate the size of the -1 stutter fraction, defined as the height of the -1 stutter peak divided by the height of its associated allele peak. Alternatively, the stutter fraction is defined as the area of the -1 stutter peak divided by the area of its associated allele peak.
7. Calculating the likelihood parameters. For each allele in the test sample with an associated stutter: calculate a likelihood parameter that is correlated to the likelihood that the specific allele in the test sample was generated in vivo. The likelihood parameter may be calculated by the following non- limiting options:
a. Based on a database of -1 stutter values obtained from analyses of capillary electrophoresis runs of in vivo generated DNA samples. The likelihood parameter is equal to the maximum of the following two values: (1) the fraction of said database elements (corresponding to the same allele) with -1 stutter fraction values equal to or greater than the value obtained for the test sample in step 6, and (2) 1/n, where n is the number of said database elements (corresponding to the analyzed allele).
b. Based on a pre-determined normal distribution of in vivo -1 stutter values for this allele. The likelihood parameter is equal to the probability of a random sampling from the said normal distribution having a value that is equal to or greater than the value obtained for the test sample in step 6. This likelihood is equal to the value of the complementary cumulative distribution of the normal function, and can be calculated by the following formula:
$$p = \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$
where x is the value obtained for the test sample, μ and σ are the mean and standard deviation (respectively) of the normal distribution, and p is the obtained likelihood parameter value;
8. For the entire set of likelihood parameters obtained in step 7, calculating the "joint likelihood value" of the test sample, which is correlated to the likelihood that the DNA in the test sample was generated in vivo. A non-limiting example of how to calculate this value is Fisher's combined probability test, which combines the results from a variety of independent tests into one test statistic ($X^2$) having a chi-square distribution using the formula:
$$X^2 = -2\sum_{i=1}^{k}\ln(p_i)$$
where k is the number of likelihood parameters, and p_i are the likelihood parameters obtained in step 7. The p-value for $X^2$ itself can be interpolated from the chi-square table using 2k degrees of freedom. Such a table is available for example in [12]. The computed p-value is the joint likelihood value.
9. Determining whether the test sample was generated in vitro or in vivo by either of the following: i. If the joint likelihood value obtained in step 8 is smaller than a predetermined threshold (e.g. 0.05), conclude that the DNA from the test sample was generated in vitro, otherwise conclude that it was generated in vivo. ii. Perform steps 1-8 on a reference sample (e.g. from a suspect with a similar profile), calculate the ratio between the joint likelihood value of the test sample and the joint likelihood value of the reference sample. If this ratio is smaller than a predefined threshold (e.g. 0.5), conclude that the test sample was generated in vitro, otherwise conclude that the test sample was generated in vivo.
Alternatively, or in addition, the method can be performed using the +1 stutter instead of the -1 stutter.
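The peak-calling part of the algorithm (steps 1 to 6 of option ii above) can be sketched as follows for the -1 stutter. The function names are illustrative assumptions. Peak sizes are assumed to have already been converted to bases, a single maximum expected stutter value is applied to all peaks of one locus, and the tallest candidate is used when more than one putative allele falls in the search interval.

```python
def find_peaks(raw):
    """Step 1: local maxima of raw (x, y) data, returned as (size, height) pairs."""
    return [raw[i] for i in range(1, len(raw) - 1)
            if raw[i][1] > raw[i - 1][1] and raw[i][1] > raw[i + 1][1]]


def minus_one_stutter_fractions(peaks, max_expected_stutter,
                                min_height=50.0, near=3.25, far=4.75):
    """Steps 2-6: map each allele size to its -1 stutter fraction (height based)."""
    # Step 2: peaks above the detection threshold are putative alleles.
    putative = sorted(p for p in peaks if p[1] > min_height)
    # Step 4: examine putative alleles from the smallest size upward.
    alleles = []
    for size, height in putative:
        partners = [p for p in putative if size + near <= p[0] <= size + far]
        if not partners:
            alleles.append((size, height))
            continue
        associated = max(partners, key=lambda p: p[1])
        if height / associated[1] > max_expected_stutter:
            alleles.append((size, height))
    # Steps 5-6: tallest non-allele peak about one repeat unit below each allele.
    fractions = {}
    for size, height in alleles:
        window = [p for p in peaks if size - far <= p[0] <= size - near]
        if window:
            tallest = max(window, key=lambda p: p[1])
            if tallest not in alleles:
                fractions[size] = tallest[1] / height
    return fractions
```

With the example maximum expected stutter value of 0.11 for D3S1358, a 90 RFU peak four bases below a 1000 RFU allele is classified as stutter with a fraction of 0.09.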
It should be appreciated that in cases that the DNA in the sample was generated in vitro, the joint likelihood value may be much smaller than the threshold indicated above, e.g. under 0.01, or under 0.005.
It should also be noted that this method can be performed on capillary electrophoresis histograms obtained by standard profiling kits (e.g. Identifiler). In such cases, the above method should start in step 3.
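The joint likelihood value of step 8 can be sketched without a chi-square table, because the chi-square survival function has a closed form when the number of degrees of freedom is even (here 2k). The function name is an illustrative assumption, and all likelihood parameters are assumed to be strictly greater than zero.

```python
import math


def fisher_joint_likelihood(likelihoods):
    """Fisher's combined probability test; returns (chi-square statistic, p-value)."""
    k = len(likelihoods)
    statistic = -2.0 * sum(math.log(p) for p in likelihoods)
    # For 2k degrees of freedom: P(X >= s) = exp(-s/2) * sum_{j=0}^{k-1} (s/2)^j / j!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total
```

With a single likelihood parameter the joint value equals that parameter. Two alleles with likelihood parameters of 0.05 each combine to a joint likelihood value of about 0.017, which falls below the example threshold of 0.05.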
e. Detection of non genomic sequences in the biological sample
In vitro generated DNA can be detected by the presence of non-genomic sequences obtained from the biological sample. The non-genomic sequences may include primer dimers (in DNA generated by PCR-based methods), plasmid sequences (in DNA generated by cloning methods), non-genomic sequences ligated to ends of genomic sequences (e.g. in ligation-mediated PCR). The presence of such non-genomic sequences can be detected by assays which are well-known in the art, for example, by cloning of the nucleic acids from the test sample into bacteria, and sequencing the cloned molecules.
f. Determining the distribution of nucleic acid fragment lengths in the biological sample
Another method for distinguishing between in vivo generated and in vitro generated DNA is by analyzing the distribution of nucleic acid fragment lengths in the test sample. Non-degraded, in vivo generated DNA, that is extracted from biological samples by standard procedures consists of a distribution of fragments of varying lengths, from about 500 base pairs (bps) up to more than 10,000 bps. In contrast, DNA generated in vitro may consist of either small fragments only (e.g. DNA generated by PCR), or fragments with a relatively uniform size distribution (e.g. cloned DNA).
The distribution of fragment lengths may be determined by the following method:
1. Subjecting DNA from a test sample to size fractionation (e.g. by gel electrophoresis, mass spectrometry).
2. Subjecting DNA from an in vivo generated reference sample to size fractionation using the same method used in step 1
3. For both the test sample and the reference sample, determining the distribution of fragment lengths (i.e. amount of DNA as a function of fragment size). This can be performed by a variety of commercial software programs (e.g. TotalLab of BioSystematica).
4. Perform one or more of the following:
a. If the DNA in the test sample does not contain fragments larger than 10 kilobases, conclude that the DNA of the test sample was generated in vitro, otherwise conclude that the DNA of the test sample was generated in vivo.
b. Comparing both distributions obtained in step 3 using a statistical test which determines whether both distributions represent two random samplings from the same source distribution (e.g. by performing the Kolmogorov-Smirnov two sample goodness-of-fit hypothesis test [14]). If the analysis shows that the probability that both distributions represent random samplings from the same source distribution is less than a predefined threshold (e.g. 0.05), conclude that the DNA of the test sample was generated in vitro, otherwise conclude that the DNA was generated in vivo.
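Both options of step 4 can be sketched as follows. The sketch assumes that each distribution is available as a list of individual fragment lengths in base pairs, which is a simplification of the "amount of DNA as a function of fragment size" produced by gel analysis software. The function names are illustrative, and the p-value uses a common asymptotic approximation of the Kolmogorov distribution rather than an exact computation.

```python
import math
from bisect import bisect_right


def lacks_large_fragments(lengths_bp, cutoff_bp=10_000):
    """Option (a): True if no fragment exceeds the cutoff (suggesting in vitro DNA)."""
    return not any(length > cutoff_bp for length in lengths_bp)


def ks_two_sample(sample_a, sample_b):
    """Option (b): two-sample Kolmogorov-Smirnov statistic and approximate p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    # Largest gap between the two empirical cumulative distributions.
    d = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m) for v in a + b)
    effective_n = math.sqrt(n * m / (n + m))
    lam = (effective_n + 0.12 + 0.11 / effective_n) * d
    if lam < 0.2:
        return d, 1.0  # the series below converges poorly here; the p-value is close to 1
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def classify_by_fragment_lengths(test_bp, reference_bp, alpha=0.05):
    """Apply option (b) with the example threshold of 0.05."""
    _, p_value = ks_two_sample(test_bp, reference_bp)
    return "in vitro" if p_value < alpha else "in vivo"
```

A test sample made only of short PCR products, compared with a reference whose fragments range from 500 to 12,000 base pairs, gives a statistic of 1 and a p-value far below 0.05.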
g. Detection of RNA in the biological sample
Another method for distinguishing between in vivo generated and in vitro generated DNA is by detecting the presence of RNA in the biological sample. Biological samples that have not been adversely affected by environmental conditions will likely contain a certain amount of RNA transcripts. Although in most conditions, RNA degrades much faster than DNA, transcripts of highly transcribed housekeeping genes (e.g. SDHA) are likely to be found in the biological sample even if it is partially degraded.
If RNA is detected, it can be concluded that the DNA in the sample was generated in vivo. If RNA is not detected, it can be concluded that the DNA in the sample was generated in vitro.
The presence of RNA in the sample may be detected by assays which are well known in the art, for example by RT-PCR (reverse-transcriptase PCR) on a specific locus.
It should be noted that if a "fake" sample contains some biological material (e.g. red blood cells extracted from fractionated blood), then some residual RNA may be present in the fake sample. However, this RNA will most likely not be compatible with the in vitro generated DNA that is found in the sample. This incompatibility can be detected by genotyping a set of transcribed STRs (e.g. RT-PCR followed by capillary electrophoresis).
EXAMPLES
It should be understood in the context of the present invention that by verifying the authenticity of a nucleic acid molecule the authenticity of the biological sample (e.g. test sample) that contains the nucleic acid is thereby established. In this respect a 'fake' blood sample is a blood sample in which the nucleic acids were generated in vitro.
Example 1: Demonstration of a CODIS profile obtained from a fake biological sample
In order to demonstrate that a CODIS profile can be obtained from a fake biological sample three mock forensic samples were produced:
Sample preparation
Sample 1 - A dry blood stain on a cotton fabric, prepared from 10 μl of venous blood from individual (A) that was dispensed on the fabric. This sample contains "real", in vivo generated, DNA (Fig 2A).
Sample 2 - A dry blood stain on a cotton fabric, prepared from 10 μl of venous blood from individual (B) that was dispensed on the fabric. This sample contains "real", in vivo generated, DNA (Fig 2B).
Sample 3 - A dry blood stain on cotton composed of red blood cells from individual (A) mixed with in vitro generated DNA that was amplified from the DNA of individual (B). This sample contains only "fake", in vitro generated, DNA, because red blood cells are not nucleated and therefore contain no genomic DNA (Fig 2C).
Sample 3 was prepared as follows:
Red blood cells were isolated from the bottom phase of the fractionated blood from individual (A), following centrifugation at 1500 g for 10 minutes.
Genomic DNA from individual (B) was extracted from a saliva stain on tissue paper by organic extraction according to a published protocol [10]. Ten nanograms of the extracted DNA were used as template for in vitro multiple displacement amplification with the Repli-G kit (Qiagen), yielding 10 μg of in vitro generated DNA. The generated DNA includes copies of all genomic loci.
For preparation of Sample 3, 40 μl of red blood cells were mixed with 60 μl (6 μg) of in vitro generated DNA, and dispensed on the fabric.
Human blood test
All samples were tested with the HEXAGON OBTI (BLUESTAR forensic) human blood test, (which is based on detection of human hemoglobin and is routinely performed in crime scenes for identification of human DNA), and the results were positive, confirming that all three samples contain blood from human origin. This result shows that "fake" DNA samples can be produced easily with basic lab techniques with little financial expense.
DNA extraction, quantification, and profiling
DNA was extracted from all bloodstain samples by organic extraction according to a published protocol [10] and quantified in real time PCR using the Quantifiler kit (Applied Biosystems).
Profiling was performed on 1 ng DNA extracted from each sample. Multiplex PCR of CODIS loci was performed in 50 μl total reaction volume in a GeneAmp PCR system 9700-GOLD (Applied Biosystems) using the ProfilerPlus kit (Applied Biosystems). Amplified products were separated on an ABI PRISM 310 Genetic Analyzer capillary electrophoresis machine, and analyzed using the GeneMapperID-X 1.1 software (Applied Biosystems).
Analysis of the profiles
The profiles of all samples are depicted in figure 2. The profile of sample 3 (the "fake" sample; Fig 2C) is identical to the profile of sample 2 (Fig 2B), and does not contain any additional alleles that are found in sample 1 (Fig 2A, which corresponds to the human origin of the red blood cells used in sample 3). The GeneMapperID-X 1.1 software performs automatic analysis of capillary electrophoresis histograms of DNA profiles. In each locus, the software determines whether the profile is consistent with a single human source. Specifically, it verifies that the number of alleles is 1-2, and that the peak height ratio in each heterozygous locus is >=70%. The software also verifies for all alleles that the peak heights are within the limits of reasonable minimum and maximum values. The software outputs its analysis in the form of a colored bar above each locus, whereby a green bar indicates a "perfect" score, and yellow and red bars indicate scores that are "imperfect" to various degrees. The software also outputs a similar color coded score for the entire profile.
As can be seen in figure 2, the profile of sample 3 is "perfect". This demonstrates that "perfect" profiles can be obtained from biological samples that were forged using simple techniques.
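By way of non-limiting illustration, the per-locus checks described above (1-2 alleles, a heterozygous peak height ratio of at least 70%, and peak heights within minimum and maximum limits) might be sketched as follows. This is not the GeneMapperID-X algorithm; the function names and the numeric height limits `MIN_PEAK_HEIGHT` and `MAX_PEAK_HEIGHT` are assumptions introduced only for illustration, and only the 70% ratio comes from the text.

```python
from typing import Dict, List, Tuple

# Assumed height limits (RFU); the text does not state the actual values.
MIN_PEAK_HEIGHT = 50.0
MAX_PEAK_HEIGHT = 6000.0
# Heterozygous peak height ratio stated in the text.
MIN_HET_RATIO = 0.70

Peak = Tuple[str, float]  # (allele name, peak height)


def locus_is_single_source(peaks: List[Peak]) -> bool:
    """Return True if one locus is consistent with a single human source."""
    # A single source shows one (homozygous) or two (heterozygous) alleles.
    if not 1 <= len(peaks) <= 2:
        return False
    heights = [height for _, height in peaks]
    # Every peak must fall inside the accepted height window.
    if any(h < MIN_PEAK_HEIGHT or h > MAX_PEAK_HEIGHT for h in heights):
        return False
    # For heterozygous loci, the lower peak must be >= 70% of the higher one.
    if len(heights) == 2 and min(heights) / max(heights) < MIN_HET_RATIO:
        return False
    return True


def profile_is_single_source(profile: Dict[str, List[Peak]]) -> bool:
    """Return True only if every locus in the profile passes the check."""
    return all(locus_is_single_source(peaks) for peaks in profile.values())
```

A profile passing all such checks corresponds to what the text calls a "perfect" score, which, as shown for sample 3, does not by itself establish that the DNA was generated in vivo.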
Example 2: Demonstration of a procedure for DNA authentication based on analysis of methylation in HpaII digested DNA
The analysis was performed on two of the mock forensic samples described in Example 1 above — sample 2 ("real" sample from individual B), and sample 3 ("fake" sample containing red blood cells of individual A and in vitro generated DNA copied from the DNA of individual B).
Figure 3 depicts a DNA authentication procedure based on analysis of methylation in HpaII digested DNA, as exemplified below.
Digestion with HpaII
Aliquots of DNA from each of the two samples were digested with HpaII, which is a methylation-sensitive restriction endonuclease that specifically recognizes and cleaves the sequence CCGG only if it is unmethylated. The digestion reaction was performed in 20 μl total reaction volume, including 10 ng of DNA template, 10 units of HpaII (New England Biolabs), and 2 μl of 10X buffer 4 (New England Biolabs). Digestion was performed at 37°C for one hour, followed by heat inactivation of the enzyme by incubation at 65°C for 20 minutes.
Amplification of methylation loci
Analysis of the methylation status was performed by amplification and analysis of three types of genomic loci:
1. Constitutively methylated loci that contain an HpaII recognition sequence - PCR amplification of these loci is expected to produce amplicons in in vivo generated DNA since the methylation of the HpaII sites blocks their restriction by the enzyme. Conversely, in unmethylated in vitro generated DNA, the HpaII sites are cleaved by the enzyme, leading to fragmented DNA that is not amplified in subsequent PCR.
2. Constitutively unmethylated loci that contain an HpaII recognition sequence - PCR amplification of these loci is not expected to produce amplicons, neither in in vivo generated DNA nor in in vitro generated DNA, since in both cases the HpaII sites are not methylated.
3. A reference locus that does not contain any HpaII site and is therefore amplified in subsequent PCR that is performed on all templates, regardless of their methylation status.
Following digestion with HpaII, each sample was divided into 5 aliquots and amplified by PCR (one PCR performed for each aliquot) at 5 genomic loci - CM1, CM2 (constitutively methylated loci; primer sequences are in Table 1), CU1, CU2 (constitutively unmethylated loci; primer sequences are in Table 2), and REF1 (reference locus; primer sequences are in Table 3). PCR was performed in the GeneAmp PCR system 9700-GOLD (Applied Biosystems) machine in a total reaction volume of 50 μl. The PCR program consisted of 28 cycles, and all forward primers were labeled with a fluorescent dye (NED).
Table 1
Figure imgf000049_0001
Table 2
Figure imgf000050_0001
Table 3
Figure imgf000050_0002
Capillary electrophoresis of amplified products
For each sample, aliquots of amplification products from CM1, CM2, CU1, CU2, and REF1 were combined with aliquots of the ProfilerPlus products of the same sample and run on an ABI PRISM 310 Genetic Analyzer capillary electrophoresis machine. The resulting capillary histograms are shown in figure 4B (bars above loci were added for illustration purposes).
In the capillary electrophoresis histograms, the left part corresponds to the authentication loci, while the right part corresponds to the profiling loci. In both samples REF1 amplified successfully, indicating that the PCR reaction was successful. In sample 2 (in vivo generated, "real" DNA), both CM1 and CM2 successfully amplified, while CU1 and CU2 amplified weakly, indicating that both CM1 and CM2 were methylated, whereas CU1 and CU2 were unmethylated. This result confirms that the DNA was generated in vivo. In sample 3, no amplification products were visible for CM1, CM2, CU1, and CU2, indicating that all these loci were unmethylated. This indicates that sample 3 was generated in vitro. Fig 5 depicts the capillary electrophoresis histograms of samples 2 and 3.
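The interpretation applied to samples 2 and 3 above may be illustrated by the following minimal sketch. It assumes that each locus has already been scored as clearly amplified or not (weak amplification, as seen for the constitutively unmethylated loci in sample 2, is scored here as not amplified). The function name, the dictionary layout, and the "inconclusive" outcome for mixed patterns are assumptions made for illustration and are not taken from the procedure itself.

```python
from typing import Dict


def classify_after_hpaii(amplified: Dict[str, bool]) -> str:
    """Classify a sample from per-locus amplification after HpaII digestion.

    `amplified` maps each locus name (REF1, CM1, CM2, CU1, CU2) to True
    when a clear amplification product was observed.
    """
    # The reference locus has no HpaII site; if it fails, the PCR itself failed.
    if not amplified["REF1"]:
        return "inconclusive"
    cm_amplified = [amplified["CM1"], amplified["CM2"]]
    cu_amplified = [amplified["CU1"], amplified["CU2"]]
    # Methylated CM loci resist digestion and amplify; CU loci are cleaved.
    if all(cm_amplified) and not any(cu_amplified):
        return "in vivo"
    # Unmethylated CM loci are cleaved, so nothing amplifies at CM loci.
    if not any(cm_amplified):
        return "in vitro"
    # Mixed patterns (assumed here to need further review).
    return "inconclusive"
```

With the observations reported above, sample 2 (CM1 and CM2 amplified, CU1 and CU2 not clearly amplified) is classified as in vivo, and sample 3 (only REF1 amplified) as in vitro.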
Example 3: Demonstration of a procedure for DNA authentication based on capillary electrophoresis
In a standard DNA profiling reaction, the profile of a DNA sample is obtained by performing the following steps: (i) performing multiplex PCR (with fluorescent primers), (ii) running the amplified PCR products on a capillary electrophoresis machine, and (iii) analyzing the obtained capillary electrophoresis histogram. Various DNA profiling kits are currently available, including SGM+, PowerPlex16, ProfilerPlus, CoFiler, and others.
DNA authentication may also be performed based on analysis of a capillary electrophoresis histogram. A single histogram that contains the authentication and profiling data is contained in a single computer file. According to this procedure, DNA authentication and profiling can be performed simultaneously.
According to one option, the PCR for DNA profiling and the PCR for DNA authentication are performed separately, but their amplified products are joined together into a single capillary electrophoresis run. This option was employed in Example 2.
According to this option, and as illustrated in Fig. 4A, DNA from a biological sample is divided into two aliquots. One aliquot is used for the biochemical step of the standard DNA profiling procedure (multiplex PCR on CODIS loci). The other aliquot is used for the biochemical step of the DNA authentication procedure. The products of both biochemical steps are combined into a single tube and run on a capillary electrophoresis machine. The resulting histogram is analyzed by a signal analysis software which performs both profiling and authentication.
According to a second option, the DNA profiling and DNA authentication are performed in a single multiplex PCR reaction and in a single capillary electrophoresis run. When using HpaII methylation-sensitive endonuclease, STR loci that are found in kits such as CoFiler, ProfilerPlus, Identifiler, SGM+ and PowerPlex16 do not contain a HpaII site, and therefore a joint PCR reaction, amplifying STR loci from one of the above kits and different STR loci for DNA authentication will succeed in amplifying all profiling and authentication loci.
Separate PCR reactions, combined capillary electrophoresis reaction
1. A biological sample (herein, the "test sample", e.g. cigarette butt, blood-stain, saliva) is obtained
2. DNA is extracted from the test sample (e.g. using organic extraction or Chelex)
3. DNA obtained in step 2 is quantified (e.g. using real-time PCR)
4. 0.5-2 ng of the DNA sample is used for the standard DNA profiling procedure (e.g. SGM+, ProfilerPlus, CoFiler), obtaining amplified PCR products of a set of genomic loci
5. Another 0.5-2 ng of the DNA sample is used for the DNA authentication procedure, in one of the following options:
   a. Analysis of DNA methylation based on methylation-sensitive endonuclease digestion:
      i. Subjecting the DNA from a test sample to digestion with one or more methylation-sensitive endonucleases (e.g. HpaII, HhaI)
      ii. Perform multiplex PCR on a set of loci including one or more restriction sites corresponding to the endonucleases used in (i)
   b. Analysis of genomic loci that are not part of DNA profiling:
      i. Perform multiplex PCR on a set of loci that are not part of DNA profiling
6. The amplified PCR products obtained in steps 4 and 5 are combined and run in a single capillary electrophoresis reaction.
7. The capillary electrophoresis histogram obtained in step 6 is conceptually divided into two sections, one corresponding to authentication data, and the other corresponding to profiling data.
8. The capillary electrophoresis histogram section corresponding to profiling data is analyzed and a profile is generated
9. The capillary electrophoresis histogram section corresponding to authentication data is analyzed.
10. Output the results of the profiling and authentication analyses.
Joint PCR and capillary electrophoresis reactions:
1. A biological sample (herein, the "test sample", e.g. cigarette butt, blood-stain, saliva) is obtained.
2. DNA is extracted from the test sample (e.g. using organic extraction or Chelex).
3. DNA obtained in step 2 is quantified (e.g. using real-time PCR).
4. In case of analysis based on DNA methylation using methylation-sensitive endonuclease digestion: subjecting the DNA from a test sample to digestion with one or more methylation-sensitive endonucleases (e.g. HpaII, HhaI)
5. 0.5-2ng of the DNA sample is used for a joint DNA profiling and authentication procedure: a multiplex PCR reaction including profiling and authentication genomic loci
6. The amplified PCR products obtained in step 5 are run in a capillary electrophoresis reaction
7. The capillary electrophoresis histogram obtained in step 6 is conceptually divided into two sections, one corresponding to authentication data, and the other corresponding to profiling data
8. The capillary electrophoresis histogram section corresponding to profiling data is analyzed and a profile is generated
9. The capillary electrophoresis histogram section corresponding to authentication data is analyzed
10. Output the results of the profiling and authentication analyses
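The conceptual division of the histogram into authentication and profiling sections (step 7 in both procedures above) may be sketched as follows. The sketch assumes that peaks have already been called and assigned to named loci by the signal analysis software; the tuple layout, the function name, and the use of the Example 2 locus names as the authentication set are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Authentication loci named in Example 2; any other locus is treated as profiling.
AUTHENTICATION_LOCI = {"CM1", "CM2", "CU1", "CU2", "REF1"}

Peak = Tuple[str, float, float]  # (locus name, fragment size in bp, peak height)


def split_histogram(peaks: List[Peak]) -> Dict[str, List[Peak]]:
    """Divide called peaks into authentication and profiling sections."""
    sections: Dict[str, List[Peak]] = {"authentication": [], "profiling": []}
    for peak in peaks:
        locus = peak[0]
        key = "authentication" if locus in AUTHENTICATION_LOCI else "profiling"
        sections[key].append(peak)
    return sections
```

Each section can then be passed to its own analysis (steps 8 and 9), so that profiling and authentication results are produced from a single capillary electrophoresis run.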
Example 4: Calculation of the representation bias
This example illustrates calculation of representation bias based on a linear regression of capillary electrophoresis histogram peaks. In figure 6, linear regressions (dashed lines) based on the peaks of in vivo- (6A, 6B) and in vitro (6C, 6D)-generated DNA samples are shown. For each peak its degree of deviation from the linear regression is calculated. Bar plots show the degree of deviation of each peak. For example, the deviation of peak #3 in the in vitro generated DNA sample is 64%, as can be seen in the corresponding bar (see arrow). The representation bias of a sample is the mean of all deviations. In vivo generated DNA samples are expected to have significantly lower representation bias values than in vitro generated DNA samples.
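The calculation described in this example may be sketched as follows. The sketch assumes that the regression is an ordinary least-squares fit of peak height against fragment size, and that the deviation of a peak is its absolute distance from the regression line relative to the value predicted by the line; neither choice is specified in the text, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def peak_deviations(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Relative deviation of each peak from the regression line (assumed form)."""
    slope, intercept = linear_fit(xs, ys)
    predicted = [slope * x + intercept for x in xs]
    # Assumes every predicted height is positive, as expected for real peaks.
    return [abs(y - p) / p for y, p in zip(ys, predicted)]


def representation_bias(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean of all per-peak deviations, as described in the example."""
    deviations = peak_deviations(xs, ys)
    return sum(deviations) / len(deviations)
```

Under these assumptions, peaks that lie close to the regression line give a representation bias near zero, while uneven amplification across loci, as expected for in vitro generated DNA, gives a larger value.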

Claims

1. A method for verifying the authenticity of nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining nucleic acids; and
(b) conducting an analysis on said nucleic acids in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said nucleic acids are not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
2. A method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining nucleic acids, wherein said nucleic acids were obtained from a biological sample ; and
(b) conducting an analysis on said nucleic acids in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said sample is not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
3. A method for verifying the authenticity of nucleic acids employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining a capillary electrophoresis histogram of amplified nucleic acids; and
(b) conducting an analysis on said histogram in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said nucleic acids are not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
4. A method for verifying the authenticity of biological samples containing nucleic acid molecules employed in nucleic-acid based analysis procedures, the method comprising:
(a) obtaining a capillary electrophoresis histogram of amplified nucleic acids isolated from said biological sample; and
(b) conducting an analysis on said histogram in order to determine whether said nucleic acids were generated in vivo or in vitro; wherein the determination that said nucleic acids were generated in vitro is indicative that said sample is not authentic, and wherein the determination that said nucleic acids were generated in vivo is indicative that said nucleic acids are authentic.
5. A method according to any of claims 1-4 wherein the authenticity of said nucleic acids or said sample is determined by subjecting the nucleic acid molecules of a test sample to at least one procedure selected from the group consisting of:
(a) analyzing the methylation pattern of said nucleic acids and determining whether the methylation pattern of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic,
(b) amplifying a set of loci from said nucleic acids and determining whether the amplification pattern of said loci is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic,
(c) calculating the representation bias in said nucleic acids and determining whether the representation bias of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic,
(d) calculating the amount of PCR stutter of said nucleic acids and determining whether the pattern of PCR stutter of said nucleic acids is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein consistency with in vitro generation is indicative that said nucleic acids are not authentic,
(e) screening for the presence of non-genomic sequences in said nucleic acids, wherein the absence of non-genomic sequences in said nucleic acids is indicative that said nucleic acids are authentic, and wherein the presence of non-genomic sequences in said nucleic acids is indicative that said nucleic acids are not authentic,
(f) analyzing the distribution of nucleic acid fragment lengths in said nucleic acids; and determining whether said distribution is consistent with in vivo generation of said nucleic acids or consistent with in vitro generation of said nucleic acids, wherein said consistency with in vivo generation is indicative that said nucleic acids are authentic, and wherein said consistency with in vitro generation is indicative that said nucleic acids are not authentic, and
(g) screening for presence of RNA in said nucleic acids, wherein said presence of RNA is indicative that said nucleic acids are authentic, and wherein the absence of RNA of said nucleic acids is indicative that said nucleic acids are not authentic.
6. A method according to claim 5 wherein in step (b) said amplifying step is carried out using PCR or Restriction and Circularization-Aided Rolling Circle Amplification.
7. A method according to claim 6 wherein the PCR is performed using CODIS STR primers and non-CODIS STR primers and wherein concurrent presence of CODIS STR PCR products and absence of non-CODIS STR PCR products in the sample is indicative that said sample is not authentic, and wherein presence of both CODIS STR PCR products and non-CODIS STR PCR products in the sample is indicative that said sample is authentic.
8. A method according to claim 5 wherein calculating the representation bias comprises:
(a) defining a set of genomic loci;
(b) calculating the Relative Copy Number (RCN) of each locus and/or allele in the set;
(c) calculating the Representation Bias Value (RBV) of the test sample; and
(d) calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample;
wherein when the value of the likelihood parameter obtained in step (d) is smaller than a predefined threshold the nucleic acids from the test sample are not authentic, and when the value of the likelihood parameter obtained in step (d) is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
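By way of non-limiting illustration, the likelihood parameter of step (d) and the threshold comparison may be sketched as follows, assuming that Representation Bias Values of in vivo generated samples follow a normal distribution with a known mean and standard deviation. The function names and all numeric values used with them are hypothetical.

```python
import math


def rbv_upper_tail_likelihood(rbv: float, in_vivo_mean: float, in_vivo_sd: float) -> float:
    """Probability of an RBV equal to or greater than `rbv` in in vivo DNA.

    Assumes in vivo RBVs are normally distributed with the given parameters.
    """
    z = (rbv - in_vivo_mean) / in_vivo_sd
    # Upper tail of the normal distribution expressed with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rbv_indicates_authentic(rbv: float, in_vivo_mean: float,
                            in_vivo_sd: float, threshold: float) -> bool:
    """Authentic when the likelihood is equal to or larger than the threshold."""
    return rbv_upper_tail_likelihood(rbv, in_vivo_mean, in_vivo_sd) >= threshold
```

For example, with an assumed in vivo mean of 0.10 and standard deviation of 0.05, an RBV equal to the mean gives a likelihood of 0.5, whereas an RBV many standard deviations above the mean gives a likelihood far below any practical threshold.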
9. A method according to claim 5 wherein calculating the representation bias comprises:
(a) defining a set of genomic loci;
(b) calculating the Relative Copy Number (RCN) of each locus and/or allele in the set for a test sample and for a reference sample;
(c) calculating the Representation Bias Value (RBV) of the test sample; and
(d) calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample; wherein when the ratio between the value of the likelihood parameter obtained from the test sample and the value of the likelihood parameter obtained from the reference sample is smaller than a predefined value the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
10. A method according to claim 5 wherein calculating the amount of PCR stutter comprises:
(a) subjecting the test sample to PCR analysis using primers specific to selected genetic loci;
(b) analyzing the PCR amplification products using capillary electrophoresis;
(c) processing the capillary electrophoresis data for detection of alleles and stutter peaks;
(d) determining the size and/or area of the -1 and/or +1 stutter fraction;
(e) calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample;
(f) calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the joint likelihood value obtained in step (f) is smaller than a predefined threshold the nucleic acids from the test sample are not authentic, and when the joint likelihood value obtained in step (f) is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
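By way of non-limiting illustration, steps (d) to (f) may be sketched as follows. The sketch assumes that the stutter fraction is the stutter peak height divided by the height of its parent allele peak, that the per-allele likelihood is a two-sided tail probability under a normal distribution of in vivo stutter fractions, and that the joint likelihood is the product of the per-allele likelihoods (i.e. the alleles are treated as independent). None of these choices is specified in the claim, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def stutter_fraction(stutter_height: float, allele_height: float) -> float:
    """Height of the -1 (or +1) stutter peak relative to its parent allele."""
    return stutter_height / allele_height


def stutter_likelihood(fraction: float, in_vivo_mean: float, in_vivo_sd: float) -> float:
    """Two-sided tail probability of `fraction` under an assumed normal model."""
    z = abs(fraction - in_vivo_mean) / in_vivo_sd
    return math.erfc(z / math.sqrt(2.0))


def joint_stutter_likelihood(observations: List[Tuple[float, float, float]]) -> float:
    """Product of per-allele likelihoods (independence assumed).

    Each observation is (stutter fraction, in vivo mean, in vivo sd).
    """
    return math.prod(stutter_likelihood(f, m, s) for f, m, s in observations)
```

The joint value would then be compared with a predefined threshold, as recited above.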
11. A method according to claim 5 wherein calculating the amount of PCR stutter comprises:
(a) subjecting the test sample and a reference sample to PCR analysis using primers specific to selected genetic loci;
(b) analyzing the PCR amplification products using capillary electrophoresis;
(c) processing the capillary electrophoresis data for detection of alleles and stutter peaks;
(d) determining the size and/or area of the -1 and/or +1 stutter fraction;
(e) calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample;
(f) calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the ratio between the value of the joint likelihood parameter obtained from the test sample in step f and the value of the joint likelihood parameter obtained from a reference sample is smaller than a predefined value the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
12. A method according to any of claims 7-11 wherein the likelihood parameter is calculated by comparison to a database or calculated by comparison to a normal distribution of corresponding values.
13. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci, and
(c) comparing the ratio obtained in step b to a predefined threshold value, wherein a ratio lower than said threshold value is indicative that said nucleic acids are not authentic, and wherein a ratio equal to or larger than said threshold value is indicative that said nucleic acids are authentic.
14. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci,
(c) comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, wherein a significantly larger ratio obtained from the test sample in comparison to the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are authentic, and wherein if the ratio obtained from the test sample is not significantly larger than the ratio obtained from the reference sample, this is indicative that said nucleic acids are not authentic.
15. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci, and
(c) comparing the ratios obtained in step b to a corresponding ratio obtained from an in vivo generated reference sample, wherein comparable ratios of the test sample and the reference sample are indicative that nucleic acids from the test sample are authentic, and wherein the ratio of the test sample is not comparable to the ratio of the reference sample, this is indicative that nucleic acids from the test sample are not authentic.
16. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci,
(d) comparing the ratios obtained in steps b and c to predefined threshold values, wherein when both ratios are greater than said predefined threshold value said nucleic acids are authentic, and wherein if at least one ratio is not greater than its corresponding predefined threshold value, this is indicative that said nucleic acids are not authentic.
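By way of non-limiting illustration, the two-set comparison of steps (b) to (d) may be sketched as follows, assuming that the methylation status of each CG locus has already been determined. The function name, the dictionary layout, and the threshold values used with it are hypothetical.

```python
from typing import Dict


def two_set_methylation_authentic(
    methylated_set: Dict[str, bool],
    unmethylated_set: Dict[str, bool],
    methylated_threshold: float,
    unmethylated_threshold: float,
) -> bool:
    """Apply the two-ratio test; each dict maps a CG locus to True if methylated.

    `methylated_set` holds loci that are constitutively methylated in vivo,
    `unmethylated_set` holds loci that are constitutively unmethylated in vivo.
    """
    # Step (b): ratio of methylated loci to total loci in the first set.
    methylated_ratio = sum(methylated_set.values()) / len(methylated_set)
    # Step (c): ratio of unmethylated loci to total loci in the second set.
    unmethylated_ratio = (
        sum(not status for status in unmethylated_set.values()) / len(unmethylated_set)
    )
    # Step (d): both ratios must be greater than their thresholds.
    return (methylated_ratio > methylated_threshold
            and unmethylated_ratio > unmethylated_threshold)
```

In vitro generated DNA, which is unmethylated throughout, yields a first ratio of zero and therefore fails the test regardless of the second ratio.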
17. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, and
(d) comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, and comparing the ratio obtained in step c to a predefined threshold value, wherein if the ratio obtained in step b is significantly greater than the corresponding ratio obtained from the reference sample, and the ratio obtained in step c is greater than a predefined threshold value, this is indicative that nucleic acids in the test sample are authentic, wherein if the ratio obtained in step b is not significantly greater than the corresponding ratio obtained from the reference sample, and/or the ratio obtained in step c is not greater than a predefined threshold value, this is indicative that nucleic acids in the test sample are not authentic.
18. A method according to claim 5 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, and
(d) comparing the ratios obtained in steps b and c to corresponding ratios obtained from an in vivo generated reference sample, wherein if both ratios are comparable, this is indicative that nucleic acids from the test sample are authentic and wherein if at least one of the ratios is not comparable, this is indicative that nucleic acids from the test sample are not authentic.
19. A method according to any of claims 13-18, wherein the determination of the methylation pattern is performed using bisulfite sequencing.
20. A method according to any of claims 13-18 wherein determination of the methylation pattern is performed using methylation specific PCR.
21. A method according to any of claims 13-18 wherein the determination of the methylation pattern is performed using methylation-sensitive endonuclease digestion.
22. A method according to claims 13-21 wherein said CG loci are amplified using loci specific primers.
23. A method according to claim 22 wherein said loci specific primers are selected from the group consisting of SEQ ID NO. 1-15.
24. A method according to claim 5 wherein screening for non-genomic sequences in said nucleic acids comprises the detection of primer dimers, plasmid sequences, non-genomic sequences ligated to ends of genomic sequences, or non-genomic sequences originating from degenerate primers used in in vitro generation of the nucleic acid sample, wherein detection of said non-genomic sequences is indicative that said nucleic acids are not authentic, and wherein lack of detection of said non-genomic sequences is indicative that said nucleic acids are authentic.
25. A method according to claim 24 wherein the presence of said non-genomic sequences is detected by a method comprising:
(a) cloning of the nucleic acids from the test sample, and
(b) sequencing the cloned molecules.
26. A method according to claim 5 wherein determining the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
(a) subjecting nucleic acids from a test sample to size fractionation; and
(b) determining the distribution of fragment lengths for said nucleic acids; wherein the absence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are not authentic, wherein the presence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are authentic.
27. A method according to claim 5 wherein determining the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
(a) subjecting nucleic acids from a test sample and from an in vivo generated reference sample to size fractionation;
(b) determining the distribution of fragment lengths for said nucleic acids;
(c) comparing the distributions obtained in step (b); and
(d) determining the probability that both distributions represent random samplings from the same source; wherein, when said probability determined in step (d) is less than about 0.05, this is indicative that the nucleic acids from the test sample are not authentic, and wherein, when said probability determined in step (d) is equal to or larger than about 0.05, this is indicative that the nucleic acids from the test sample are authentic.
28. A method according to claim 5 wherein detecting RNA in said nucleic acids is performed by RT-PCR on one or more specific loci, wherein the absence of RT-PCR amplification products indicates that the nucleic acids are not authentic, and wherein the presence of RT-PCR amplification products indicates that the nucleic acids are authentic.
29. A method according to any of claims 5-28 wherein said biological sample is selected from a group consisting of: blood, saliva, hair, semen, urine, feces, skin, epidermal cell, buccal cell, and bone sample.
30. A method according to any one of claims 1-29 wherein verification of authenticity is carried out for forensic uses.
31. A method according to any one of claims 1-30 wherein said nucleic acids are from a human source.
32. A method according to any one of claims 1-31 wherein said nucleic acids are genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, fragmented nucleic acids, or nucleic acids obtained from sub cellular organelles.
33. A kit for verifying the authenticity of nucleic acids or a biological sample containing nucleic acids, wherein the kit comprises:
(a) reagents for carrying out at least one procedure selected from the group consisting of:
i. determining the methylation pattern of said nucleic acids,
ii. amplifying a set of loci from said nucleic acids,
iii. calculating the representation bias in said nucleic acids,
iv. calculating the amount of PCR stutter of said nucleic acids,
v. screening for non-genomic sequences in said nucleic acids,
vi. determining the distribution of nucleic acid fragment lengths in said nucleic acids,
vii. detecting RNA in the biological sample, and
(b) instructions for using the kit for verifying the authenticity of said nucleic acids and/or biological sample.
34. A kit according to claim 33 wherein said kit comprises reagents for determining the methylation pattern of said nucleic acids.
35. A kit according to claim 33 wherein said kit comprises reagents for amplifying a set of loci from said nucleic acids.
36. A kit according to claim 33 wherein said kit comprises reagents for calculating the representation bias in said nucleic acids.
37. A kit according to claim 33 wherein said kit comprises reagents for detecting RNA in said nucleic acids.
38. A kit according to claim 33 wherein said kit comprises reagents for calculating the amount of PCR stutter of said nucleic acids.
39. A kit according to claim 33 wherein said kit comprises reagents for screening for non-genomic sequences in said nucleic acids.
40. A kit according to claim 33 wherein said kit comprises reagents for determining the distribution of nucleic acid fragment lengths.
41. Use of at least one procedure selected from the group consisting of:
(a) determining the methylation pattern of said nucleic acids,
(b) amplifying a set of loci from said nucleic acids,
(c) calculating the representation bias in said nucleic acids,
(d) calculating the amount of PCR stutter of said nucleic acids,
(e) screening for non-genomic sequences in said nucleic acids,
(f) determining the distribution of nucleic acid fragment lengths in said nucleic acids, and
(g) detecting RNA in the biological sample, for verifying the authenticity of nucleic acid molecules or a biological sample containing nucleic acids.
42. Use according to claim 41 wherein in step (b) said amplifying is carried out using PCR or Restriction and Circularization-Aided Rolling Circle Amplification.
43. Use according to claim 42 wherein the PCR is performed using CODIS STR primers and non-CODIS STR primers and wherein concurrent presence of CODIS STR PCR products and absence of non-CODIS STR PCR products in the sample is indicative that said sample is not authentic.
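The decision rule of claim 43 can be sketched as follows. This is a minimal illustration and not part of the claims: the function name `codis_indication` and its inputs (counts of loci that yielded a PCR product for each primer group) are hypothetical. The claim states only the non-authentic case, so every other outcome is reported as giving no indication.

```python
def codis_indication(codis_products: int, non_codis_products: int) -> str:
    """Apply the claim 43 rule to counts of detected STR PCR products.

    Both arguments are hypothetical inputs: the number of CODIS and
    non-CODIS STR loci that yielded a PCR product.
    """
    if codis_products < 0 or non_codis_products < 0:
        raise ValueError("product counts must be non-negative")
    # CODIS products present while non-CODIS products are absent.
    if codis_products > 0 and non_codis_products == 0:
        return "not authentic"
    # The claim draws no conclusion for any other combination.
    return "no indication"
```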
44. Use according to claim 41 wherein calculating the representation bias comprises:
(a) defining a set of genomic loci;
(b) calculating the Relative Copy Number (RCN) of each locus and/or allele in the set;
(c) calculating the Representation Bias Value (RBV) of the test sample; and
(d) calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample;
wherein when the value of the likelihood parameter obtained in step (d) is smaller than a predefined threshold the nucleic acids from the test sample are not authentic, and when the value of the likelihood parameter obtained in step (d) is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
45. Use according to claim 41 wherein calculating the representation bias comprises:
(a) defining a set of genomic loci;
(b) calculating the Relative Copy Number (RCN) of each locus and/or allele in the set for a test sample and for a reference sample;
(c) calculating the Representation Bias Value (RBV) of the test sample; and
(d) calculating a likelihood parameter representing the likelihood of obtaining an RBV equal to or greater than the RBV obtained in step c in an in vivo generated DNA sample; wherein when the ratio between the value of the likelihood parameter obtained from the test sample and the value of the likelihood parameter obtained from the reference sample is smaller than a predefined value the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
46. Use according to claim 41 wherein calculating the amount of PCR stutter comprises:
(a) subjecting the test sample to PCR analysis using primers specific to selected genetic loci;
(b) analyzing the PCR amplification products using capillary electrophoresis;
(c) processing the capillary electrophoresis data for detection of alleles and stutter peaks;
(d) determining the size and/or area of the -1 and/or +1 stutter fraction;
(e) calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample;
(f) calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the joint likelihood value obtained in step (f) is smaller than a predefined threshold the nucleic acids from the test sample are not authentic, and when the joint likelihood value obtained in step (f) is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
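Steps (d) to (f) of claim 46 might be sketched as below. Two assumptions are made that the claim does not state: the stutter fraction is taken as the stutter peak area divided by the parent allele peak area, and the joint likelihood is taken as the product of the per-locus likelihoods, which treats the loci as independent. All function names and the threshold are hypothetical.

```python
import math
from typing import Sequence


def stutter_fraction(stutter_peak_area: float, allele_peak_area: float) -> float:
    """Assumed definition: stutter peak area relative to its parent allele peak."""
    if allele_peak_area <= 0:
        raise ValueError("allele peak area must be positive")
    return stutter_peak_area / allele_peak_area


def joint_likelihood(per_locus_likelihoods: Sequence[float]) -> float:
    """Combine per-locus likelihoods by multiplication (independence assumed)."""
    if not per_locus_likelihoods:
        raise ValueError("at least one locus is required")
    for p in per_locus_likelihoods:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each likelihood must lie in [0, 1]")
    return math.prod(per_locus_likelihoods)


def authentic_by_stutter(per_locus_likelihoods: Sequence[float],
                         threshold: float) -> bool:
    """Claim 46 rule: below the threshold is not authentic, otherwise authentic."""
    return joint_likelihood(per_locus_likelihoods) >= threshold
```

With many loci the product can become very small, so a log-sum formulation may be preferable in practice.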
47. Use according to claim 41 wherein calculating the amount of PCR stutter comprises:
(a) subjecting the test sample and a reference sample to PCR analysis using primers specific to selected genetic loci;
(b) analyzing the PCR amplification products using capillary electrophoresis;
(c) processing the capillary electrophoresis data for detection of alleles and stutter peaks;
(d) determining the size and/or area of the -1 and/or +1 stutter fraction;
(e) calculating the likelihood parameters representing the likelihoods of obtaining the stutter values obtained in step d in an in vivo generated nucleic acid sample;
(f) calculating the joint likelihood value of the test sample, representing the likelihood that the test sample was generated in vivo; wherein when the ratio between the value of the joint likelihood parameter obtained from the test sample in step f and the value of the joint likelihood parameter obtained from a reference sample is smaller than a predefined value the nucleic acids from the test sample are not authentic, and when said ratio is equal to or larger than a predefined value, the nucleic acids from the test sample are authentic.
48. Use according to any of claims 44-47 wherein the likelihood parameter is calculated by comparison to a database or calculated by comparison to a normal distribution of corresponding values.
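The normal-distribution option of claim 48 can be illustrated for the likelihood parameter of claims 44 and 45, which is the likelihood of obtaining a value equal to or greater than the observed one in in vivo generated DNA. The sketch below computes that upper-tail probability; the mean and standard deviation of in vivo values, the threshold, and the function names are hypothetical inputs, not values given in the claims.

```python
from statistics import NormalDist


def upper_tail_likelihood(observed: float, in_vivo_mean: float,
                          in_vivo_sd: float) -> float:
    """Probability of a value >= observed under a normal model of in vivo values."""
    model = NormalDist(mu=in_vivo_mean, sigma=in_vivo_sd)
    return 1.0 - model.cdf(observed)


def authentic_by_likelihood(likelihood: float, threshold: float) -> bool:
    """Claims 44 and 46 rule: below the threshold is not authentic."""
    return likelihood >= threshold
```

For example, an observed value three standard deviations above the in vivo mean gives a likelihood of roughly 0.001, which would fall below a hypothetical threshold of 0.01.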
49. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci,
(c) comparing the ratio obtained in step b to a predefined threshold value, wherein a ratio lower than said threshold value is indicative that said nucleic acids are not authentic, and wherein a ratio equal to or larger than said threshold value is indicative that said nucleic acids are authentic.
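Steps (a) to (c) of claim 49 can be sketched as a ratio followed by a threshold comparison. The representation of the methylation status as a list of booleans, the function names, and the threshold are hypothetical and used only for illustration.

```python
from typing import Sequence


def methylated_ratio(is_methylated: Sequence[bool]) -> float:
    """Ratio between methylated CG loci and total CG loci in the set."""
    if not is_methylated:
        raise ValueError("the set must contain at least one CG locus")
    return sum(is_methylated) / len(is_methylated)


def authentic_by_methylation(is_methylated: Sequence[bool],
                             threshold: float) -> bool:
    """Claim 49 rule: a ratio below the threshold indicates not authentic."""
    return methylated_ratio(is_methylated) >= threshold
```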
50. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci,
(c) comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro generated reference sample, wherein a significantly larger ratio obtained from the test sample in comparison to the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are authentic, and wherein a ratio obtained from the test sample that is not significantly larger than the corresponding ratio obtained from the reference sample is indicative that said nucleic acids are not authentic.
51. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing a set of at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said set of CG loci wherein said CG loci are constitutively methylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said set of CG loci,
(c) comparing the ratio obtained in step b to a corresponding ratio obtained from an in vivo generated reference sample, wherein comparable ratios of the test sample and the reference sample are indicative that said nucleic acids are authentic, and wherein ratios of the test sample and the reference sample that are not comparable are indicative that said nucleic acids are not authentic.
52. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, and
(d) comparing the ratio obtained in steps b and c to predefined threshold values, wherein when both ratios are greater than said predefined ratios said nucleic acids are authentic, and wherein if at least one ratio is not greater than said predefined ratio said nucleic acids are not authentic.
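The two-set rule of claim 52 can be sketched as below. Unlike claim 49, this claim requires each ratio to be strictly greater than its threshold. The boolean encoding (True for a methylated locus), the function name, and the threshold values are hypothetical.

```python
from typing import Sequence


def authentic_by_two_sets(first_set_methylated: Sequence[bool],
                          second_set_methylated: Sequence[bool],
                          methylated_threshold: float,
                          unmethylated_threshold: float) -> bool:
    """Claim 52 rule applied to two sets of CG loci.

    first_set_methylated: status of loci expected to be methylated in vivo.
    second_set_methylated: status of loci expected to be unmethylated in vivo.
    """
    if not first_set_methylated or not second_set_methylated:
        raise ValueError("each set must contain at least one CG locus")
    # Step (b): methylated loci over total loci in the first set.
    ratio_methylated = sum(first_set_methylated) / len(first_set_methylated)
    # Step (c): unmethylated loci over total loci in the second set.
    unmethylated = sum(1 for m in second_set_methylated if not m)
    ratio_unmethylated = unmethylated / len(second_set_methylated)
    # Step (d): authentic only when both ratios exceed their thresholds.
    return (ratio_methylated > methylated_threshold
            and ratio_unmethylated > unmethylated_threshold)
```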
53. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, and
(d) comparing the ratio obtained in step b to a corresponding ratio obtained from an in vitro reference sample, and comparing the ratio obtained in step c to a predefined threshold value, wherein if the ratio obtained in step b is significantly greater than the corresponding ratio obtained from the reference sample, and the ratio obtained in step c is greater than a predefined threshold value, this is indicative that said nucleic acids are authentic, and wherein if the ratio obtained in step b is not significantly greater than the corresponding ratio obtained from the reference sample, and/or the ratio obtained in step c is not greater than a predefined threshold value, this is indicative that said nucleic acids are not authentic.
54. Use according to claim 41 wherein determination of the methylation pattern is performed by analyzing two sets each set comprising at least one CG loci, said analysis comprising:
(a) determining the methylation status of each CG locus in said two sets of CG loci wherein in the first of said sets said CG loci are constitutively methylated in in vivo generated DNA; and wherein in the second of said sets said CG loci are constitutively unmethylated in in vivo generated DNA;
(b) determining the ratio between methylated CG loci and total CG loci in said first set of CG loci,
(c) determining the ratio between unmethylated CG loci and total CG loci in said second set of CG loci, and
(d) comparing the ratios obtained in steps b and c to corresponding ratios obtained from an in vivo reference sample, wherein if both ratios are comparable, this is indicative that said nucleic acids are authentic, and wherein if at least one of the ratios is not comparable, this is indicative that said nucleic acids are not authentic.
55. A method according to any of claims 49-54, wherein the determination of the methylation pattern is performed using bisulfite sequencing.
56. Use according to any of claims 49-54 wherein the methylation pattern is determined using methylation specific PCR.
57. Use according to any of claims 49-54 wherein the methylation pattern is determined using methylation-sensitive endonuclease digestion.
58. Use according to any of claims 49-57 wherein said CG loci are amplified using loci specific primers.
59. Use according to claim 58 wherein said loci specific primers are selected from the group consisting of SEQ ID NO. 1-15.
60. Use according to claim 41 wherein determining the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
(a) subjecting nucleic acids from a test sample to size fractionation; and
(b) determining the distribution of fragment lengths for said nucleic acids; wherein the absence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are not authentic, and wherein the presence of fragments larger than about 10 kilobases in said nucleic acids is indicative that the nucleic acids are authentic.
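The fragment-length rule of claim 60 can be sketched as a check for any fragment above the cutoff. The claim says "about 10 kilobases", so the exact default of 10,000 base pairs used here is an assumption, as are the function name and the input format (fragment lengths in base pairs).

```python
from typing import Iterable


def authentic_by_fragment_length(lengths_bp: Iterable[int],
                                 cutoff_bp: int = 10_000) -> bool:
    """Claim 60 rule: presence of fragments above the cutoff indicates authentic."""
    return any(length > cutoff_bp for length in lengths_bp)
```

For example, a sample whose longest fragment is 900 bp would be flagged as not authentic, while one containing a 23,000 bp fragment would not.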
61. Use according to claim 41 wherein determining the distribution of nucleic acid fragment lengths in said nucleic acids comprises:
(a) subjecting nucleic acids from a test sample and from an in vivo generated reference sample to size fractionation;
(b) determining the distribution of fragment lengths for said nucleic acids;
(c) comparing the distributions obtained in step (b); and
(d) determining the probability that both distributions represent random samplings from the same source; wherein, when said probability determined in step (d) is less than about 0.05, this is indicative that the nucleic acids from the test sample are not authentic, and wherein, when said probability determined in step (d) is equal to or larger than about 0.05, this is indicative that the nucleic acids from the test sample are authentic.
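Steps (c) and (d) of claim 61 do not name a statistical test. One possible sketch, offered only as an assumption, is a permutation test on the two-sample Kolmogorov-Smirnov statistic: the probability is estimated as the fraction of random relabelings whose statistic is at least as large as the observed one. The function names, the number of permutations, and the seed are hypothetical.

```python
import bisect
import random
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


def same_source_probability(test: Sequence[float], reference: Sequence[float],
                            n_permutations: int = 999, seed: int = 0) -> float:
    """Permutation estimate of the probability that both samples share a source."""
    rng = random.Random(seed)
    observed = ks_statistic(test, reference)
    pooled = list(test) + list(reference)
    n_test = len(test)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_test], pooled[n_test:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


def authentic_by_distribution(test: Sequence[float], reference: Sequence[float],
                              alpha: float = 0.05) -> bool:
    """Claim 61 rule: a probability below about 0.05 indicates not authentic."""
    return same_source_probability(test, reference) >= alpha
```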
62. Use according to claim 41 wherein detecting RNA in said nucleic acids is performed by RT-PCR on one or more specific loci, wherein the absence of RT-PCR amplification products indicates that the nucleic acids are not authentic, and wherein the presence of RT-PCR amplification products indicates that the nucleic acids are authentic.
63. Use according to any one of claims 41-62 wherein said biological sample is selected from a group consisting of: blood, saliva, hair, semen, urine, feces, skin, epidermal cell, buccal cell, and bone sample.
64. Use according to any one of claims 41-63 wherein verification of authenticity is carried out for forensic uses.
65. Use according to any one of claims 41-64 wherein said nucleic acid is from a human source.
66. Use according to any one of claims 41-65 wherein said nucleic acid is selected from a group consisting of genomic DNA, cDNA, hnRNA, mRNA, rRNA, tRNA, a fragmented nucleic acid, and nucleic acids obtained from sub cellular organelles.
PCT/IL2009/000009 2008-01-03 2009-01-04 Methods for dna authentication WO2009083989A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US625808P 2008-01-03 2008-01-03
US61/006,258 2008-01-03

Publications (1)

Publication Number Publication Date
WO2009083989A1 WO2009083989A1 (en) 2009-07-09

Family

ID=40551064

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2009/000009 WO2009083989A1 (en) 2008-01-03 2009-01-04 Methods for dna authentication

Country Status (1)

Country Link
WO (1) WO2009083989A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011132061A3 (en) * 2010-04-20 2012-07-26 Nucleix Methylation profiling of dna samples
WO2011070441A3 (en) * 2009-12-11 2012-11-29 Nucleix Categorization of dna samples
CN104673907A (en) * 2015-02-12 2015-06-03 上海市刑事科学技术研究院 System and method for detecting STR subtype at high throughput
US9089511B2 (en) 2008-07-25 2015-07-28 Reven Pharmaceuticals, Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US9458503B2 (en) 2009-07-02 2016-10-04 Nucleix Methods for distinguishing between natural and artificial DNA samples
US9476100B1 (en) 2015-07-06 2016-10-25 Nucleix Ltd. Methods for diagnosing bladder cancer
US9752187B2 (en) 2009-12-11 2017-09-05 Nucleix Categorization of DNA samples
US9783850B2 (en) 2010-02-19 2017-10-10 Nucleix Identification of source of DNA samples
US11434528B2 (en) 2019-03-18 2022-09-06 Nucleix Ltd. Methods and systems for detecting methylation changes in DNA samples

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002010445A2 (en) * 2000-08-02 2002-02-07 Epigenomics Ag Method for determining the age of individuals
WO2003025215A1 (en) * 2001-09-14 2003-03-27 The University Of Queensland Detection of dna methylation
WO2003091382A2 (en) * 2002-04-23 2003-11-06 Accenture Global Services Gmbh Dna authentication based on scattered-light detection
WO2006004659A1 (en) * 2004-06-30 2006-01-12 Applera Corporation Methods for analyzing short tandem repeats and single nucleotide polymorphisms

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BALLANTYNE ET AL: "Decreasing amplification bias associated with multiple displacement amplification and short tandem repeat genotyping", ANALYTICAL BIOCHEMISTRY, ACADEMIC PRESS INC, NEW YORK, vol. 368, no. 2, 7 August 2007 (2007-08-07), pages 222 - 229, XP022189507, ISSN: 0003-2697 *
SHINDE DEEPALI ET AL: "Taq DNA polymerase slippage mutation rates measured by PCR and quasi-likelihood analysis: (CA/GT)n and (A/T)n microsatellites.", NUCLEIC ACIDS RESEARCH 1 FEB 2003, vol. 31, no. 3, 1 February 2003 (2003-02-01), pages 974 - 980, XP002525338, ISSN: 1362-4962 *
SUMI H ET AL: "Applicability of the parentally imprinted allele (PIA) typing of a VNTR upstream the H19 gene to forensic samples of different tissues", LEGAL MEDICINE, JAPANESE SOCIETY OF LEGAL MEDICINE, TOKYO, JP, vol. 7, no. 3, 1 May 2005 (2005-05-01), pages 179 - 182, XP004853196, ISSN: 1344-6223 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9089511B2 (en) 2008-07-25 2015-07-28 Reven Pharmaceuticals, Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US9089602B2 (en) 2008-07-25 2015-07-28 Reven Pharmaceuticals, Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US9101537B2 (en) 2008-07-25 2015-08-11 Reven Pharmaceuticals, Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US11110053B2 (en) 2008-07-25 2021-09-07 Reven Pharmaceuticals Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US9775798B2 (en) 2008-07-25 2017-10-03 Reven Pharmaceuticals, Inc. Compositions and methods for the prevention and treatment of cardiovascular diseases
US9458503B2 (en) 2009-07-02 2016-10-04 Nucleix Methods for distinguishing between natural and artificial DNA samples
WO2011070441A3 (en) * 2009-12-11 2012-11-29 Nucleix Categorization of dna samples
AU2010329552B2 (en) * 2009-12-11 2015-11-26 Nucleix Ltd Categorization of DNA samples
US9752187B2 (en) 2009-12-11 2017-09-05 Nucleix Categorization of DNA samples
US9783850B2 (en) 2010-02-19 2017-10-10 Nucleix Identification of source of DNA samples
JP2013524805A (en) * 2010-04-20 2013-06-20 ニュークレイックス Methylation profiling of DNA samples
WO2011132061A3 (en) * 2010-04-20 2012-07-26 Nucleix Methylation profiling of dna samples
CN104673907A (en) * 2015-02-12 2015-06-03 上海市刑事科学技术研究院 System and method for detecting STR subtype at high throughput
US9476100B1 (en) 2015-07-06 2016-10-25 Nucleix Ltd. Methods for diagnosing bladder cancer
US11434528B2 (en) 2019-03-18 2022-09-06 Nucleix Ltd. Methods and systems for detecting methylation changes in DNA samples

Similar Documents

Publication Publication Date Title
JP6082141B2 (en) Classification of DNA samples
WO2009083989A1 (en) Methods for dna authentication
US9752187B2 (en) Categorization of DNA samples
CN108070658B (en) Non-diagnostic method for detecting MSI
CN108026583A (en) HLA-B*15:02 single nucleotide polymorphism and its application
JP2023130376A (en) Kits and methods for detecting cancer-related mutations
DE10139283A1 (en) Methods and nucleic acids for the analysis of colon cancer
US9458503B2 (en) Methods for distinguishing between natural and artificial DNA samples
US7794983B2 (en) Method for genetic detection using interspersed genetic elements
CN110241234B (en) Fluorescence-labeled 32-plex InDels composite amplification system and application thereof
US20220325317A1 (en) Methods for generating a population of polynucleotide molecules
CN105886497A (en) Allelic ladder of polymorphic short tandem repeat (STR) loci as well as preparation method, identification method and application thereof
Liu et al. DNA and protein analyses of hair in forensic genetics
KR101716108B1 (en) Forensic profiling by differential pre-amplification of STR loci
US20210115435A1 (en) Error-proof nucleic acid library construction method
JP2003534778A (en) Universal variable fragment
Skrant et al. Differentiating monozygotic twins using NGS
JP2004350576A (en) Kit for detecting bladder cancer
JP2017201894A (en) Diabetes examination method
CN116970707A (en) Composite amplification kit for detecting human Y chromosome locus based on NGS technology
JP2016198027A (en) Method for diagnosing ovarian cancer
CN116103410A (en) Breeding method of Babuk sheep and Indel molecular marker of wool color character of Babuk sheep
CN117660622A (en) Methylated molecular marker for detecting lung nodules and application thereof
CN117778593A (en) Molecular marker for identifying Scolopendra subspinipes spinosus and genetic sex thereof
CN111088358A (en) Colorectal cancer molecular marker combination, application thereof, primer group and detection kit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09700119

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09700119

Country of ref document: EP

Kind code of ref document: A1