EP3347497A2 - Nucleic acid analysis by joining barcoded polynucleotide probes - Google Patents

Nucleic acid analysis by joining barcoded polynucleotide probes

Info

Publication number
EP3347497A2
Authority
EP
European Patent Office
Prior art keywords
sequence
complementary
target
probe
polynucleotide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP16845310.8A
Other languages
German (de)
English (en)
Other versions
EP3347497A4 (fr)
Inventor
Heather Koshinsky
John D. Curry
Robert O'CALLAHAN
Adam MCCOY
Daniel Fitzpatrick
Philip H. Dickinson
Anthony C. Schweitzer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Affymetrix Inc
Original Assignee
Affymetrix Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Affymetrix Inc
Publication of EP3347497A2
Publication of EP3347497A4

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6853: Nucleic acid amplification reactions using modified primers or templates

Definitions

  • the first and second complementary probes may be complementary to first and second target sequences and may be immediately adjacent one another or adjacent one another and from one to 500 nucleotides apart.
  • the first complementary probe may have a sequence having two portions that is complementary to the target sequence and flanking both 3' and 5' of the interrogation site bar code, and the adjacent universal sequence of the first complementary probe may be 5' to the complementary sequence portion that may be 5' to the non-complementary interrogation site bar code of the first complementary probe.
  • the primer sequence may include a PCR priming sequence.
  • the non-complementary interrogation site bar code and the sample index may be 10, 11, 12, 13, 14, 15 or 16 nucleotides in length, e.g. 12 or 15 nucleotides in length.
  • the interrogation site bar code may be selected from SEQ ID NO: 1 - SEQ ID NO: 384.
  • the sample index bar code may be selected from SEQ ID NO: 1 - SEQ ID NO: 73536.
  • compositions and methods may be solution-based and each of the first and second complementary probes may comprise an inosine 2, 3, 4, 5, 6, 7, 8, 9, 10, or more bases from the 3' and 5' end of the probe, respectively.
  • the disclosure provides compositions, methods and kits that may be used for genotyping, determining copy number variation, and/or for determining the presence or absence or amount of specific target polynucleotides.
  • Figures 1A-E provide a schematic depiction of compositions and methods used in nucleic acid analysis by joining barcoded polynucleotide probes. The figures are described in detail in Example 1.
  • Allele-A x-axis
  • Allele-B y-axis
  • AA animals are along the x-axis
  • BB animals are along the y-axis
  • AB animals are centered between the axes.
  • the plots show that in some cases genotype resolution is similar and in other cases the genotype resolution with one or the other placement is better. As indicated in the Fig.
  • the sequence model (where LHS-T (the first complementary probe) for the affected version of the first complementary probe is shown in the 5' to 3' direction, and the target gDNA or genomic DNA is shown in the 3' to 5' direction) shows the 10 most 3' positions of the first complementary probe containing the 3' T nucleotide mismatched to the G nucleotide in the genomic DNA sequence.
  • a second 3' position (i) is shown corresponding to the "iT2".
  • the underlined portion of the gDNA sequence is where the second complementary probe would hybridize.
  • Solid grey bars are samples that are homozygous GG, striped bars represent samples that are homozygous AA.
  • the Y-axis is the log scale of the number of reads associated with the T version of the first complementary probe.
  • the grey bars represent non-specific ligation due to the stability of the G:T mismatch.
  • the striped bars represent specific ligation.
  • the results show that deoxyinosine placement at the 2nd or 3rd 3' position of the modified version of the first complementary probe significantly reduces the number of reads from non-specific ligation.
  • the deoxyinosine can be used in first complementary probes that have a 3'G and the potential for the G:T mismatch.
  • Figure 4 shows the results of a study where a small amount of target DNA was detected in a sample of background (noise) genomic DNA.
  • Figure 4A shows the average relative concordance of the two best loci for each treatment, the number of signal and noise genomes (Top), and ng of signal and noise genomes in each reaction (Bottom). The results show that as the number of signal genomes decreases, the relative concordance of the two loci remains high. Even at 122 ng input signal genomes in a background of the equivalent of 250,000 noise genomes, the average relative concordance is 100%. This is the detection of under 0.05% contamination of a signal genome in a background of equivalent size noise genomes.
  • Figure 13 is a diagram showing the destabilization site (proximal SNP) and the marker site (target SNP) and their relative positions within polyploid target genomes.
  • the destabilization site can be on either side of the marker/target SNP. Open arrows point to their respective sites within the target genome.
  • Figure 17 is a diagram showing probes used in genotyping methods for detecting the presence or absence of a target polynucleotide in polyploid samples.
  • Figure 17A illustrates a scenario wherein no proximal SNP is present in the target DNA.
  • An upfront PCR amplification step is added using PCR primers that only amplify the unique genome or subgenome of interest based on the knowledge of the relative position of the proximal SNP(s) to the target/marker SNPs.
  • hybridizations of LHS and RHS to the PCR amplicons of the target DNA occur and LHS and RHS are ligated (cloud represents the ligation).
  • Figure 17B illustrates a scenario wherein the proximal SNP is present (the cross pointed by an arrow) in the target DNA.
  • the upfront PCR amplification is prevented by the proximal SNP(s) in the target DNA, which interferes with the binding of the PCR primer(s) to the target DNA.
  • Figure 18 illustrates the impact of an upfront PCR amplification step on sequence reads. Number of reads for Allele-A (x-axis) and Allele-B (y-axis) are shown, where each point is a unique sample.
  • Figure 18A shows results of cluster plots on genomic DNA without an upfront PCR amplification step.
  • Figure 18B shows results of cluster plots on PCR amplicons with an upfront PCR amplification step. The resolution of the cluster plots for the loci is improved with an enrichment PCR amplification step.
  • compositions, methods and kits comprising a plurality of first and second complementary probes.
  • Each first complementary probe can include a sequence that is complementary to a first target sequence of interest.
  • Each second complementary probe can include a sequence that is complementary to a second target sequence of interest.
  • when first and second complementary probes hybridize to complementary first and second target sequences, the first and second probes can be joined to form a product polynucleotide.
  • the disclosure further provides a plurality of samples, each potentially comprising one or more target sequences. Some samples comprise a plurality of target sequences and some samples do not comprise any target sequences.
  • polynucleotides formed from the samples and (f) determining the presence, absence, amount or copy number of each target polynucleotide in one or more samples by analyzing product polynucleotides or the complements thereof.
  • complementary probe may comprise a universal primer sequence that is complementary to a primer sequence which can be used to add one or more of (i) a sample index, (ii) an additional sequence, (iii) an additional sequence for sequence data generation or another form of detection, and (iv) another moiety.
  • the universal sequences of said first and second complementary probes may each comprise a priming sequence that can hybridize to a primer for sequence synthesis.
  • the priming sequence may include a PCR priming sequence.
  • the first complementary probe may comprise a sequence 5' to the interrogation site bar code that is complementary to the first target sequence and a sequence 3' to the interrogation site bar code that is complementary to the first target sequence.
  • the methods may be solution-based.
  • the methods of the disclosure may be for use in determining the copy number variation of a target polynucleotide, and wherein said determining comprises comparing the amount of signal produced for a product polynucleotide or the complement thereof to a known reference or to the amount of signal produced by another product polynucleotide or the complement thereof.
  • the methods of the disclosure may be for use in expression analysis in determining presence of a target polynucleotide, wherein the target polynucleotide is an RNA transcript, and wherein said determining comprises comparing the amount of signal produced for a product polynucleotide or the complement thereof to a known reference or to the amount of signal produced by another product polynucleotide or the complement thereof.
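The comparison of signal against a reference described above can be sketched as a simple ratio. This is a minimal illustration only: the function name `estimate_copy_number`, the use of read counts as the "signal", and the default of two reference copies (a diploid reference locus) are assumptions made for the example and are not taken from the disclosure.

```python
def estimate_copy_number(target_reads, reference_reads, reference_copies=2):
    """Scale the target signal by a reference signal of known copy number.

    All names and the diploid default are illustrative assumptions.
    """
    if reference_reads <= 0:
        raise ValueError("reference signal must be positive")
    # Ratio of target signal to reference signal, scaled to reference copies.
    return reference_copies * target_reads / reference_reads


# Hypothetical read counts for one target and one reference product.
gain = estimate_copy_number(target_reads=150, reference_reads=100)
loss = estimate_copy_number(target_reads=50, reference_reads=100)
```

In practice a ratio like this would likely be normalized per sample and averaged over several reference products before any copy number call is made.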
  • the first complementary probe may comprise from 5'-3': the adjacent universal sequence, the sequence portion that is complementary to the first target sequence, and the interrogation site bar code within the sequence portion that is complementary to the first target sequence.
  • the first complementary probe may comprise a sequence that is complementary to the first target sequence both 3' and 5' of the interrogation site bar code.
  • the second complementary probe may comprise from 5'-3': a sequence portion that is complementary to a second target sequence of the target polynucleotide and an immediately adjacent sequence portion that is non-complementary to the second target sequence.
  • the interrogation site bar code may be at least 10, 11, 12, 13, 14, 15 or 16 nucleotides in length. Preferably, the interrogation site bar code is 12 or 15 nucleotides in length.
  • the interrogation site bar code may be selected from the group consisting of SEQ ID NO: 1 - SEQ ID NO: 384.
  • the second complementary probe may comprise an inosine (e.g. deoxyinosine) 2, 3, 4, 5, 6, 7, 8, 9, 10, or more bases from the 5' end of the probe.
  • the 3' end of the first complementary probe may be complementary to one form of a single nucleotide polymorphism (SNP) or other genetic variation.
  • SNP single nucleotide polymorphism
  • the disclosure provides a kit for determining the presence, absence, amount, copy number or characteristics of one or more target polynucleotides in a sample comprising: (a) a plurality of first and second complementary probes as disclosed herein; and (b) optionally, buffers and enzymes for ligation and enrichment.
  • the kit may further comprise a ligase.
  • the kit may further comprise software needed to interpret the data.
  • polynucleotides and reference to "a probe" includes two or more probes, or mixtures of probes, and the like.
  • adjacent means that two sequences are substantially next to one another on a nucleic acid; however, there may be one or more intervening bases between two adjacent sequences.
  • immediately adjacent means that two sequences are next to one another on a nucleic acid with no intervening bases between the immediately adjacent sequences.
  • allele means one of two or more alternative forms of a gene or genetic locus. If a diploid organism has two copies of the same allele, for example, AA or aa, it is homozygous at that location. If the organism has one copy of two different alleles, for example Aa, it is heterozygous at that location. Alternative nomenclature uses A and B for the alleles. A homozygous diploid organism is AA or BB at that location. A heterozygous diploid organism is AB at that location.
  • allele also applies to situations where there are three or more possible alternative forms, and can be extended as known in the art, e.g., with respect to alleles A, B, and C for a triallelic single nucleotide polymorphism.
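The AA / AB / BB nomenclature above maps naturally onto per-allele read counts for a biallelic locus in a diploid sample. The sketch below is illustrative only: the function name `call_genotype`, the fraction thresholds (0.2 and 0.8) and the minimum read depth are hypothetical values chosen for the example, not parameters stated in the disclosure.

```python
def call_genotype(a_reads, b_reads, low=0.2, high=0.8, min_reads=10):
    """Call AA, AB or BB from allele read counts (thresholds are assumptions)."""
    total = a_reads + b_reads
    if total < min_reads:
        return "NoCall"  # too few reads to make a call
    frac_b = b_reads / total  # fraction of reads supporting allele B
    if frac_b <= low:
        return "AA"
    if frac_b >= high:
        return "BB"
    return "AB"


# Hypothetical (Allele-A reads, Allele-B reads) pairs for four samples.
calls = [call_genotype(a, b) for a, b in [(100, 2), (3, 120), (60, 55), (1, 2)]]
```

Fixed thresholds are the simplest possible choice; clustering samples per locus would be an alternative when read depth varies widely between loci.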
  • array means an intentionally created collection of molecules which can be prepared either synthetically or biosynthetically.
  • An array can assume a variety of forms, such as libraries of soluble molecules and the utilization of one or more solid supports, such as glass slides, silica chips, micro particles, nanoparticles, or beads.
  • a "solid support" is any material that can be attached to a probe, target nucleotide or product nucleotide, for example, glass and modified or functionalized glass, plastics, polysaccharides, nylon, nitrocellulose, ceramics, resins, silica-based materials, carbon, metals, inorganic materials, and other polymers, for example a flow cell or another solid surface such as a bead or microarray.
  • bar code or “barcode” or “index” are used interchangeably herein with reference to a nucleotide sequence used to identify or “tag” one or more particular target or product polynucleotides.
  • a “bar code” is typically at least 5 nucleotides (nt) in length. In some embodiments, a bar code or a portion thereof may occur in a first and/or second
  • a bar code may have the same sequence present in the target polynucleotide or its complement, it may be a sequence that is partially complementary to sequence in the target polynucleotide or its complement, and it may be a sequence that has no complementarity to the target polynucleotide or its complement or may be any combination of these states.
  • a single sequence serves as both an interrogation site bar code and a sample index.
  • a single sequence has a portion that serves as an interrogation site barcode and a portion that serves as a sample index.
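Where a single sequence carries both an interrogation site bar code portion and a sample index portion, a read-out step has to separate the two. The following is a minimal sketch under stated assumptions: the order of the portions (site bar code first, sample index second), the 12-nucleotide length of each portion, and the function name `split_barcode` are all hypothetical choices for illustration.

```python
def split_barcode(tag, site_len=12, index_len=12):
    """Split a combined tag into (interrogation site bar code, sample index).

    Portion order and lengths are assumptions made for this example.
    """
    if len(tag) != site_len + index_len:
        raise ValueError("tag length does not match the expected portion lengths")
    return tag[:site_len], tag[site_len:]


# Hypothetical 24-nt tag built from two made-up 12-nt portions.
site_code, sample_index = split_barcode("ACGTACGTACGT" + "TTGCATTGCATT")
```

Each portion could then be looked up in its own table of known bar codes to recover the locus and the sample.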
  • base modifications is used herein with reference to polynucleotides that comprise non-standard bases (i.e., other than adenine, guanine, thymine, cytosine and uracil).
  • non-standard bases may serve a number of purposes, e.g., to stabilize or destabilize hybridization; to promote or inhibit degradation; or as attachment points for detectable moieties, quencher moieties or other moieties.
  • modified bases other than the modified bases of the invention
  • base analogs are known in the art.
  • complementary polynucleotides is used herein with reference to polynucleotides that form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in antiparallel polynucleotide strands.
  • the degree of "complementarity" between the sequence of the probe and the sequence of the target gene or the complement of the sequence of the target gene does not need to be 100 percent identical. In one embodiment, the degree of “complementarity” is less than 100 percent but sufficient to allow hybridization between the sequence of the probe and the sequence of the target gene or the complement of the sequence of the target gene under certain conditions.
  • CNV copy number variation
  • first complementary probe is used herein with reference to a
  • hybridize or “hybridization” is used herein with reference to the binding, duplexing, or annealing of a nucleic acid molecule preferentially to a particular target polynucleotide, typically, under stringent conditions.
  • stringent conditions refers to conditions under which a probe will hybridize preferentially to its target polynucleotide, and to a lesser extent to, or not at all to, other sequences.
  • stringent hybridization as used in the context of nucleic acid hybridization is sequence-dependent and is different under different environmental parameters. The dependency of hybridization stringency on buffer composition, temperature, and probe length are well known to those of skill in the art (see, e.g.
  • mismatched nucleotide is used herein with reference to a nucleotide in a target polynucleotide that is not complementary to the corresponding nucleotide in a corresponding probe or primer sequence when the sequences are hybridized to one another.
  • the complement of C is G and the complement of A is T.
  • a "C" in a probe is considered to be mismatched with a "T" in a target polynucleotide.
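The mismatch definition above can be illustrated with a short position-by-position check. This is a sketch for standard DNA bases only; the function name `find_mismatches` and the convention of writing the probe 5' to 3' and the target 3' to 5' (so that position i of one pairs with position i of the other) are assumptions made for the example.

```python
# Watson-Crick complements for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_mismatches(probe_5to3, target_3to5):
    """Return 0-based positions where the probe base does not pair with the target base."""
    if len(probe_5to3) != len(target_3to5):
        raise ValueError("sequences must be the same length")
    return [
        i
        for i, (p, t) in enumerate(zip(probe_5to3, target_3to5))
        if COMPLEMENT[p] != t
    ]


# A "C" in the probe opposite a "T" in the target is reported as a mismatch.
positions = find_mismatches("ACGC", "TGCT")
```

Universal bases such as inosine are deliberately left out of the lookup table, so a probe containing them would need an extended pairing rule.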
  • a "modified polynucleotide” may be used to refer to a nucleotide sequence comprising a universal base, for example, deoxyinosine (also referred to herein as “inosine”), 3-nitropyrrole, or 5-nitroindole.
  • NGS next generation sequencing
  • NGS may also refer to third, fourth and additional generations of sequence data generation that are not high throughput but have other properties that distinguish them from traditional Sanger sequencing.
  • When a polynucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order from left to right (unless otherwise indicated) and that "A" denotes adenosine, "C" denotes cytidine, "G" denotes guanosine, "T" denotes thymidine, and "U" denotes uridine, unless otherwise noted.
  • a primer or probe comprises a region that is "perfectly complementary" to a number of contiguous nucleotides of a target molecule
  • the primer or probe may be referred to as 100% complementary to the target molecule when there are no mismatches along the length.
  • a signal may be generated directly or indirectly, during or after probe hybridization.
  • complementary probes may be the same or different.
  • the first or second complementary probe has a 5' phosphorylated nucleotide.
  • the polynucleotide that is a target for hybridization may or may not be present in a sample.
  • target polynucleotide is used herein with reference to a sequence in a nucleic acid or polynucleotide that is a target for hybridization.
  • the target polynucleotide may or may not be present in a sample.
  • the target polynucleotide comprises RNA or DNA that is partially or fully complementary to a first complementary probe and second complementary probe of the invention.
  • the target polynucleotide can usually be described using the four bases of DNA (A, T, G, and C) or the four bases of RNA (A, U, G, and C).
  • universal base is used herein with reference to bases that can aid in preventing, or decreasing the frequency of joining of molecules when the 3' end of a first complementary probe is not complementary to a target polymorphic nucleotide or nucleotides. Inosine, 3-nitropyrrole, and 5-nitroindole are examples of universal bases.
  • universal sequence is used herein with reference to a sequence component of a first or second complementary probe which may include a universal priming sequence.
  • universal primer sequence or “universal primer binding sequence” comprises a primer sequence that is complementary to a primer sequence such as a PCR primer sequence, and is used to add one or more of (i) a sample index, (ii) additional sequences, (iii) a sequence or sequences for use in sequence data generation or other forms of detection, and (iv) other moieties.
  • PCR primer sequences are typically used in pairs and the composition of the two components in the pair may not be identical. Any two pairs of PCR primer sequences may have identical sequence except for the sample index.
  • the sequence of primer #1 in both a first and second pair is identical and the sequence of primer #2 is different from the sequence of primer #1 and the sequence of primer #2 in the first and second pair is identical except for a sample index.
  • the PCR primers contain a universal sequence or sequences, and/or a sample index or indices and/or a sequence moiety or moieties with other functions.
  • a PCR reaction with universal primer sequence(s) can be used to add a sample index.
  • the universal primer sequence in the first complementary probe and its complementary portion in the first PCR primer may or may not be the same length or have a 100% complementary sequence.
  • the universal primer sequence in the second complementary probe and its complementary portion in the second PCR primer may or may not be the same length or have 100% complementary sequence.
  • a universal primer sequence can be used to add adapter sequences for binding to a solid support. In some cases, the binding to a solid support is for purposes of next generation sequencing. In other cases, the binding to a solid support is for array based detection of the product polynucleotide. In some cases, a universal primer sequence is used to add sequences or moieties for other forms of detection or sequence data generation.
  • nucleotide modification such as methylation
  • the methods of the invention may be used for identifying the presence, absence, copy number or amount (or combination thereof) of a large number of target polynucleotides in one or more samples in a solution-based hybridization assay.
  • a plurality of samples (e.g., 2-50,000) which may or may not contain one or more different target polynucleotides.
  • a plurality of first and second complementary probes, each comprising a sequence complementary to a target sequence of interest may be incubated with one or more samples under conditions that allow first and second complementary probe sequences to hybridize to complementary first target sequence and second target sequences.
  • Exemplary first and second complementary probe sequences are from about 50 to 200 nucleotides in length.
  • the first target sequence is on the left side of an interrogation site or polymorphic nucleotide.
  • the methods may be used to identify polymorphisms, for example single or multi-nucleotide polymorphisms, deletions, insertions, translocations, covalent nucleotide modifications, etc.
  • FIG. 8 depicts a variation of the methods used to determine the presence or absence of a target polynucleotide, derived from a tetraploid organism, which comprise two copies (alleles) for each target polynucleotide. Either strand of a given polymorphic locus can be analyzed for the polymorphism.
  • a plurality of first complementary probes is provided, wherein the probes correspond to a number of possible polymorphisms, polymorphic nucleotides, or alleles at a given locus.
  • a single base substitution, insertion or deletion
  • a probe comprises a detectable label or moiety.
  • a probe is not labeled, such as when a probe is a capture probe, for example when the probe is used for capture on a solid surface such as a microarray or bead.
  • the label is a bar code.
  • a probe is not extendable, e.g., by a polymerase. In some embodiments, a probe is extendable.
  • a sample can be derived from any animal, plant, microbial, viral, synthetic DNA or synthetic RNA source.
  • a "plurality of samples” refers to two or more samples, from the same or different sources. For example, each sample may be derived from a different animal or a different plant, or the samples may be from different microbial sources.
  • a plurality is 2, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more, for example from 2 to 10,000 samples, from 2 to 20 samples, from 5 to 30 samples, from 10 to 50 samples, from 25 to 75 samples, from 40
  • polynucleotides can be isolated from samples using a variety of methods, for example mechanical isolation (such as glass-bead technology), chemical extraction methods, column based methods, or combinations thereof. Any DNA extraction method, a large number of which are well-known to one of skill in the art, may be used in the methods described herein.
  • the DNA in a nucleic acid sample may be double stranded, single stranded or double stranded DNA denatured into single stranded DNA. Denaturation of double stranded sequences provides two single stranded sequences one or both of which can be assayed using probes specific for the respective strands (in separate reactions).
  • Preferred nucleic acid samples comprise target polynucleotides of genomic DNA, cDNA, DNA fragments, e.g., restriction fragments, and the like.
  • Prior to combination with a complementary probe set, the sample may be treated to fragment the nucleic acid. This may occur by one or more of the following methods: physical fragmentation using, for example, sonication; shearing such as acoustic shearing; needle shearing; point-sink shearing; nebulization; passage through a pressure cell; or heating; enzymatic fragmentation using, for example, DNase I, another restriction endonuclease, a non-specific nuclease or a transposase; or chemical fragmentation, e.g., using heat and divalent metal cations.
  • the one or more target polynucleotides in the one or more samples may be reversibly denatured.
  • This may, for example, be achieved by a heating step, e.g., heating to at least 70°C, 70°C to 100°C, 75°C to 100°C, 80°C to 98°C, 85°C to 95°C, 90°C to 100°C or 95°C to 100°C.
  • the heating step is 95°C to 100°C.
  • the heating step may be performed for at least 30 seconds, at least 1 minute, 1-30 minutes, 2-25 minutes, 3-20 minutes, 4-15 minutes or 5-10 minutes.
  • the heating step is performed for 1-15 minutes.
  • the nucleic acid in the samples is reversibly denatured.
  • Double stranded DNA can be denatured into single stranded DNA, for example, by heating to about 98°C for about one minute.
  • Double stranded DNA is denatured into single stranded DNA using standard conditions known to those of skill in the art, for example, heating to about 98°C for about five minutes.
  • samples or samples plus first and second complementary probes may be heated to a temperature of from 70°C to 100°C, 75°C to 100°C, 80°C to 98°C, 85°C to 95°C, 90°C to 100°C, 95°C to 100°C, 70°C, 75°C, 80°C, 85°C, 86°C, 87°C, 89°C, 90°C, 91°C, 92°C, 93°C, 94°C, 95°C, 96°C, 97°C, 98°C, 99°C or 100°C prior to hybridization.
  • a target polynucleotide may be any nucleotide sequence for which a determination of the presence, absence, amount or characteristics is desired.
  • a target polynucleotide may be preselected by the person designing a given assay, and/or be associated with a particular genotype or phenotype of interest, and/or be selected for another reason.
  • the target polynucleotide is a nucleotide sequence that contains, represents or is associated with a polymorphism.
  • alleles can be interrogated by targeting one or more nucleotide polymorphisms.
  • a polymorphism occurs at a single nucleotide position; for example, one allele may have a thymine at a given position and an alternative allele has, for example, cytosine at the same position.
  • the nucleotide polymorphism may comprise a substitution, deletion, insertion, copy number variation, translocation, methylation or another nucleotide modification, and/or a variant DNA sequence.
  • the polymorphism may include two, three, four, or more contiguous nucleotides.
  • compositions and methods disclosed herein may find utility in identification of a single nucleotide polymorphism (SNP) in a target polynucleotide sequence.
  • in genomic DNA samples from a diploid mammal with two copies of a given SNP, the SNP could be homozygous or heterozygous.
  • a triploid organism has 3 distinct alleles at a given locus.
  • Polyploid cells and organisms contain more than two paired sets of chromosomes and have a numerical change in a whole set of chromosomes. Polyploidy is common in plants. For example, wheat has strains that are diploid (two sets of chromosomes), tetraploid (four sets of chromosomes) and hexaploid (six sets of chromosomes). See Example 8.
  • a first and second complementary probe are incubated with one or more samples that may or may not contain a polymorphism in a target polynucleotide sequence under conditions that provide for hybridization of complementary sequences.
  • an optional third probe is provided for a particular probe set. This third probe is typically similar to either the first or second probe, but is directed to a different allele at the same sequence of interest. See Figure 1E.
  • if the complementary polynucleotide probe including a polymorphic nucleotide is complementary to the polymorphic nucleotide in a target polynucleotide sequence, then the complementary probes are joined together to create a product polynucleotide.
  • if the polymorphic nucleotide on the complementary polynucleotide probe does not hybridize to the polymorphic nucleotide on the target polynucleotide, the two complementary probes typically are not joined and do not form a product polynucleotide.
  • the product polynucleotides (or a portion or portions of the product polynucleotide, its amplification products, or complements thereof) are sequenced to determine the presence or absence of the polymorphism.
  • the sample identity is also determined by sequencing.
  • an array or other readout is used to determine the presence or absence of the polymorphism.
  • capture probes or oligonucleotides provided on an array are designed to be substantially complementary to the extended part of a primer, so unextended primers will not bind to the capture probes. Alternatively, unreacted probes may be removed prior to addition to the array or sequencing.
  • a sample contains one or more or a plurality of different target polynucleotides.
  • the sample comprises at least two different target polynucleotides, at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
  • polynucleotides, from 40 to 100 target polynucleotides, from 50 to 120 target polynucleotides, from 60 to 130 target polynucleotides, from 70 to 140 target polynucleotides, from 80 to 150 target polynucleotides, from 90 to 170 target polynucleotides, from 100 to 200 target polynucleotides, from 150 to 250 target polynucleotides, from 200 to 300 target polynucleotides, from 250 to 500 target polynucleotides, from 300 to 700 target polynucleotides, from 200 to 3000 target polynucleotides, from 300 to 4000 target polynucleotides, from 500 to 5000 target polynucleotides, or from 100 to 10,000 target polynucleotides.
  • the target nucleotides are from 20 nt to 30 nt, from 20 nt to 40 nt, from 20 nt to 50 nt, from 20 nt to 60 nt, from 20 nt to 70 nt, from 20 nt to 80 nt, from 20 nt to 90 nt, from 20 nt to 100 nt, from 20 nt to 110 nt, from 20 nt to 120 nt, from 20 nt to 130 nt, from 20 nt to 140 nt, from 20 nt to 150 nt, from 20 nt to 160 nt, from 20 nt to 170 nt, from 20 nt to 180 nt, from 20 nt to 190 nt, from 20 nt to 200 nt, from 20 nt to 210 nt, from 20 nt to 220 nt, from 20 nt to 230 nt, from 20 nt
  • the length of the target sequence may be varied depending upon the melting temperature ("Tm") of the sequence, pH, salt concentration, or the temperature of the incubating step.
  • the Tm's of the various target polynucleotides evaluated in a given assay are typically within 1°C, 2°C, 3°C, 4°C, 5°C, 6°C, 7°C, 8°C, 9°C, or 10°C of each other.
  • the Tm's of the various target polynucleotides are within 1-3°C, 2-5°C, 2-4°C, 3-6°C, 3-5°C, 4-7°C, 4-6°C, 5-8°C, 5-7°C, 6-9°C, 6-8°C, 7-10°C, 7-9°C, 8-10°C, or 8-9°C of each other.
  • Hybridization is carried out under various conditions known in the art. Stringent conditions are hybridization conditions under which a polynucleotide will hybridize preferentially to its target subsequence, and optionally, to a lesser extent, or not at all, to other sequences in a mixed population.
  • stringent hybridization conditions are selected to be about 5°C lower than the thermal melting point (Tm) for a specific sequence at a defined ionic strength and pH.
  • Very stringent conditions are selected to be equal to the Tm for a particular probe.
  • a number of aspects of the hybridization reaction conditions may be varied including but not limited to the temperature of the hybridization reaction, the length of incubation, and the ionic strength of the hybridization buffer.
  • first and second complementary probes may be joined.
  • when first and second complementary probes are hybridized to target-specific sequences adjacent to each other, the respective 5'-phosphorylated and 3'-hydroxylated ends of a probe pair may be joined by any suitable means known in the art.
  • first and second complementary probes may be joined non-covalently.
  • the first and second complementary probes may be joined covalently.
  • the covalent joining may be accomplished by use of a ligase, for example a DNA Ligase from T. aquaticus or Ligase-65.
  • the ligase and a ligation buffer solution can be added to a solution comprising adjacent first and second complementary probes bound to target polynucleotides in a sample.
  • the hybridization complex is added to the ligation solution.
  • the temperature of the ligation reaction may be held constant for about 1 to 20 minutes, for example, about 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 11 minutes, 12 minutes, 13 minutes, 14 minutes, 15 minutes, 16 minutes, 17 minutes, 18 minutes, 19 minutes, 20 minutes or longer than 20 minutes.
  • the ligation reaction is carried out at about 54°C.
  • the target polynucleotide may be converted to cDNA before hybridization with the first and second complementary probes, or the RNA transcript may serve as the hybridization target for the first and second complementary probes.
  • the first and second complementary probes may continue to comprise DNA within embodiments that utilize ligation for joining the first and second complementary probes, as the joining step may be modified by methods known in the art to facilitate DNA ligation on an RNA template (e.g., see U.S. Patent No. 8,790,873).
  • Exemplary ligases for ligating DNAs on an RNA template include SplintR PBCV-1 DNA Ligase or Chlorella virus DNA Ligase.
  • a first and/or second complementary probe comprises a bar code that allows the sample, and/or the target sequence (locus and/or polymorphism or interrogation site) to be identified.
  • the interrogation site bar code may identify only an allele. In such cases it is partially or completely non-complementary to the target sequences. In some embodiments the interrogation site bar code is not complementary to the target polynucleotide sequence.
  • the interrogation site bar code is within the first target sequence, and is therefore not complementary to the first target sequence, as shown for example in Figure 1.
  • An interrogation site barcode is typically 5 or more nucleotides in length.
  • Exemplary interrogation site barcode sequences are 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length.
  • an interrogation site barcode comprises at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 or more nucleotides.
  • a first complementary probe includes an interrogation site bar code and when the first complementary probe is complementary to a first target sequence of a target polynucleotide, the interrogation site bar code sequence does not hybridize to the target, however, the 5' and 3' portions flanking the interrogation site bar code are portions of the first complementary probe that are complementary to the first target sequence. See Figure 1.
  • a sample index is typically 5 or more nucleotides in length. In certain exemplary embodiments, sample indices are 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length.
  • the total number of unique sample indices is about 16,128 based on 12-mer sequences. In some embodiments, the total number of unique sample indices is about 50,000 based on 15-mer sequences. In other embodiments, the total number of unique sample indices is about 66,000 based on 15-mer sequences.
  • the sample index is used to determine the identity of a sample by sequencing the sample index for each product polynucleotide.
  • An enriching step may be included in an assay of the invention before the analysis step.
  • the enriching step serves to increase the amount of product polynucleotide and the ratio of product polynucleotide to non-product polynucleotide in the reaction mixture. This may be accomplished by selection of the product polynucleotide and/or removal of non-product polynucleotides.
  • the enriching step is based on size, affinity, charge, or sequence, or by removal of some or all of the non-product polynucleotides, for example by selection, segregation or digestion.
  • the joining and enrichment steps may occur in the same or different reaction mixtures.
  • a product polynucleotide may be selected based on the presence of a specific sequence, for example, a sample index, or a sequence such as the complementary sequence.
  • the product polynucleotide may comprise a bar code that is designed to be selected during an enrichment step.
  • enrichment includes an amplification step.
  • a sample index may be incorporated into the product polynucleotides during the amplification step, using any amplification reaction known to those of skill in the relevant art, e.g., polymerase chain reaction (PCR).
  • Primer binding sequences may be incorporated into first and/or second complementary polynucleotide probes to facilitate amplification of product polynucleotides, whether linear or exponential. Primer binding sites are used to bind primers to initiate primer elongation or amplification. Primer binding sites are typically located in parts of the probe other than in the first or second target sequence. In some embodiments, the primer binding site is located in a sequence which is non-complementary to the target polynucleotide.
  • PCR is used to add sample indices to product
  • the PCR primers can comprise a sequence that is complementary to a portion of the product polynucleotide or the first or second complementary probe. For example, when a first and a second PCR primer are used to direct PCR amplification of a product
  • the first PCR primer may comprise a sequence that is complementary to a sequence on the product polynucleotide
  • the second PCR primer may comprise a sequence that is non-complementary to a sequence on the product polynucleotide.
  • two different sample indices are incorporated into a product polynucleotide and thereby aid in increasing the number of samples that can be identified and thus analyzed in a single assay.
  • only one PCR primer includes a sample index or bar code.
  • enrichment is carried out using PCR amplification.
  • Amplification is typically carried out in an automated thermal cycler to facilitate incubation times at desired temperatures.
  • amplification comprises multiple cycles of sequential annealing of at least one primer with complementary or substantially complementary sequences to at least one target nucleic acid, synthesizing at least one strand of nucleotides in a template-dependent manner using a polymerase, and denaturing the newly-formed nucleic acid duplex to separate the strands.
  • the cycle may or may not be repeated.
  • Amplification can comprise thermocycling or can be performed isothermally.
  • amplification comprises an initial denaturation at about 90°C to about 100°C for about 1 to about 10 minutes, followed by cycling that comprises annealing at about 55°C to about 75°C for about 1 to about 30 seconds, extension at about 55°C to about 75°C for about 5 to about 60 seconds, and denaturation at about 90°C to about 100°C for about 1 to about 30 seconds.
  • cycling comprises annealing at about 55°C to about 75°C for about 1 to about 30 seconds, extension at about 55°C to about 75°C for about 5 to about 60 seconds, and denaturation at about 90°C to about 100°C for about 1 to about 30 seconds.
  • Other times and profiles may also be used.
  • primer annealing and extension may be performed in the same step at a single temperature.
  • the cycle is carried out at least 5 times, at least 10 times, at least 15 times, at least 20 times, at least 25 times, at least 30 times, at least 35 times, at least 40 times, or at least 45 times.
  • the particular cycle times and temperatures will depend on the particular nucleic acid sequence being amplified and can readily be determined by a person of ordinary skill in the art.
  • linker or adaptor sequences that facilitate annealing of PCR primers or processes involved in sequence generation can be added to a product polynucleotide using PCR or another DNA amplification process. This is in contrast to methods traditionally used in the art wherein adapters are ligated to polynucleotides that are to be sequenced. Linkers and adaptors can be used as a component of physical, chemical, or enzymatic processes.
  • Samples may be pooled after enrichment and/or amplification.
  • product polynucleotides from various samples are combined resulting in a pool of product polynucleotides, which are analyzed and/or sequenced together.
  • the invention provides compositions, methods and kits for use in target polynucleotide copy number determinations.
  • Copy number variation ("CNV") is implicated in gene control and human disease.
  • CNV may be evaluated using first and second complementary probes for each potential CNV locus and one or more loci.
  • the probes may include a bar code as described above.
  • the relative amount of each sequence may be determined, for example, using next generation sequencing, wherein the relative read counts of the CNV locus (target polynucleotide) and single copy target polynucleotide(s) can be used to estimate copy number of the CNV locus (target polynucleotide).
  • CNV is determined by comparing samples with known CN and/or CNV to unknown samples or by comparison to a known reference number. For example, if a sample has two copies of a target polynucleotide sequence, the total number of sequence reads would indicate two copies of the target polynucleotide sequence when normalized to a control, and a sample with four copies of the target polynucleotide sequence would yield 4 times the number of sequence reads relative to the normalized sample. A sample with a deletion of all copies would yield no sequence reads.
  • this CNV detection is extended to determining an amount of target polynucleotide present in the sample.
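The read-count comparison described above can be illustrated with a minimal Python sketch. The function name, its parameters, and the choice to normalize against a single-copy reference locus and a control sample of known copy number are assumptions made for illustration; the disclosure does not specify a particular formula.

```python
def estimate_copy_number(target_reads, reference_reads,
                         control_target_reads, control_reference_reads,
                         control_copy_number=2):
    """Estimate the copy number of a CNV locus from sequencing read counts.

    Hypothetical helper: reads at the CNV locus are first normalized to reads
    at a single-copy reference locus in the same sample, then scaled by the
    same ratio measured in a control sample of known copy number.
    """
    if reference_reads <= 0 or control_reference_reads <= 0:
        raise ValueError("reference loci must have a positive read count")
    if control_target_reads <= 0:
        raise ValueError("control sample must have reads at the CNV locus")
    sample_ratio = target_reads / reference_reads
    control_ratio = control_target_reads / control_reference_reads
    return control_copy_number * sample_ratio / control_ratio


# A sample whose normalized read ratio is twice that of a two-copy control
# is estimated at four copies; a sample with no reads is estimated at zero.
four_copy = estimate_copy_number(2000, 1000, 1000, 1000)
deleted = estimate_copy_number(0, 1000, 1000, 1000)
```

In practice several single-copy loci would likely be averaged to reduce noise; the sketch uses one for brevity.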
  • the complementary probes can then be joined as discussed above, e.g., via enzymatic ligation.
  • the resulting polynucleotide product can then be analyzed as discussed for embodiments without a gap fill step.
  • a probe, target nucleotide or product nucleotide is attached to a solid support.
  • sample indices to the product polynucleotide by the incorporation of the sample index sequence within a primer sequence (e.g., a PCR primer). While many different sample index sequences are possible (i.e., 4^15 different 15mer sequences), creating an optimized set must address not only differentiating one index sequence from another (e.g., ensuring that a sample index is not called incorrectly even if one of the bases is incorrectly sequenced) but also, desirably, compatibility and optimization with the overall assay.
  • Methods of identifying sequences that will be useful include multiple steps that are outlined in this disclosure.
  • one such step can be identifying and removing those sequences that are not useful or will otherwise hinder assay performance from a previously identified set of possible sequences. Additionally, identifying those sequences that are likely to be only sometimes problematic and removing those is also important as these sequences may pass initial testing that is empirically derived, and yet perform sub-optimally under certain assay conditions.
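One way to picture the requirement that an index not be miscalled when a single base is misread is a minimum pairwise Hamming distance across the set. The sketch below is an illustrative assumption, not the selection procedure of the disclosure: the greedy filter, the function names, and the distance threshold of 3 (the usual minimum for correcting one substitution) are all hypothetical choices.

```python
from itertools import combinations

# Size of the raw search space for 15mer indices: 4^15 = 1,073,741,824.
TOTAL_15MERS = 4 ** 15


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices):
    """Smallest Hamming distance between any two indices in the set."""
    return min(hamming(a, b) for a, b in combinations(indices, 2))


def greedy_select(candidates, min_distance=3):
    """Keep each candidate only if it is far enough from all kept indices."""
    chosen = []
    for cand in candidates:
        if all(hamming(cand, kept) >= min_distance for kept in chosen):
            chosen.append(cand)
    return chosen


# Toy 5mer candidates (real indices would be 12mers or 15mers).
selected = greedy_select(["ACGTA", "ACGTC", "TGCTA", "TGCAT"])
```

A greedy pass depends on candidate order and does not guarantee the largest possible set; it is shown only because it is short.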
  • the 73536 indices may be used in a 384 microtiter plate format, which is enough for 169 plates.
  • the first 16,128 15mer indices of the 65280 indices may also be used as 12mers in a 384 microtiter plate format, which is enough for 42 plates. These indices have been optimized not only with respect to the overall set, but also on a plate by plate basis (e.g., indices 1-384, 385-768, 769-1152, etc.).
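The plate-by-plate grouping of indices can be expressed as simple arithmetic. The following Python sketch assumes 1-based index numbering and consecutive filling of 384-well plates; the function name is hypothetical.

```python
WELLS_PER_PLATE = 384


def plate_and_position(index_number):
    """Map a 1-based index number to (plate, position on plate), both 1-based."""
    if index_number < 1:
        raise ValueError("index numbers start at 1")
    plate, offset = divmod(index_number - 1, WELLS_PER_PLATE)
    return plate + 1, offset + 1


# 16,128 indices fill exactly 42 plates of 384 wells.
full_plates, leftover = divmod(16128, WELLS_PER_PLATE)
last = plate_and_position(16128)
```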
  • Orthogonality is desirably maximized not only with respect to the sample index sequence itself within a set of sample indices, but also in the context of the particular assay step. For example, for sample indices that are added to the product polynucleotide during a PCR step, maximum orthogonality considers not only the sample index sequence itself but also the sequence(s) of the PCR primer(s). There may also be other sequences that should be accounted for to maximize orthogonality.
  • orthogonality is desirably maximized with respect to the sample indices and also the primer sequence(s) and the flow cell adapter sequence(s). Maximizing specificity is also an important consideration, including aspects such as avoiding homopolymers (e.g., avoiding use of the same base for 3 consecutive bases within the sample index) and standardizing GC content within a desired range (e.g., within 40 to 60%, 42 to 58%, 44 to 56%, and so on as may be desired or required for a particular embodiment).
  • Other assay components are also desirably considered during optimization, such as nucleic acid sequences that will be used within the assay for detection, such as the sequences within next generation sequencing library construction.
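The homopolymer and GC-content criteria mentioned above lend themselves to a simple per-sequence filter. The Python sketch below is illustrative only: the function names are hypothetical, the 40 to 60% window is one of the example ranges given, and a maximum run of 2 encodes "no base repeated 3 times in a row".

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_composition_filters(seq, gc_min=0.40, gc_max=0.60, max_run=2):
    """True if the index has acceptable GC content and no long homopolymer."""
    if not seq:
        return False
    seq = seq.upper()
    return gc_min <= gc_fraction(seq) <= gc_max and longest_run(seq) <= max_run


ok = passes_composition_filters("ACGTACGTACGTACG")        # balanced, no runs
has_run = passes_composition_filters("ACGTTTACGTACGAC")   # contains TTT
low_gc = passes_composition_filters("ATATATATATATATA")    # 0% GC
```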
  • sequence motifs where several bases at the 3' end of the oligonucleotide have full or nearly full complementarity to a region in another oligonucleotide in the assay or to itself (Figures 11A&B and Figures 12A&B).
  • the delta G values for dimer products, or the length of the complementary section that will be problematic, are inherently variable with temperature (or another specificity determinant). Lower temperatures will allow non-specific amplification by dimers having less complementarity, which is highly correlated with a shorter region of complementarity at the 3' end.
  • a motif with 3' complementarity of 7 bp of perfect match or 9 bp with one mismatch can be tolerated in some cases, but not others, dependent upon the degree of additional complementarity throughout the dimer molecule.
  • This is therefore a useful motif to use to identify otherwise useful sequences in that it identifies a number of possible oligonucleotides that will fail to perform well under most assay conditions, and also identifies those sequences that are likely to perform adequately under one set of conditions, but be prone to failure under very slightly lower specificity conditions.
  • sequences can particularly, but not exclusively, occur due to the 3' complementarity spanning different "regions" within the oligonucleotide, for example with part, but not all, of the complementarity being due to variable regions within the oligonucleotide.
  • An example here is the "barcode" portion (Figures 11A&B and Figures 12A&B).
  • the assay could be run with an anneal/extension temperature of 70°C which would further limit the off target effects but also impose other constraints on the design.
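A screen for the 3' complementarity motif described above (7 bp of perfect match, or 9 bp with one mismatch) can be sketched as follows. This is a simplified illustration under stated assumptions: oligonucleotides are written 5' to 3', only substitutions are counted as mismatches, delta G and any additional complementarity elsewhere in the dimer are ignored, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_pairs(oligo_a, oligo_b, length, max_mismatches=0):
    """True if the last `length` bases of oligo_a can anneal somewhere on oligo_b."""
    if len(oligo_a) < length:
        return False
    # The site on oligo_b that would pair with the 3' tail of oligo_a.
    site = reverse_complement(oligo_a[-length:])
    b = oligo_b.upper()
    for start in range(len(b) - length + 1):
        window = b[start:start + length]
        mismatches = sum(x != y for x, y in zip(site, window))
        if mismatches <= max_mismatches:
            return True
    return False


def flag_dimer_risk(oligo_a, oligo_b):
    """Flag a pair if either 3' end matches 7 bp perfectly or 9 bp with one mismatch."""
    for a, b in ((oligo_a, oligo_b), (oligo_b, oligo_a)):
        if three_prime_pairs(a, b, 7, 0) or three_prime_pairs(a, b, 9, 1):
            return True
    return False


primer = "TTTTTTGACCTGA"
perfect_site = "CCCTCAGGTCCCC"     # contains the reverse complement of GACCTGA
near_site = "CCCTCAGGTGAACC"       # 9 bp site with a single mismatch
```

Passing the same sequence as both arguments checks for self-dimers. Because the text notes that tolerance depends on temperature and on complementarity throughout the dimer, a flag from this screen would mark a candidate for closer review rather than automatic rejection.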
  • the disclosure of a set of 15mer sample index barcodes includes multiple specific design elements that taken together produce an optimal set of indexes both in total, and the various subsets therein for the given reaction conditions and other conditions of similar
  • Genotyping Methods for Detection of a Target Polynucleotide in Polyploidy Samples
  • genotyping methods are used to detect the presence or absence of a target polynucleotide in polyploidy samples.
  • the target polynucleotide may be an SNP or the result of a deletion/insertion event (Indel).
  • probes are designed that are selective for the genome of interest using the proximal SNP/indel destabilization strategy to reduce ploidy through biological complexity.
  • Target markers that are genotyped on Axiom and demonstrate diploid clusters are selected. It is ensured that there are no proximal SNP/indel in the 9 bases on either side of the target markers (See Figures 14A-C).
  • one form of the first complementary probe was designed to be complementary to the target sequence with the SNP/indel (LHS), the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the SNP/indel (LHS').
  • the second complementary probe (RHS) is immediately adjacent to the 3' sequence on both forms of the first complementary probe.
  • the presence of a proximal SNP near the target SNP causes a destabilization effect that prevents ligation.
  • selection was accomplished for the genome of interest (i.e., the target genome with the proximal SNP will generate low sequence reads).
  • Accommodating the proximal SNP in the probe design causes a locus that produces no reads to become fully functioning (see Figures 15A&B).
  • one form of the first complementary probe was designed to be complementary to the target sequence with the SNP/indel (LHS), the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the SNP/indel (LHS').
  • the second complementary probe (RHS) is immediately adjacent to the 3' sequence on both forms of the first complementary probe.
  • Blocking/competing oligos that are complementary to the target sequence containing the proximal SNP/indel are added. Blocking oligos prevent hybridization of RHS to target DNA. As a result, selection was accomplished for the genome of interest (i.e., the target genome with the proximal SNP will generate few to no sequence reads). Accommodating the proximal SNP in the probe design with the addition of blocking oligos causes a locus that produces no reads (see Figures 16A&B). This approach is suitable when the proximal SNP is between bases 1 and 10 of the target marker. Secondary polymorphism may not be
  • a 600X average coverage approach may be used for a small number of selected markers.
  • This approach requires a parallel workflow (i.e., samples are divided into two).
  • samples are split according to the markers at issue (those with nearby proximal SNPs or single-base indels), and for the affected markers, instead of sequencing to 200X coverage as in other approaches, the split containing the affected markers is simply sequenced to 600X coverage (i.e., additional sequencing time and expense are used to compensate, rather than compensating in the upfront Eureka portion).
  • This approach has been used in other contexts, such as deep sequencing for expression with RNA-Seq to assist with detection of rare transcripts, so in a different context this would be the Eureka equivalent.
  • Analysis of RNA often suffers from a bias due to its conversion to cDNA prior to analysis.
  • the methods described herein are directed to direct detection of a target RNA without conversion to cDNA.
  • Detection of a target RNA includes but is not limited to interrogation of the exon boundaries which allows for detection of alternative splicing and splice variants of mRNA transcripts, detection of fusion genes (at least portions of two separate genes), and more general expression analysis of detecting expression of mRNA transcripts.
  • the methods for detection of a target RNA utilize next generation sequencing and enable the simultaneous detection of hundreds of thousands of RNA samples for tens to thousands of loci.
  • the method for detection of a target RNA is based on ligation dependent PCR amplification and uses interrogation site probes as well as sample index barcodes that are added during PCR amplification.
  • the utility of this method is demonstrated by performing a highly multiplexed reaction that uses a commercially available DNA ligase to ligate DNA probes hybridized to RNA templates.
  • the ligation products are PCR amplified.
  • Next generation sequencing data is generated from the resulting PCR products. Each read is assigned to a sample (based on the sample index) and to a locus. Examination of the sequencing data generated from the PCR products will reveal splice variants or fusion genes of mRNA transcripts, as well as the expression of mRNA transcripts.
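The assignment of each read to a sample and a locus can be pictured as two lookups followed by counting. The Python sketch below is a hypothetical illustration: it assumes each read has already been parsed into a sample index sequence and an interrogation site bar code sequence, and it uses exact-match lookup tables, which is a simplification because tolerance of sequencing errors is not handled.

```python
from collections import Counter


def assign_reads(reads, index_to_sample, barcode_to_locus):
    """Count reads per (sample, locus) pair.

    `reads` is an iterable of (sample_index_seq, barcode_seq) tuples.
    Reads whose index or bar code is not in the lookup tables are tallied
    separately as unassigned.
    """
    counts = Counter()
    unassigned = 0
    for index_seq, barcode_seq in reads:
        sample = index_to_sample.get(index_seq)
        locus = barcode_to_locus.get(barcode_seq)
        if sample is None or locus is None:
            unassigned += 1
            continue
        counts[(sample, locus)] += 1
    return counts, unassigned


# Toy lookup tables and reads (all sequences and names are made up).
index_to_sample = {"ACGTACGTACGT": "sample_1", "TGCATGCATGCA": "sample_2"}
barcode_to_locus = {"GATTACAGAT": "locus_A", "CTAGCTAGCA": "locus_B"}
reads = [
    ("ACGTACGTACGT", "GATTACAGAT"),
    ("ACGTACGTACGT", "GATTACAGAT"),
    ("TGCATGCATGCA", "CTAGCTAGCA"),
    ("NNNNNNNNNNNN", "GATTACAGAT"),
]
counts, unassigned = assign_reads(reads, index_to_sample, barcode_to_locus)
```

The resulting per-sample, per-locus counts are the kind of table from which expression levels, splice variants, or fusion events would then be assessed.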
  • In Example 13 and Figures 19-21, the results of a 778-plex panel of probes designed to interrogate the RNA produced from housekeeping genes and from human gene exons selected for cancer fusion gene detection are shown.
  • RNA targets were found in the house-keeping genes.
  • the methods (and associated data analysis) for detecting and interrogating RNA targets are used in targeted studies of expression analysis, allele-specific expression analysis, alternative splicing analysis, and fusion gene detection. This method of direct detection of RNA is a simplified assay that also removes the RNA to cDNA conversion bias.
  • a sequence determination is performed using next generation sequencing, for example, Illumina sequencing.
  • polynucleotides may be directly sequenced, or a copy of the product polynucleotide, or its complement generated in the assay may be sequenced.
  • the first and/or second complementary probes may comprise a universal primer sequence.
  • the adapters for attaching product polynucleotides to an Illumina flow cell for sequencing may be added to the product polynucleotides (or reaction products) by PCR or another method of copying and/or amplifying product polynucleotides.
  • the flow cell adapters may also be added to the product polynucleotides according to other techniques known in the art, e.g., ligation.
  • an Illumina flow cell with eight or more lanes (HiSeq® flow cells) is employed as a solid phase support.
  • Each lane can accommodate over 300 million amplified clusters and therefore can be used for high throughput analysis.
  • NextSeq® flow cells or other flow cells are used which accommodate different numbers of amplified clusters.
  • Sequencing techniques that may be used in the methods of the disclosure include next-generation sequencing techniques such as ion semiconductor sequencing (e.g. Ion Torrent sequencing), pyrosequencing (e.g. 454 sequencing), sequencing by ligation (e.g. SOLiD sequencing), sequencing by synthesis (e.g. Illumina sequencing) and single-molecule real-time sequencing (e.g. Pacific Biosciences).
  • Exemplary arrays include chip or platform arrays, bead arrays, liquid phase arrays, "zip-code" arrays, microarrays and the like. Materials suitable for construction of arrays such as nitrocellulose, glass, silicon wafers, optical fibers, etc. are known to those of skill in the art.
  • kits comprising reagents for performing any of the methods disclosed herein.
  • the first complementary probe may have a sequence 5' to the non-complementary interrogation site bar code of the first complementary probe that is complementary to the first target sequence and a sequence 3' to the non-complementary interrogation site bar code of the first complementary probe that is complementary to the first target sequence.
  • the kit comprises at least one PCR primer, a polymerase and a set of dNTPs for purposes of enrichment/amplification.
  • the kit comprises a ligase.
  • the kit comprises a license to use the software needed to interpret the sequence data.
  • the kit comprises instructions for use.
  • the first and second complementary probes may be provided in dried form (e.g. lyophilized). If provided in a dried form, the probes may be dried with a preservative e.g. trehalose.
  • a composition for detecting the presence, absence, amount or characteristics of one or more targets in one or more samples includes: a plurality of first and second complementary probes, (i) each first complementary probe having two sequence portions that are complementary to different sections of a first target sequence, and two sequence portions that are non-complementary to the first target sequence wherein the non-complementary portions include an interrogation site bar code sequence and a universal sequence, and (ii) each second complementary probe having a sequence portion that is complementary to a second target sequence and an immediately adjacent sequence portion that is non-complementary to the second target sequence and includes a universal sequence.
  • the first complementary probe comprises a sequence having two portions that are complementary to the target sequence both 3' and 5' of the interrogation site bar code.
  • the composition may be solution-based or bound to a solid support or portions of both.
  • part of the complementary portion of the first complementary probe is 5' of the non-complementary interrogation site bar code sequence and part of the first complementary probe is 3' of the non-complementary interrogation site bar code sequence.
  • the non-complementary interrogation site bar code sequence may be referred to as "anchored" to the target by the 5' and 3' complementary sequences of the first complementary probe.
  • the non-complementary interrogation site bar code sequence may be from about 10 to 16 nucleotides in length, for example, 10, 11, 12, 13, 14, 15, or 16 nucleotides in length.
  • sequences of the product polynucleotides may be determined either by direct sequencing or by sequencing of complementary sequences.
  • the methods described herein may be used to generate sequencing data that can be analyzed by a mathematical algorithm to determine the presence or absence of particular SNPs, indels and other mutations, whether particular loci are heterogeneous or homogeneous, whether a particular transcript is present or absent, the copy number of specific target polynucleotides, and/or other characteristics of the target polynucleotides.
  • compositions, methods and kits described herein are useful to analyze large numbers of samples for the presence, absence, amount or characteristics of multiple target polynucleotides in a single assay.
  • first and second complementary probes are provided in a single assay as a means to evaluate the presence, absence, amount or characteristics of multiple sequences, e.g., polymorphisms in a single assay.
  • multiple polymorphisms are determined for a plurality of samples in a single assay.
  • the compositions, methods and kits described herein find utility in genotyping and may involve next generation sequencing (NGS) technology in order to simultaneously generate a genotype for large numbers of both samples and loci in a single assay.
  • a method for determining the presence, absence or amount of one or more target polynucleotides in two or more samples comprising the steps of:
  • each sample comprising one or more target polynucleotides, each target polynucleotide comprising a first target sequence and a second target sequence;
  • each first complementary probe having a sequence portion that is complementary to a first target sequence, and a sequence portion that is non-complementary to the first target sequence wherein the non-complementary portion includes an interrogation site bar code sequence and an adjacent universal sequence
  • each second complementary probe having a sequence portion that is complementary to a second target sequence and an immediately adjacent sequence portion that is non-complementary to said second target sequence
  • complementary probes hybridize to their complementary target polynucleotide in a sample to form a hybridization complex
  • complementary probes are complementary to first and second target sequences that are adjacent and from 1 to 500 nucleotides apart.
  • the adjacent universal sequence of said first complementary probe comprises a universal primer sequence that is complementary to a priming sequence which can be used to add one or more of (i) a sample index, (ii) an additional sequence for sequence data generation or another form of detection, (iii) additional sequences, or (iv) other moieties.
  • the immediately adjacent universal sequence of said second complementary probe comprises a universal primer sequence that is complementary to a primer sequence which can be used to add one or more of (i) a sample index, (ii) an additional sequence for sequence data generation or another form of detection, (iii) additional sequences, and (iv) other moieties.
  • the universal primer sequence includes a PCR priming sequence and a primer sequence to add additional sequences for use in sequence data generation or other forms of detection.
  • said enriching comprises, (a) providing a set of PCR priming sequences comprising a first primer that is complementary to a priming sequence on the first complementary probe, and a second primer that is complementary to a PCR priming sequence on the second complementary probe, and (b) amplifying the product polynucleotide.
  • complementary probe comprises an inosine 2, 3, 4, 5, 6, 7, 8, 9, 10, or more bases from the 3' end of the probe.
  • first and second complementary probes are complementary to first and second target sequences, and the 3' end of the first complementary probe is complementary to one form of a single nucleotide polymorphism (SNP) or other genetic variation.
  • means for joining is treating the first and the second complementary probes that are hybridized to first and second target sequences (hybridization complex) to form a product polynucleotide using a ligase.
  • a composition for determining the presence, absence or amount of one or more target polynucleotides in a sample comprising: a plurality of first and second complementary probes, (i) each first complementary probe having two sequence portions that are complementary to different sections of a first target sequence, and two sequence portions that are non-complementary to the first target sequence wherein the non-complementary portions include an interrogation site bar code sequence and a universal sequence, and (ii) each second complementary probe having a sequence portion that is complementary to a second target sequence and an immediately adjacent sequence portion that is non-complementary to the second target sequence and includes a universal sequence.
  • composition according to paragraph 29, wherein said first complementary probe comprises a sequence 5' to the non-complementary interrogation site bar code of the first complementary probe that is complementary to the first target sequence and a sequence 3' to the non-complementary interrogation site bar code of the first complementary probe that is complementary to the first target sequence.
  • composition according to paragraph 29, wherein said first complementary probe comprises a sequence that is complementary to the target sequence both 3' and 5' of the interrogation site bar code.
  • composition according to any one of paragraphs 29-31 wherein the universal sequence of said first and second complementary probes each comprises a priming sequence that can hybridize to a primer for sequence synthesis.
  • composition according to any one of paragraphs 29-34, wherein the adjacent universal sequence of said first complementary probe is 5' to the complementary sequence that is 5' to the non-complementary interrogation site bar code of the first complementary probe.
  • composition according to paragraph 34, wherein the universal sequence is a PCR primer sequence.
  • composition according to paragraph 34, wherein the additional sequence for sequence data generation or another form of detection is an adapter for next generation sequencing.
  • composition according to paragraph 34, wherein the additional sequence for sequence data generation or another form of detection is a capture sequence.
  • composition according to paragraph 40, wherein the interrogation site bar code is 12 or 15 nucleotides in length.
  • composition according to paragraph 41 wherein the sample index is 12 or 15 nucleotides in length.
  • composition according to any one of paragraphs 29-44, wherein the first complementary probe comprises an inosine 2, 3, 4, 5, 6, 7, 8, 9, 10, or more bases from the 3' end of the probe.
  • composition according to any one of paragraphs 29-45, wherein the second complementary probe comprises an inosine 2, 3, 4, 5, 6, 7, 8, 9, 10, or more bases from the 5' end of the probe.
  • kits for determining the presence, absence, amount or characteristics of one or more target polynucleotides in a sample comprising:
  • each first complementary probe having a sequence portion that is complementary to a first target sequence, and a sequence portion that is non-complementary to the first target sequence wherein the non-complementary portion includes an interrogation site bar code sequence and an adjacent universal sequence
  • each second complementary probe having a sequence portion that is complementary to a second target sequence and an immediately adjacent sequence portion that is non-complementary to said second target sequence
  • kit according to paragraph 47 further comprising, at least one PCR primer, a polymerase, and a set of dNTPs to amplify extended target polynucleotides for purposes of enrichment.
  • kit according to paragraph 47 or paragraph 48 further comprising a ligase.
  • Performing nucleic acid analysis by joining barcoded polynucleotide probes is accomplished by providing two complementary probes that hybridize to two portions (the first target sequence and the second target sequence) of a target polynucleotide (Figure 1A).
  • the first and second complementary probes may be immediately adjacent or separated by 1 to 500 or more nucleotides.
  • the interrogation site bar code may contain information about the locus only, the allele only, the locus and allele combined or the locus and allele as separate sequences.
  • the use of the interrogation site bar code allows the sequence that reports on a genetic locus to correlate with size, placement and nucleotide composition.
  • the first complementary probe may also contain a universal sequence (Figures 1A-D; dashed line).
  • This universal sequence may be called "universal primer 1". This describes its common function as a PCR priming site. However, it is understood that the universal sequence may not have this function and may have other functions including but not limited to one or more of other forms of amplification and capture.
  • the universal sequence may also serve to facilitate the addition of one or more of other sequences or of other moieties.
  • the second complementary probe has a sequence that is complementary to the second target sequence (Figure 1A; thick line) and a universal sequence (Figure 1A; dashed line).
  • This universal sequence may be called "universal primer 2", which describes its common function as a PCR priming site.
  • the universal sequence in the first and second complementary probes may or may not be the same sequence (or may or may not be complements of one another).
  • the first and/or second complementary probes may or may not further contain sequences for size adjustment.
  • the universal sequence is non-complementary to the target sequences.
  • complementary probe may or may not be extended to become immediately adjacent to the second complementary probe as there may be no gap between the first and second complementary probes, or there may be a gap of one or more bases between the first and second complementary probes that may be filled by a gap fill step.
  • first and second complementary probes are joined (as shown in Figure 1A, chevron) to generate a product polynucleotide (extending from the 5' universal primer 1 to 3' universal primer 2 in Figure 1B).
  • This product polynucleotide is then the template for an amplification reaction or other form of enrichment (Figure 1B).
  • the enrichment is through a PCR reaction.
  • PCR primer 2 has a portion that is the complementary sequence to universal primer 2 (from the second complementary probe), a portion that is a sample index sequence and a portion that is an adaptor sequence (medium line).
  • DNA synthesis proceeds from the PCR primer 2 (closed arrow head in Figure 1B) using the product polynucleotide as the template.
  • PCR primer 1 has a portion that is the complementary sequence to universal primer 1 (dashed line from the first complementary probe) and a portion that is an adaptor sequence (medium line). DNA synthesis proceeds from the PCR primer 1 (closed arrow head) using the product of the first round of amplification as the template.
  • PCR primer 1 can also have a portion that is a sample index sequence similar to PCR primer 2 depicted in Figure 1B.
  • sample index (or portion thereof) is added with PCR primer 1.
  • sample index is added in both PCR primer 1 and PCR primer 2.
  • PCR primer 2 and/or PCR primer 1 a sample identification sequence (sample index) or other sample identification moiety is attached to each product
  • when a sample index is added to PCR primer 1, it is near the interrogation site bar code to facilitate sequencing of both the interrogation site bar code and the sample index while minimizing the total number of bases that need to be sequenced (e.g., without the need to sequence the first and second target sequences that would otherwise be between the interrogation site bar code and the sample index if the sample index was at least partially added with PCR primer 2).
  • sequence data is generated on portions of each amplicon (or the entire amplicon). There may or may not be portions of each amplicon where no sequence data is generated.
  • Each sequence produced is compared to a database to assign to the appropriate sample and allele and/or locus. Mis-assignments can occur due to various factors including, but not limited to, sequence error, polymerase error and nonspecific joining.
  • the tabulated number of reads is analyzed to determine presence, absence, amount or copy number of the target sequence, SNP, or genetic locus.
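The read-assignment and tabulation steps described above can be sketched as follows. This is a minimal illustration, not the assay's actual software: the lookup tables, the barcode sequences, the one-mismatch tolerance, and the assumption that the interrogation site bar code and sample index have already been extracted from each read are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup tables; a real assay would load these from its probe-design database.
SITE_BARCODES = {
    "ACGTAC": ("locus_1", "A"),
    "TGCATG": ("locus_1", "B"),
}
SAMPLE_INDEXES = {
    "AAACCCGGGTTT": "sample_01",
    "TTTGGGCCCAAA": "sample_02",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closest(observed, table, max_mismatches=1):
    """Return the value whose key is uniquely closest to `observed`, else None."""
    hits = [
        (hamming(observed, key), value)
        for key, value in table.items()
        if len(key) == len(observed)
    ]
    hits.sort(key=lambda hit: hit[0])
    if not hits or hits[0][0] > max_mismatches:
        return None  # too many mismatches (e.g., sequence error)
    if len(hits) > 1 and hits[1][0] == hits[0][0]:
        return None  # ambiguous between two database entries
    return hits[0][1]


def tabulate(reads):
    """Tabulate reads given as (site_barcode_seq, sample_index_seq) pairs."""
    counts = Counter()
    for site_seq, index_seq in reads:
        site = closest(site_seq, SITE_BARCODES)
        sample = closest(index_seq, SAMPLE_INDEXES)
        if site is None or sample is None:
            counts["unassigned"] += 1
            continue
        locus, allele = site
        counts[(sample, locus, allele)] += 1
    return counts
```

Tolerating a single mismatch is one way to absorb occasional sequence errors; a stricter or looser tolerance shifts the balance between unassigned and mis-assigned reads.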
  • in Figure 1E, there are two or more versions of the first complementary probe.
  • Each version has a different sequence at the 3' end (depicted A and B). This different sequence may be one or more bases.
  • the two versions of the first complementary probes are complementary to completely different versions of first target sequences.
  • the two or more versions of the first complementary probes are between these two extremes and retain the other elements of first complementary probes.
  • multiple versions of first complementary probes are commonly used to generate classic genotype information.
  • the number of reads assigned to the A allele and the number of reads assigned to the B allele are compared. For each locus and taking into account the ratio of the number of reads assigned to the A allele relative to the number of reads assigned to the B allele (and mis-assignments), a sample that has reads that are predominately assigned to the A allele is AA, a sample that has reads that are predominately assigned to the B allele is BB, and a sample that has a significant number of reads assigned to both alleles is AB.
  • the A and B nomenclatures are only for discrimination and do not reference any convention on a nucleotide sequence associated with the A or B allele.
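The AA/AB/BB decision described above can be illustrated with a small ratio-based caller. The text does not give numeric cut-offs, so `min_reads`, `hom_fraction`, and `het_band` below are hypothetical values chosen only to make the sketch concrete; the gap between the bands is one possible way to leave room for mis-assigned reads.

```python
def call_genotype(reads_a, reads_b, min_reads=20, hom_fraction=0.9, het_band=(0.3, 0.7)):
    """Call a diploid genotype at one locus from A-allele and B-allele read counts.

    All thresholds are illustrative assumptions, not values from the source.
    """
    total = reads_a + reads_b
    if total < min_reads:
        return "no_call"  # too few reads to trust the ratio
    frac_a = reads_a / total
    if frac_a >= hom_fraction:
        return "AA"  # reads predominately assigned to the A allele
    if frac_a <= 1 - hom_fraction:
        return "BB"  # reads predominately assigned to the B allele
    if het_band[0] <= frac_a <= het_band[1]:
        return "AB"  # significant number of reads for both alleles
    return "no_call"  # ratio falls between the clusters


if __name__ == "__main__":
    for a, b in [(190, 10), (5, 195), (100, 110), (160, 40)]:
        print(a, b, call_genotype(a, b))
```

In practice the bands would be tuned per locus, for example from cluster plots of many samples, rather than fixed globally.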
  • the interrogation site bar code can be placed in various positions in the first complementary probe. It can be placed within the universal sequence, it can be placed between the universal sequence and the target specific sequence (as is common in prior art methods such as disclosed in US Patent No. US 8,460,866), and it can be placed within the target specific sequence, as exemplified herein.
  • the interrogation site bar code is placed within the target specific sequence and is non-complementary to the target polynucleotide, there are complementary sequence portions on both sides of the
  • the complementary portion was increased on the 5' end by several bases (12 bases away from the rest of the complementary region).
  • the complementary region 3' of the 6mer or 12mer interrogation site bar code was identical in size and composition.
  • the 12 base non-complementary interrogation site bar code contains the information for the allele and the locus combined.
  • the 6 base non-complementary interrogation site bar code contains the information for the allele.
  • the information to assign the read to a locus is the sequence of the target sequence (and would be similarly contained in the database).
  • Bovine genomic DNA (50 ng/ul) was placed in wells of a multiwell plate and heated to 98°C for 15 minutes. Following this, a portion of each sample was transferred to a new plate and mixed with a PC (12mer). These reactions were then melted at 98°C for 1 minute and incubated at 60°C for 20 hours for hybridization. After hybridization, 3.2 ul of the reaction was added to a waiting plate containing 12.8 ul of NEB 1X Taq DNA ligase buffer together with Taq DNA ligase enzyme. The new plate was sealed, and the reactions were mixed, centrifuged, held at 54°C for 15 minutes followed by 98°C for 10 seconds, then brought to 4°C and held.
  • T:G mishybridization is possible when a C/T SNP is being detected.
  • the partial hydrogen bonding between the described G:T "mismatched" nucleotide is sufficiently stable to permit the ligase to (inefficiently) join the mismatched first complementary probe to the second complementary probe. This results in a non-specific target polynucleotide occurring 0-25% of the time.
  • a universal base, deoxyinosine was employed proximal to the interrogating 3' position of the affected first complementary probe.
  • a deoxyinosine in the unaffected version of the first complementary probe does not destabilize hybridization sufficiently to impact genotype resolution, and the ligation reaction proceeds in a (largely) specific manner and specific product polynucleotides are produced. As the position of the deoxyinosine moves to the 5' side of the first
  • Genotyping methods and associated data analysis require double or single stranded nucleic acid (NA) as the target polynucleotide.
  • the first complementary probe and the second complementary probe require access to single stranded NA for hybridization to the target polynucleotide.
  • the results of experiments have shown that to render double stranded and even single stranded NA accessible, the sample must be heated to a high temperature. Exemplary temperatures included a range of 70°C to 100°C and with heating times from 1 second to 15 minutes. This reversible denaturation step improves the detection of target polynucleotides (especially target polynucleotides that are present in the sample as double stranded).
  • the genotyping assay described herein consists of nucleic acids mixed with high salt concentrations and a blend of probes.
  • the first and second complementary probes are in solution with each single probe being at 50pM concentration in the "Probe Component" or "PC" (probes, TE, and hybridization buffer).
  • This example demonstrates an improved method of setting up a genotyping assay reaction. It is desirable to place the probe component into a reaction well, dry it down and seal the plate, providing for long-term (i.e., years) room temperature storage.
  • a set of 135 probe triplets (two forms of the first complementary probe and one form of the second complementary probe) for genotyping bovine genomic DNA was dried in reaction wells.
  • a single PC was created at the working concentration of 50pM.
  • 3 ul of the PC was placed in the wells of six columns of a 384 well plate.
  • Another PC was prepared which contained the same 135 probe triplets, TE and buffer and 0.4mM trehalose sugar.
  • the trehalose sugar is a useful preservative of dried poly nucleic acids, and it secures the dried PC to the bottom of the reaction well.
  • PC with trehalose was similarly used to add 3 ul to the wells in 6 columns of a 384 well plate.
  • One of each plate PC type (with and without trehalose), was dried to completion by placing the plates in a laminar flow hood overnight, where sterile dust free air passed over the plates.
  • One plate without trehalose was sealed and frozen at -20°C for storage. The dried plates were sealed and stored at room temperature.
  • the copy number analysis methods may be used to determine copy number variation (CNV) where zero copies of an allele are discriminated from one or two copies of the same allele.
  • next generation sequencing reads produced from a copy number analysis assay (96 bovine samples with normalized amount of input DNA across all the samples) were compared to the database appropriate for the interrogation site bar codes in that probe component and the sample index sequences.
  • the number of reads created from each sample and a single allele of a single locus was tabulated (and includes mis-assigned reads) and analyzed.
  • animals that are BB homozygous have zero or near zero reads that have the interrogation site bar code for the A allele (at this locus).
  • Animals that are AB heterozygous have around 200 reads that have the
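The read-count pattern above (near zero reads for zero copies of an allele, around 200 reads for one copy) suggests a simple way to turn a normalized read count into a copy number estimate. The sketch below assumes that counts scale roughly linearly with copy number when input DNA is normalized; `reads_per_copy` is a hypothetical calibration value taken from the heterozygote figure quoted in the text, and the function name and the cap at two copies are illustrative choices only.

```python
def estimate_copies(allele_reads, reads_per_copy=200.0, max_copies=2):
    """Estimate the copy number of one allele from its read count.

    Assumes roughly linear scaling of reads with copy number, which is an
    assumption of this sketch rather than a statement from the source.
    """
    if allele_reads < 0:
        raise ValueError("read counts cannot be negative")
    estimate = round(allele_reads / reads_per_copy)
    return min(estimate, max_copies)  # cap at the expected diploid maximum


if __name__ == "__main__":
    for reads in (3, 210, 390):
        print(reads, "reads ->", estimate_copies(reads), "copies")
```

A small number of mis-assigned reads, as in the near-zero case, rounds down to zero copies under this scheme.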
  • Example 8 Use of Genotyping to Evaluate a Tetraploid Genome.
  • In tetraploid organisms, four copies of an allele can exist, one on each chromosome. To mimic a tetraploid organism, DNA from two different diploid animals (same species) was mixed together, producing a sample with four copies of any given allele. A probe component containing probe triplets (two forms of the first complementary probe and one form of the second complementary probe) for multiple target polynucleotides was added and the method was carried out as described in Example 2, except that the cluster plots allowed for five genotypes.
  • complementary probes also has a unique interrogation site bar code that provided the ability to identify the allele and locus that was detected with that exact form of the first
  • Genotyping methods are used to detect the presence of a target polynucleotide in a sample.
  • the target polynucleotide may be the result of a deletion/insertion event.
  • one form of the first complementary probe was designed to be complementary to the target sequence with the deletion
  • the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the deletion.
  • the second complementary probe is immediately adjacent to the 3' sequence on both forms of the first complementary probe.
  • the workflow proceeded as described in Example 2.
  • the objective was to generate a total of 96,000 complementary probes having a 15mer sample index barcode between the universal primer sequence and the adaptor sequence (Figures 1C and 1D).
  • These 15mer sample index barcodes also have 12 nucleotide (nt) reduced read-length compatibility for applications in which a lower number of different samples are being processed, thus allowing potential savings in, e.g., sequencing cost and time as only the first 12 nucleotides would need to be sequenced to identify a particular sample index.
  • Index Plate Ordering: the index plates were grouped by performance metrics to include, e.g., higher orthogonality/specificity in subsets of all plates. For each (384 well) plate of indexes to be generated, an optimum subset of barcodes was selected from the unassigned set of barcodes based on criteria such as 15/12nt read edit distances. The subsets were assigned to individual plates, and performance metrics were calculated per plate. The performance metrics were based on sequencing read counts. [Table: example performance metric; contents not recoverable]
  • a motif in the barcode was discovered that caused interaction between the barcode and the adapter sequences. This motif was found to contain a sequence of about 7 bases (CTAGCCTCC) and can cause self-complementarity between the 3' end and the internal sequences of the complementary probes. Variations to this 7 bp motif theme were also discovered. Examples are shown in Figures 11A&B and Figures 12A&B.
  • a computer program was built to substitute for these problematic sequences, i.e., by incrementally optimizing the sequences as much as possible, and up to the full range of 96K barcodes. Since this particular motif seemed to affect performance more than the problematic tri-mers and edit distance, both of which can be factored in, all these were accounted for in the design/binning flows. However, with all the substitutions, and under the same criteria for edit distance globally, as well as locally optimized for each plate, 84,096 sample indexes were generated. The first 16,128 of these indices can also be used as 12mers for experiments and applications where a more limited number of samples will be processed (e.g., to process ten 384 well microtiter plates, with one sample per well).
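The kind of filtering described above (edit distance at both 15 nt and 12 nt read lengths, plus exclusion of a problematic motif) can be sketched as a greedy selection. This is not the program referred to in the text: the minimum distances, the greedy strategy, and the choice to screen only the barcode itself for the motif (rather than the barcode joined to the adapter) are assumptions made for illustration.

```python
PROBLEM_MOTIF = "CTAGCCTCC"  # motif quoted in the text


def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def acceptable(candidate, accepted, min_dist_15=3, min_dist_12=2):
    """Check a 15mer against the motif and against all previously accepted indexes.

    min_dist_15 and min_dist_12 are hypothetical thresholds.
    """
    if PROBLEM_MOTIF in candidate:
        return False
    for other in accepted:
        if edit_distance(candidate, other) < min_dist_15:
            return False
        # The first 12 nt must also be distinguishable for reduced read lengths.
        if edit_distance(candidate[:12], other[:12]) < min_dist_12:
            return False
    return True


def select_indexes(candidates):
    """Greedily keep candidates that pass the checks, in input order."""
    accepted = []
    for candidate in candidates:
        if acceptable(candidate, accepted):
            accepted.append(candidate)
    return accepted
```

For scale, 84,096 indexes fill exactly 219 plates of 384 wells, and the first 16,128 fill exactly 42 such plates.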
  • Example 12 Genotyping Methods for Detection of a Target Polynucleotide in Polyploidy Samples.
  • genotyping methods for detecting the presence or absence of a target polynucleotide in polyploid wheat samples are described. Ploidy reduction strategies are used to reduce the generation of sequence data in non- informative genomes.
  • the target polynucleotide may be an SNP or the result of a deletion/insertion event (Indel).
  • One form of the first complementary probe was designed to be complementary to the target sequence with the SNP (LHS), the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the SNP or indel (LHS').
  • the second complementary probe (RHS) is immediately adjacent to the 3' sequence on both forms of the first complementary probe. Selection was accomplished for the genome of interest (i.e., the target genome with the proximal SNP will generate low sequence counts). Accommodating the proximal SNP in the probe design causes a locus that produces no reads to become fully functioning (See, Figures 15A&B). The workflow proceeded as described in Example 2.
  • blocking oligos that are complementary to the target genome sequence having the proximal SNP/indel are added to prevent hybridization of RHS to the target genome.
  • One form of the first complementary probe was designed to be complementary to the target sequence with the SNP/indel (LHS), the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the SNP/indel (LHS').
  • the second complementary probe is immediately adjacent to the 3' sequence on both forms of the first complementary probe. Blocking/competing oligos that are complementary to sequences containing the proximal SNPs in the target genome are added. Blocking oligos prevent hybridization of RHS to the target genome. Selection was accomplished for the genome of interest (i.e., the target genome with the proximal SNP will generate few or no sequence reads).
  • PCR primers are designed that selectively amplify unique genome or subgenome of interest.
  • an upfront PCR amplification step is added.
  • One or both of the PCR primers that are complementary to the target genome sequence may hybridize to the proximal SNP in the target genome sequence.
  • This upfront step using PCR amplification may be in parallel workflow format (i.e., the samples are divided into two).
  • One form of the first complementary probe was designed to be complementary to the target sequence with the SNP/indel (LHS), the other form of the first complementary probe was designed to be complementary to the target sequence that does not have the SNP/indel (LHS').
  • the second complementary probe (RHS) is immediately adjacent to the 3' sequence on both forms of the first complementary probe.
  • An upfront PCR amplification step is added using PCR primers designed to specifically amplify unique genome or subgenome of interest.
  • the proximal SNP/indel likely destabilizes the hybridization of the PCR primers. Selection is accomplished for the desired genome of interest (i.e., the non-desired genome containing the proximal SNPs is eliminated from the subsequent workflow).
  • Various locus and genome combinations are accommodated using multiple PCR primer set and probe set combinations (See, Figures 17A&B). The workflow proceeded as described in Example 2.
  • a 600X average coverage may be used for a small number of select markers.
  • This approach requires parallel workflow (i.e., samples are divided into two).
  • samples are split according to whether the markers at issue have nearby proximal SNPs or single base indels; for the affected markers, instead of sequencing to 200X coverage as in other approaches, the split containing the affected markers is sequenced to 600X coverage (i.e., additional sequencing time and expense are used to compensate instead of trying to compensate on the upfront Eureka portion).
  • This approach has been used in other contexts, such as deep sequencing for expression with RNA-Seq to assist with detection of rare transcripts, so in a different context this would be the Eureka equivalent.
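The cost of the split-coverage approach above can be estimated with simple arithmetic. The sketch below assumes that "coverage" means the average number of reads per marker per sample; that interpretation, the function names, and the example marker and sample counts are assumptions, with only the 200X and 600X figures taken from the text.

```python
def reads_required(n_markers, n_samples, coverage):
    """Total reads needed for a given average coverage per marker per sample."""
    return n_markers * n_samples * coverage


def split_budget(n_regular, n_affected, n_samples, regular_cov=200, affected_cov=600):
    """Read budget for a parallel workflow with regular and affected marker splits."""
    regular = reads_required(n_regular, n_samples, regular_cov)
    affected = reads_required(n_affected, n_samples, affected_cov)
    return {"regular": regular, "affected": affected, "total": regular + affected}


if __name__ == "__main__":
    # Hypothetical panel: 130 regular markers, 5 affected markers, 96 samples.
    print(split_budget(130, 5, 96))
```

Under these assumptions each affected marker costs three times as many reads as a regular one, so the extra expense stays small as long as only a few markers need the higher coverage.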
  • Example 13 Methods for Detection of a Target RNA without Conversion to cDNA.
  • the method described herein has potential uses in interrogating RNA, with the benefit of detecting strand specific allele usages, copy number determination of RNA and mRNA transcripts, alternative splicing and splice variants analysis, and detection of fusion genes.
  • the method described in this example is a direct extension of the herein described DNA based multiplexed ligation-mediated PCR detection methods, but with RNA rather than DNA as targets.
  • a set of 778 loci amongst various human mRNA transcripts where the exon to exon boundaries were known were chosen. Probes were designed to interrogate these mRNA transcripts. The fusion genes are usually joined between the 5' end of one gene and the 3' end of another gene. The breakpoint of each gene occurs at varying locations in the DNA, but most often occurs in introns so that spliced RNA usually finds the breakpoint at an exon boundary. The probes were designed to cover the ends of exons bracketing introns with known fusion breakpoints. For positive controls, probes were designed to place at the end of exons bracketing introns for Beta Actin and GAPDH genes, which have no known fusions. For negative controls, probes were designed to place at the ends of introns for Beta Actin and GAPDH genes, which would only amplify in the presence of DNA.
  • the ligation mediated PCR requires a pair of two types of DNA probes, a first complementary probe, and a second complementary probe that is phosphorylated at the 5' end.
  • a DNA or RNA specific ligase will be able to join the 3' OH group of the first
  • the first complementary probes were designed to hybridize at the exon boundaries and the second complementary probes were designed to hybridize at an exon that is immediately adjacent to the exon to which the first complementary probes hybridize. For example, if the first complementary probes were designed to hybridize to Exon II, the respective second complementary probes would be designed to hybridize to Exon III. In this way, the first and second complementary probe pair would only be able to detect the RNA transcripts containing properly spliced Exon II to Exon III events.
  • second complementary probes can be designed, e.g., when second complementary probes are designed for hybridizing to Exon IV, the assay will detect Exon II/Exon IV splice variants.
  • the DNA probes were between 20 and 50 bases in length in order to achieve the calculated annealing temperature of between 68°C to 74°C.
  • Each first complementary probe has a common/universal PCR primer site at the 5' end while each second complementary has a different
  • the ligation mixture including the SplintR enzyme (units/Rx) with its 1X reaction buffer was dispensed into 32 µl per reaction and cooled to wet ice temperature.
  • the PCR mixture included a standard PCR reaction buffer, a common PCR primer in the first complementary probe bearing an Illumina sequencing flow cell binding sequence and a common PCR primer in the second complementary probe that is uniquely indexed (sample index) near the end of the other half of the Illumina flow cell binding sequence.
  • the PCR primers in this mixture will amplify any ligated product of the first and second complementary probes.
  • Sample indexed PCR reaction products were pooled, cleaned up on a silica column to remove excess salt, enzyme and small probes and primers. This pooled library was quantified and qualified for size requirements. The hallmark of a successful reaction is the product size shift from a 150bp long (noise artifact) to a 210bp long (the signal) resulting from the PCR amplification of successfully ligated products of the first and second complementary probes.
  • Sequencing of the PCR amplification products will reveal information of the first and second complementary probe junctions, e.g., of Exon II/Exon III or perhaps Exon II/Exon IV splice variants.
  • Duplicate reads can be counted (binned) and those counts can be used to infer the relative copy number of the RNA transcripts.
  • the total read counts of the mRNA transcripts of the glyceraldehyde 3-phosphate dehydrogenase (GAPDH) gene (arbitrarily assigned as locus 745 of the 778 loci panel) against a titration of the input RNA study indicated that the ligation reaction is dependent upon the amount of input RNA (Figure 21).
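The binning of reads by probe junction described above can be sketched as a simple tally. The sketch assumes each read has already been resolved to the exons targeted by its first and second complementary probes; the exon labels and function names are hypothetical, and the relative fractions are only a rough proxy for relative transcript abundance.

```python
from collections import Counter


def count_junctions(read_pairs):
    """Bin reads by (first-probe exon, second-probe exon) junction."""
    return Counter(read_pairs)


def relative_abundance(counts):
    """Convert junction read counts into fractions of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {junction: n / total for junction, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads: three canonical Exon II/III junctions, one Exon II/IV variant.
    reads = [("GENE1:exon2", "GENE1:exon3")] * 3 + [("GENE1:exon2", "GENE1:exon4")]
    for junction, fraction in relative_abundance(count_junctions(reads)).items():
        print(junction, fraction)
```

The same tally covers fusion detection when the two probes of a pair target exons from different genes.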

Abstract

The invention relates to compositions, methods and kits for determining the presence, absence, amount, copy number, or other characteristics of one or more polynucleotide sequences in two or more samples, and to their use in genotyping, evaluation of copy number variation, expression analysis, determination of splice variants and fusion genes, and other genetic analyses.
EP16845310.8A 2016-01-31 2016-11-08 Analyse d'acides nucléiques par assemblage de sondes polynucléotidiques à codes barres Pending EP3347497A4 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662289303P 2016-01-31 2016-01-31
US201662317879P 2016-04-04 2016-04-04
US201662353088P 2016-06-22 2016-06-22
PCT/US2016/060991 WO2017044993A2 (fr) 2015-09-08 2016-11-08 Analyse d'acides nucléiques par assemblage de sondes polynucléotidiques à codes barres

Publications (2)

Publication Number Publication Date
EP3347497A2 true EP3347497A2 (fr) 2018-07-18
EP3347497A4 EP3347497A4 (fr) 2019-01-23

Family

ID=62083370

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16845310.8A Pending EP3347497A4 (fr) 2016-01-31 2016-11-08 Analyse d'acides nucléiques par assemblage de sondes polynucléotidiques à codes barres

Country Status (3)

Country Link
EP (1) EP3347497A4 (fr)
CN (1) CN108026568A (fr)
WO (1) WO2017044993A2 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200165687A1 (en) * 2017-06-27 2020-05-28 The University Of Tokyo Probe and method for detecting transcript resulting from fusion gene and/or exon skipping
US11639928B2 (en) * 2018-02-22 2023-05-02 10X Genomics, Inc. Methods and systems for characterizing analytes from individual cells or cell populations
SG11202008080RA (en) * 2018-02-22 2020-09-29 10X Genomics Inc Ligation mediated analysis of nucleic acids
CN111100935B (zh) * 2018-10-26 2023-03-31 厦门大学 一种细菌耐药基因检测的方法
WO2020142153A1 (fr) * 2018-12-31 2020-07-09 Htg Molecular Diagnostics, Inc. Procédés de détection d'adn et d'arn dans le même échantillon
CN110408717A (zh) * 2019-07-23 2019-11-05 四川省农业科学院生物技术核技术研究所 灵芝属线粒体rns基因的特异扩增引物及其应用
US20230332205A1 (en) * 2020-10-01 2023-10-19 Google Llc Linked dual barcode insertion constructs
JP2024507210A (ja) * 2021-02-17 2024-02-16 エーシーティー ジェノミックス (アイピー) リミテッド Dnaフラグメント連結検出方法及びそのキット
AU2022227563A1 (en) 2021-02-23 2023-08-24 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5912124A (en) * 1996-06-14 1999-06-15 Sarnoff Corporation Padlock probe detection
WO2005001113A2 (fr) * 2003-06-27 2005-01-06 Thomas Jefferson University Methods for detecting nucleic acid variations
DK1664345T3 (da) * 2003-09-02 2015-06-22 Keygene Nv OLA-based methods for the detection of target nucleic acid sequences
EP1730312B1 (fr) * 2004-03-24 2008-07-02 Applera Corporation Encoding and decoding reactions for determining target polynucleotides
WO2006086502A2 (fr) * 2005-02-09 2006-08-17 Stratagene California Key probe compositions and methods for polynucleotide detection
CN101395280A (zh) * 2006-03-01 2009-03-25 凯津公司 Sequencing-based high-throughput ligation detection of SNPs
WO2013106807A1 (fr) * 2012-01-13 2013-07-18 Curry John D Scalable characterization of nucleic acids by parallel sequencing
CN104830993B (zh) * 2015-06-08 2017-08-18 中国海洋大学 High-throughput genotyping technique applicable to multiple types of molecular markers

Also Published As

Publication number Publication date
EP3347497A4 (fr) 2019-01-23
WO2017044993A2 (fr) 2017-03-16
CN108026568A (zh) 2018-05-11
WO2017044993A3 (fr) 2017-04-27

Similar Documents

Publication Publication Date Title
US20220049296A1 (en) Nucleic acid analysis by joining barcoded polynucleotide probes
US20210198658A1 (en) Methods for targeted genomic analysis
EP3347497A2 (fr) Nucleic acid analysis by joining barcoded polynucleotide probes
US20210246498A9 (en) Human identification using a panel of snps
JP6525473B2 (ja) Compositions and methods for identifying duplicate sequencing reads
US8980551B2 (en) Use of class IIB restriction endonucleases in 2nd generation sequencing applications
US7459273B2 (en) Methods for genotyping selected polymorphism
US20140378340A1 (en) Methods for Genotyping
US20190144927A1 (en) Methods for genotyping selected polymorphism
CA3213538A1 (fr) Method for identifying and enumerating changes in nucleic acid sequence, expression, copy, or DNA methylation using combined nuclease, ligase, polymerase, and sequencing reactions
EP3102702B1 (fr) Error-free DNA sequencing
KR102398479B1 (ko) Copy-number-preserving RNA analysis method
JP2011518568A (ja) Materials and methods for DNA-based profiling assays
JP2007530026A (ja) Nucleic acid sequencing
US20230374574A1 (en) Compositions and methods for highly sensitive detection of target sequences in multiplex reactions
US20210395799A1 (en) Methods for variant detection
CN110468179B (zh) Method for selectively amplifying nucleic acid sequences
van Pelt-Verkuil et al. Principles of PCR
KR102237248B1 (ko) SNP marker set for individual identification and population genetic analysis of pine trees, and use thereof
US11306351B2 (en) Methods for genotyping

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20180406

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

A4 Supplementary search report drawn up and despatched

Effective date: 20190104

RIC1 Information provided on ipc code assigned before grant

Ipc: C12Q 1/68 20180101AFI20181220BHEP

Ipc: G01N 33/00 20060101ALI20181220BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20200707
