WO2023097325A2 - Systems and methods for identifying genetic phenotypes using programmable nucleases - Google Patents

Systems and methods for identifying genetic phenotypes using programmable nucleases

Info

Publication number
WO2023097325A2
Authority
WO
WIPO (PCT)
Prior art keywords
wells
nucleic acid
candidate
allele
well
Prior art date
Application number
PCT/US2022/080553
Other languages
French (fr)
Other versions
WO2023097325A3 (en)
Inventor
Fnu Vaishnavi NAGESH
Clare Louise Fasching
Bridget Ann Paine MCKAY
James Paul BROUGHTON
Jesus Ching
Janice Sha CHEN
Charles Y. CHIU
Venice SERVELLITA
Original Assignee
Mammoth Biosciences, Inc.
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mammoth Biosciences, Inc. and The Regents Of The University Of California
Publication of WO2023097325A2
Publication of WO2023097325A3

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701 - Specific hybridization probes
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 - Sequence alignment; Homology search
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 - Signal processing, e.g. from mass spectrometry [MS] or from PCR

Definitions

  • This specification describes technologies generally relating to detection and discrimination of genetic mutations in a biological sample.
  • programmable nuclease-based e.g., CRISPR-Cas-based
  • amplification-based assays for recognition of mutated sequences, such as in SARS-CoV-2 variants.
  • mutations in the spike protein which binds to the human ACE2 receptor, can render the SARS-CoV-2 virus more infectious and/or more resistant to antibody neutralization, resulting in increased transmissibility and/or escape from immunity, whether vaccine-mediated or naturally acquired immunity.
  • Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the COVID-19 disease.
  • compositions, computing systems, methods, and non-transitory computer readable storage mediums for addressing the above identified problems are provided in the present disclosure.
  • the compositions, systems, and methods described herein satisfy the abovementioned needs, among others, and provide related advantages.
  • Some advantages of the assays, compositions, computing systems, methods, and non-transitory computer readable storage mediums described herein for use in laboratory and point of care settings include low cost, minimal instrumentation, and a sample-to-answer turnaround time of under 2 hours.
  • one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
  • the method includes obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
  • a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
  • a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
  • the signal dataset is obtained by a procedure comprising amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids.
  • the method includes partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
  • the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the process of obtaining, determining, determining, and performing, thereby obtaining a plurality of candidate mutation calls for the target locus, and performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
  • the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
  • the respective candidate mutation call is determined as the final mutation call for the target locus.
  • the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
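  • The yield, comparison, and voting steps described above can be illustrated with a minimal sketch. The patent text does not fix a formula for the signal yield, the comparison rule, or the tie-breaking rule, so the choices here (yield as last reading minus first reading, a hypothetical `min_ratio` margin, ties resolved as `no_call`) and all function names are assumptions made only for illustration.

```python
from collections import Counter
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Collapse one well's time series into a single yield.

    Assumption: yield = last reading minus first reading.
    """
    return signals[-1] - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   min_ratio: float = 2.0) -> str:
    """Compare a first-allele well with its paired second-allele well.

    min_ratio is a hypothetical margin, not a value from the source.
    """
    if first_yield >= min_ratio * max(second_yield, 1e-9):
        return "first_allele"
    if second_yield >= min_ratio * max(first_yield, 1e-9):
        return "second_allele"
    return "no_call"


def vote(calls: Sequence[str]) -> str:
    """Majority vote over candidate calls; empty input or a tie gives no_call."""
    if not calls:
        return "no_call"
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_call"
    return ranked[0][0]


def mutation_call(first_wells, second_wells) -> str:
    """Pair wells by index, call each pair, then vote across the pairs."""
    calls = [
        candidate_call(signal_yield(a), signal_yield(b))
        for a, b in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

  • In this sketch each well is a list of readings ordered by time point, and wells in the two sets are paired by position. A real pipeline would likely scale signals and apply quality control before comparing them.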
  • Another aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample.
  • the method comprises, using a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. The method continues by determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • the method further determines, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities.
  • a voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
  • the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some such embodiments, the first allele is other than a wild-type allele. In some such embodiments, the set of candidate second alleles consists of between 2 and 10 candidate second alleles. In some such embodiments the set of candidate second alleles consists of a single second allele.
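  • When several candidate second alleles are tested on the same plate, the voting step has to choose between the first allele and each candidate. The sketch below is one possible reading of that step, not the claimed procedure: every (first-allele well, candidate well) pair casts at most one ballot, and the most frequent ballot wins. The `min_ratio` margin, the tie rule, and the function name `call_locus` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def call_locus(first_yields: Sequence[float],
               candidate_yields: Dict[str, Sequence[float]],
               min_ratio: float = 2.0) -> str:
    """Vote across all (first-allele well, candidate-allele well) pairs.

    first_yields: signal yields of the first-allele wells.
    candidate_yields: candidate allele name -> yields of its wells,
        paired with first_yields by position.
    min_ratio: hypothetical margin a yield must exceed to win a pair.
    """
    ballots = []
    for allele, yields in candidate_yields.items():
        for first, second in zip(first_yields, yields):
            if second >= min_ratio * max(first, 1e-9):
                ballots.append(allele)
            elif first >= min_ratio * max(second, 1e-9):
                ballots.append("first_allele")
            # Pairs without a clear winner cast no ballot.
    if not ballots:
        return "no_call"
    ranked = Counter(ballots).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_call"
    return ranked[0][0]
```

  • The result is either one of the candidate second alleles, the label `first_allele`, or `no_call`, which mirrors the outcome space described above.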
  • Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
  • Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
  • Another aspect of the present disclosure provides a CRISPR-based COVID-19 variant DETECTR® assay (henceforth abbreviated as DETECTR assay) for the detection of SARS-CoV-2 mutations.
  • the assay combines RT-LAMP pre-amplification followed by fluorescent detection using a CRISPR-Cas12 enzyme.
  • a comparative evaluation of multiple candidate Cas12 enzymes found robust assay performance with a CRISPR-Cas12 enzyme called CasDx1, which had high specificity in identifying key SNP mutations of functional relevance in the spike protein at amino acid positions 452, 484, and 501.
  • the disclosure provides a method of assaying for a SARS-CoV-2 variant in an individual, the method comprising: collecting a nasal swab or a throat swab from the individual; optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab; amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers; contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40-42; assaying for a change in a signal produced by clea
  • the nasal swab is a nasopharyngeal swab. In some embodiments, the throat swab is an oropharyngeal swab.
  • amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification. In some embodiments, the amplifying comprises loop mediated amplification (LAMP). In some embodiments, the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof.
  • LAMP loop mediated amplification
  • the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
  • the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39.
  • the amplifying comprises reverse transcription-LAMP.
  • the method further comprises lysing the sample.
  • lysing the sample comprises contacting the sample to a lysis buffer.
  • determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene.
  • the reference wild-type SARS-CoV-2 gene is from a SARS-CoV-2-Wuhan-Hu-1 sequence or the USA-WA1/2020 sequence.
  • the one or more S-gene mutations is a single nucleotide polymorphism (SNP).
  • the one or more S-gene mutations is associated with one or more Spike protein mutations.
  • the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof.
  • determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid.
  • the method comprises determining a variant call of the sample.
  • determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
  • the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant.
  • determining the variant call comprises determining one or more SNP calls of the sample.
  • determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation.
  • determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
  • determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation. In some embodiments, determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation. In some embodiments, the effector protein is a Type V Cas effector protein.
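  • The variant-to-mutation associations listed above can be written as a lookup table. The sketch below copies only the associations stated in this text. Because several variants share the same signature at positions 452, 484, and 501, it returns every matching variant rather than a single lineage. Treating an empty set of detected mutations as wild-type, and the names `SIGNATURES` and `candidate_variants`, are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

# Spike mutation signatures exactly as listed in the text above.
SIGNATURES: Dict[str, FrozenSet[str]] = {
    "Alpha": frozenset({"N501Y"}),
    "Beta": frozenset({"E484K", "N501Y"}),
    "Mu": frozenset({"E484K", "N501Y"}),
    "Gamma": frozenset({"E484K", "N501Y"}),
    "Delta": frozenset({"L452R"}),
    "Epsilon": frozenset({"L452R"}),
    "Kappa": frozenset({"L452R"}),
    "Omicron": frozenset({"E484A"}),
    "Zeta": frozenset({"E484K"}),
}


def candidate_variants(detected: Iterable[str]) -> List[str]:
    """Return all variants whose listed signature equals the detected SNP set.

    Assumption: no detected mutation is reported as wild-type at the
    assayed positions; an unlisted combination returns an empty list.
    """
    found = frozenset(detected)
    if not found:
        return ["wild-type"]
    return sorted(name for name, sig in SIGNATURES.items() if sig == found)
```

  • For example, a sample with only L452R detected matches Delta, Epsilon, and Kappa in this table, so the three SNP positions alone narrow the lineage without always resolving it.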
  • the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein.
  • the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
  • the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some embodiments, the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
  • FIG. 1 is an example block diagram illustrating a computing device and related data structures used by the computing device in accordance with some implementations of the present disclosure.
  • FIGS. 2A, 2B, 2C, 2D, 2E, 2F, and 2G illustrate an example method in accordance with an embodiment of the present disclosure, in which optional steps are indicated by broken lines.
  • FIG. 3 illustrates an example schematic for a method of determining a mutation call for a target locus in a sample using programmable nucleases, in accordance with some embodiments of the present disclosure.
  • FIGS. 4A, 4B, and 4C collectively illustrate a method for determining a mutation call for a target locus in a sample, in accordance with some embodiments of the present disclosure.
  • FIGS. 5A, 5B, and 5C collectively illustrate identification of differentiated phenotypes called as wild-type, mutated, and no-call in a signal dataset, in accordance with an embodiment of the present disclosure.
  • Figure 5A provides an allele discrimination plot visualizing the signal yields obtained from a COVID Variant programmable nuclease-based assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other.
  • Figure 5B shows Mean Average (MA) plots of the COVID Variant programmable nuclease-based assay data on gene fragments to decrease ambiguity of the signal yields.
  • MA Mean Average
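  • The text does not define how the MA plot coordinates are computed. A minimal sketch, assuming the conventional MA-plot definition (M is the log2 ratio of the two signals and A is the mean of their log2 values), applied here to a WT and a MUT signal yield. The function name `ma_transform` is hypothetical.

```python
import math
from typing import Tuple


def ma_transform(wt: float, mut: float) -> Tuple[float, float]:
    """Return (M, A) for one WT/MUT signal pair.

    Assumed conventional definition:
      M = log2(wt) - log2(mut)          (log ratio)
      A = (log2(wt) + log2(mut)) / 2    (mean log intensity)
    Both inputs must be positive.
    """
    if wt <= 0 or mut <= 0:
        raise ValueError("signals must be positive for a log transform")
    log_wt, log_mut = math.log2(wt), math.log2(mut)
    return log_wt - log_mut, 0.5 * (log_wt + log_mut)
```

  • Under this definition, points with M well above zero favor WT, points well below zero favor MUT, and points near zero are ambiguous, which is one way such a plot can reduce ambiguity relative to a raw scatter plot.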
  • FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, 6H, 6I, 6J, and 6K illustrate design and workflow for a COVID-19 Variant DETECTR® assay, in accordance with some embodiments of the present disclosure.
  • FIG. 6A shows a schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations.
  • FIG. 6C shows raw fluorescence curves visualizing the SNP differentiation capability of CasDx1 using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
  • FIG. 6D shows raw fluorescence curves visualizing the SNP differentiation capability of LbCas12a (which may be referred to as “LbaCas12a”, “LbCas12”, and “LbCas12a” interchangeably) using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
  • LbCas12a which may be referred to as “LbaCas12a”, “LbCas12”, and “LbCas12a” interchangeably
  • FIG. 6E shows raw fluorescence curves visualizing the SNP differentiation capability of AsCas12a using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
  • FIG. 6F shows a schematic of multiplexed RT-LAMP primer design showing the SARS-CoV-2 S gene mutations and gRNA positions.
  • FIG. 7 shows an example workflow comparison between the COVID Variant DETECTR® assay and SARS-CoV-2 Whole-Genome-Sequencing, in accordance with some embodiments of the present disclosure.
  • FIGS. 8A, 8B, 8C, 8D, 8E, 8F, 8G, and 8H illustrate example plots obtained using a DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling, as illustrated in FIGS. 4A-4C, in accordance with an embodiment of the present disclosure.
  • FIG. 8A shows a key providing orientation for FIGS. 8B-8H relative to each other.
  • FIGS. 8B-8H show raw fluorescence RT-LAMP curves for each clinical sample.
  • RT-LAMP replicates that passed quality control (QC) are represented with solid black lines and dark gray shading and failed LAMP replicates are shown with solid gray lines and light gray shading. Only valid RT-LAMP replicates were used in subsequent data analysis.
  • QC quality control
  • FIGS. 9A, 9B, 9C, 9D, 9E, 9F, 9G, 9H, 9I, 9J, 9K, 9L, and 9M illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C and 8A-8H, in accordance with an embodiment of the present disclosure.
  • FIG. 9A shows a key providing orientation for FIGS. 9B-9M relative to each other.
  • FIGS. 9B-9M show raw fluorescence CasDx1 curves for each clinical sample amplified by RT-LAMP. Each clinical sample was amplified with RT-LAMP in triplicate, and the resulting amplicons were detected by CasDx1 in triplicate.
  • the raw fluorescence curves show WT detection in thick black lines and MUT detection in thin gray lines.
  • FIGS. 10A, 10B, 10C, 10D, 10E, and 10F illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C, 8A-8H, and 9A-9M, in accordance with an embodiment of the present disclosure.
  • FIG. 10A shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other.
  • FIGS. 10C-10D collectively show three representative clinical samples of different SARS-CoV-2 lineages used in the workflow of the COVID-19 Variant DETECTR® assay.
  • FIG. 10C shows raw fluorescence curves of each sample run in RT-LAMP amplification and subsequent triplicate DETECTR® reactions targeting both WT and MUT SNPs for L452(R), E484(K), and N501(Y).
  • FIG. 10D shows box plot visualization of the end point fluorescence in DETECTR® across each SNP for the three representative clinical samples shown in FIG. 10C. Calls were made for each SNP by evaluating the median values of the DETECTR® calls and overall calls through the LAMP replicates, and given a designation of WT, MUT, or NoCall. Final calls were made on the lineage determined by each SNP.
  • Non-shaded elements represent WT and shaded elements represent MUT.
  • FIG. 10F shows a schematic of a data analysis pipeline workflow describing the RT-LAMP QC and subsequent CasDx1 signal scaling. The scaled signals were compared across SNPs and the calls were made for each RT-LAMP replicate. The combined replicate calls defined the mutation call, which informed the final lineage classification.
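  • The replicate-level calling described for FIGS. 10D and 10F can be sketched as follows. Only the overall shape comes from the text: replicates that fail RT-LAMP QC are dropped, the median of the scaled detection signals is evaluated for WT and MUT, each RT-LAMP replicate receives a WT, MUT, or NoCall designation, and the replicate calls are combined. The `margin` value, the majority rule used to combine replicates, and the function names are assumptions.

```python
from collections import Counter
from statistics import median
from typing import List, Sequence, Tuple

# One RT-LAMP replicate: (scaled WT signals, scaled MUT signals, passed QC).
Replicate = Tuple[Sequence[float], Sequence[float], bool]


def replicate_call(wt_scaled: Sequence[float], mut_scaled: Sequence[float],
                   margin: float = 0.2) -> str:
    """Call one replicate from the medians of its detection reactions.

    margin is a hypothetical minimum separation between the medians.
    """
    wt_med, mut_med = median(wt_scaled), median(mut_scaled)
    if wt_med - mut_med > margin:
        return "WT"
    if mut_med - wt_med > margin:
        return "MUT"
    return "NoCall"


def snp_call(replicates: List[Replicate]) -> str:
    """Combine calls from QC-passing replicates; ties or no data give NoCall."""
    calls = [replicate_call(wt, mut) for wt, mut, ok in replicates if ok]
    if not calls:
        return "NoCall"
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "NoCall"
    return ranked[0][0]
```

  • Using medians rather than means keeps a single outlying detection reaction from flipping a replicate's call, which fits the box-plot presentation used in the figures.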
  • FIGS. 11A, 11B, 11C, and 11D illustrate an example evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing, in accordance with an embodiment of the present disclosure.
  • FIG. 11A shows determination of RT-LAMP threshold with a ROC curve. Thresholds for LAMP quality analysis were derived to determine which samples had amplified sufficiently. The exact score value for this qualitative QC metric was determined using a ROC analysis.
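  • The text states only that the RT-LAMP QC threshold was derived from a ROC analysis, not which operating point was chosen. One common choice, shown here purely as an assumption, is the threshold that maximizes Youden's J statistic (true positive rate minus false positive rate). The score values, labels, and the function name `best_threshold` are hypothetical.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float],
                   labels: Sequence[bool]) -> Tuple[float, float]:
    """Pick the score cutoff that maximizes Youden's J = TPR - FPR.

    scores: QC score per RT-LAMP replicate (higher = better amplification).
    labels: reference truth, True if the replicate amplified sufficiently.
    A replicate is predicted positive when score >= threshold.
    Returns (threshold, J); the lowest threshold wins ties.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

  • Each candidate threshold corresponds to one point on the ROC curve, so scanning the observed scores covers every distinct operating point the data can produce.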
  • FIG. 11B illustrates a heat
  • FIG. 11C shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on clinical samples.
  • FIGS. 12A, 12B, 12C, 12D, 12E, 12F, 12G, 12H, 12I, 12J, 12K, 12L, and 12M show visualization of SNP calls by the data analysis pipeline illustrated in FIGS. 11A-11D. Box plots of all the clinical samples illustrate the spread of the scaled signals for each of the samples across the replicates in the experiment. SNP calls were made on each sample in agreement with the median values depicted on the box plot of the sample, which also provided an analytical confirmation of the DETECTR® results. WT detection is represented by shaded boxes and MUT detection is represented by non-shaded boxes.
  • FIGS. 13A, 13B, 13C, 13D, 13E, 13F, 13G, 13H, 13I, and 13J illustrate evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing as illustrated in FIGS. 11A-11D and 12A-12M, in accordance with an embodiment of the present disclosure.
  • FIGS. 13A-13D collectively show raw fluorescence CasDx1 curves for the clinical samples with discordant DETECTR® and WGS results. WT detection is represented by black lines and MUT detection is represented by gray lines.
  • FIG. 13E shows visualization of the COVID Variant DETECTR® and SARS-CoV-2 WGS assays showing the alignment of final calls. Across all of the clinical samples in this cohort, 80 out of the 91 clinical sample COVID Variant DETECTR® assay calls were consistent with the SARS-CoV-2 WGS calls.
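  • As a worked check on the figures quoted above, the initial agreement between the two assays can be expressed as a percentage. The helper name `percent_agreement` is hypothetical; the counts 80 and 91 are the ones stated in the text.

```python
def percent_agreement(concordant: int, total: int) -> float:
    """Share of samples on which two assays agree, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * concordant / total


# 80 of 91 clinical samples agreed before discordant samples were re-tested,
# which leaves 91 - 80 = 11 discordant samples.
print(round(percent_agreement(80, 91), 1))  # 87.9
```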
  • FIG. 13F shows alignment of final mutation calls comparing the COVID-19 Variant DETECTR® and SARS-CoV-2 WGS assay results across 91 clinical samples after discordant samples (indicated by asterisk) were resolved.
  • FIG. 13G shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS.
  • FIGS. 13H-13I collectively show a summary of re-testing of discordant samples from the original clinical sample where nearly all SNP discrepancies are resolved.
  • FIG. 13J shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS following resolution of discordant samples.
  • FIGS. 14A, 14B, 14C, and 14D collectively illustrate an overall results summary of final SNP calls by COVID-19 Variant DETECTR® assay and viral WGS, in accordance with an embodiment of the present disclosure.
  • a summary table of the final SNP calls from the COVID-19 Variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay after discordant testing is shown.
  • the table includes the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls.
  • Ct values from running an FDA EUA authorized SARS-CoV-2 RT-PCR assay, the TaqPath™ COVID-19 RT-PCR kit, are shown.
  • FIGS. 15A, 15B, and 15C illustrate specific detection of 484 mutations, which can enable rapid Omicron identification, in accordance with an embodiment of the present disclosure.
  • FIG. 15A shows a schematic of Omicron mutations within the S-gene LAMP amplicon and relative position of 484-specific gRNAs and degenerate LAMP primers.
  • FIG. 15C shows alignment of final 484 mutation calls comparing the DETECTR® and SARS-CoV-2 WGS assay results across 36 clinical samples.
  • FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples, in accordance with an embodiment of the present disclosure.
  • Omicron clinical samples (1802), WT (1804) control RNA, and Alpha (1806) control RNAs were amplified using the original RT-LAMP primer set and degenerate RT-LAMP primer set and plotted relative to negative control (1808).
  • the degenerate primers see Table 3B
  • the original primer set see Table 3A
  • FIGS. 17A, 17B, 17C, and 17D show a summary of results of final SNP calls by the COVID Variant DETECTR® and WGS assays, in accordance with an embodiment of the present disclosure, with reference to Example 4 below.
  • FIG. 17A shows a schematic of the workflow for determining the final variant calls. If the result was an A484, K484, or Q484, the final variant call was made. If the result was an E484, the sample was reflexed to DETECTR® analysis at the 452 and 501 positions to make the variant determination.
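  • The reflex logic described for FIG. 17A reduces to a small decision function. The sketch below encodes only the branch stated in the text: an A, K, or Q result at position 484 ends the workflow, and an E result triggers analysis at positions 452 and 501. The returned labels, the `no_call` fallback for any other input, and the function name `next_step` are assumptions. The mapping from each 484 residue to a named variant belongs to the interpretation table of FIG. 17B and is not reproduced here.

```python
def next_step(residue_484: str) -> str:
    """Decide what follows a position-484 result.

    residue_484: one-letter amino acid code detected at spike position 484.
    """
    residue = residue_484.strip().upper()
    if residue in {"A", "K", "Q"}:
        # A484, K484, or Q484: the final variant call is made directly.
        return "final_variant_call"
    if residue == "E":
        # E484: reflex to analysis at positions 452 and 501.
        return "reflex_452_501"
    # Assumption: anything else is treated as uninterpretable.
    return "no_call"
```

  • Testing position 484 first means samples with A484, K484, or Q484 need no further reactions, and only E484 samples proceed to the 452 and 501 reactions.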
  • FIG. 17B shows an interpretation table including the specific 484 SNPs.
  • FIG. 17C shows a summary table of the final SNP calls from the DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay of Example 4 including the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values were obtained from the FDA EUA authorized TaqPath™ COVID-19 RT-PCR kit.
  • Tracking the evolution and spread of pathogenic variants can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks.
  • pathogenic variants (e.g., SARS-CoV-2 variants)
  • the ability to detect and discriminate genetic phenotypes such as wild-type and/or mutant sequences responsible for disease and infection facilitates a wide range of clinical and epidemiological applications including detection of infectious variants, monitoring the evolution and spread of pathogenic or intervention-resistant strains, and/or discovery of novel target sequences for intervention.
  • the emergence of SARS-CoV-2 variants threatens to substantially prolong the COVID-19 pandemic.
  • SARS-CoV-2 variants, especially Variants of Concern (VOCs)
  • VOCs Variants of Concern
  • Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of monoclonal antibody or convalescent plasma therapies for the disease.
  • CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
  • EUA Emergency Use Authorization
  • FDA US Food and Drug Administration
  • the systems and methods disclosed herein further overcome the limitations of conventional methods for identification of genetic phenotypes in biological samples by providing improved mutation calling for target loci using programmable nuclease-based and/or amplification-based assays.
  • these systems and methods provide improved sensitivity and accuracy in identifying target nucleic acids due to the use of specific primers during amplification and/or highly precise programmable nucleases for detection of target sequences.
  • Such identification can be used to determine, e.g., pathogenesis, transmission risk, treatment response, diagnosis, and/or epidemiology of pathogens and infectious diseases, as well as tumor DNA and/or cancer-related viruses.
  • the systems and methods disclosed herein can be used to screen for rare or novel pathogenic variants (e.g., SARS-CoV-2 variants), alone or in conjunction with conventional sequencing methods. Because the sequencing capacity for most clinical and public health laboratories is limited, the systems and methods of the present disclosure would enable rapid detection of newly emerging variants or currently circulating variants that have acquired additional mutations. The information thus obtained could directly inform outbreak investigation and public health containment efforts, such as quarantine decisions. Thus, the systems and methods disclosed herein have potential as a rapid diagnostic test alternative to sequencing-based methods. Furthermore, identification of specific mutations associated with intervention resistance, such as neutralizing antibody evasion in SARS-CoV-2 variants, could guide the care of individual patients, for instance, with regard to the use of monoclonal antibodies that remain effective in treating the infection.
  • one aspect of the present disclosure provides systems and methods for detecting and determining mutation calls for a target locus in a biological sample, such as a single nucleotide polymorphism in a target sequence.
  • a signal dataset is obtained comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, and the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals includes, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
  • a corresponding signal yield is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. Furthermore, for each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure across the plurality of candidate call identities for the first set of wells is performed, thereby obtaining a mutation call for the target locus.
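  • The per-well yield, pairwise comparison, and voting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the yield definition (last reading minus first reading), the fold-change threshold `min_fold`, the strict-majority vote, and every function name are hypothetical choices made only to show the data flow.

```python
from collections import Counter
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    # Assumption: yield is the last reading minus the first reading.
    return signals[-1] - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   min_fold: float = 1.5) -> str:
    # Assumption: an allele is called when its yield is positive and
    # exceeds the paired well's yield by at least `min_fold`.
    if first_yield > 0 and first_yield > min_fold * second_yield:
        return "first_allele"
    if second_yield > 0 and second_yield > min_fold * first_yield:
        return "second_allele"
    return "no_call"


def mutation_call(first_wells: Sequence[Sequence[float]],
                  second_wells: Sequence[Sequence[float]]) -> str:
    # One candidate call per well in the first set, each compared with
    # its corresponding well in the second set.
    calls = [candidate_call(signal_yield(a), signal_yield(b))
             for a, b in zip(first_wells, second_wells)]
    if not calls:
        return "no_call"
    # Assumption: the vote requires a strict majority of candidate calls.
    top, count = Counter(calls).most_common(1)[0]
    return top if count > len(calls) / 2 else "no_call"
```

  • With three replicate well pairs in which two first-allele wells clearly outperform their second-allele counterparts, the vote resolves to the first allele even if the third pair disagrees.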
  • the present disclosure provides various systems and methods for assaying for and detecting single nucleotide polymorphisms (SNPs) in a target sequence.
  • the various systems and methods disclosed herein use a programmable nuclease complexed with non-naturally occurring (e.g., engineered) guide nucleic acid sequence to detect the presence or absence of, and/or quantify the amount of, a target sequence having one or more SNPs.
  • the various systems and methods disclosed herein are used to distinguish or discriminate between sequences having different mutations or variations therein (and/or between SNP-containing sequences and wild-type sequences).
  • the SNP-containing target nucleic acids are amplified and/or comprise amplicons.
  • Amplifying SNP-containing target nucleic acids can use, for example, reverse transcription (RT) and/or isothermal amplification (e.g., loop-mediated amplification (LAMP)) or thermal amplification (e.g., polymerase chain reaction (PCR)) of RNA or DNA (e.g., RNA or DNA extracted from a patient sample).
  • RT reverse transcription
  • LAMP loop-mediated amplification
  • PCR polymerase chain reaction
  • the present disclosure is described in relation to systems and methods for coronavirus variant detection or discrimination in a sample.
  • the compositions and methods disclosed herein may be used to detect and/or determine mutations (e.g., SNPs) in other target sequences of interest.
  • the compositions and methods described herein may be used to detect mutations (e.g., SNPs) associated with other viruses or strains or variants thereof, bacteria or strains or variants thereof, diseases, disorders, and/or genetic traits or susceptibilities of interest.
  • the present disclosure provides various systems and methods of use thereof for assaying for and detecting mutations or variations of interest or concern in a segment of a Spike (S) gene of a coronavirus in a sample.
  • the coronavirus is SARS-CoV-2 (also known as 2019 novel coronavirus, Wuhan coronavirus, or 2019-nCoV), 229E (alpha coronavirus), NL63 (alpha coronavirus), OC43 (beta coronavirus), HKU1 (beta coronavirus), MERS-CoV, or SARS-CoV.
  • the coronavirus is a variant of SARS-CoV-2, particularly the alpha variant (also referred to herein as the United Kingdom (UK) variant) known as 20B/501Y.V1, VOC 202012/01, or B.1.1.7 lineage; the beta variant (also referred to herein as the South African variant) known as 20C/501Y.V2 or B.1.351 lineage; the delta variant known as B.1.617.2; or the gamma variant known as P.1.
  • the terms “2019-nCoV,” “SARS-CoV-2,” and “COVID-19” may be used interchangeably herein. Exemplary variants of concern or interest are shown in Table 1. The genetic characteristics of these variants are discussed in Leung et.
  • the systems and methods disclosed herein are used to detect the presence or absence of the segment of the S-gene of the SARS-CoV-2 in a patient sample.
  • a patient is diagnosed with COVID-19 if the presence of SARS-CoV-2 is detected in a sample from the patient.
  • the assays disclosed herein provide single nucleotide target specificity, enabling specific detection of a single coronavirus.
  • candidate call identities are obtained using DETECTR assays, such as those described herein and in Example 1, below. See, for example, Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
  • DETECTR assays disclosed herein use amplification of samples (e.g., RT and/or isothermal amplification (e.g., LAMP) of RNA (e.g., RNA extracted from a patient sample) and/or PCR), followed by Cas12 detection of predefined target loci (e.g., coronavirus sequences), followed by cleavage of a reporter molecule to detect the presence of nucleic acids in the sample having the target loci (e.g., a viral sequence).
  • a DETECTR assay targets the E (envelope) genes or N (nucleoprotein) genes of a coronavirus (e.g., SARS-CoV-2).
  • a DETECTR assay targets the S-gene of a coronavirus (e.g., SARS-CoV-2) or coronavirus variant. Isothermal amplification can also be performed to amplify one or more regions of the S gene.
  • Table 1 Genetic Changes Characterizing the SARS-CoV-2 Variants of Concern and Variants of Interest
  • any of the regions of the Spike gene comprising the groups of mutations detailed in Table 1 may be selected as target.
  • compositions and methods of use thereof for assaying for and detecting a SARS-CoV-2 variant in a sample.
  • the various compositions, methods, and reagents disclosed herein use an effector protein complexed with guide nucleic acid sequence to distinguish among SARS-CoV-2 variants or between a SARS-CoV-2 variant and a wildtype SARS-CoV-2.
  • the variant of SARS-CoV-2 (also known as 2019 novel coronavirus or 2019-nCoV) is one of the variants known as Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), Omicron (B.1.1.529), Zeta (P.2), and Delta (B.1.617.2) and lineages thereof.
  • the assays disclosed herein target the S (spike) gene of the SARS-CoV-2 variant.
  • a method of detecting a SARS-CoV-2 variant in an individual comprises a) collecting a nasal swab or a throat swab from the individual; b) optionally extracting a target nucleic acid from the nasal swab or the throat swab; c) amplifying the target nucleic acid to produce an amplification product; d) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof; e) assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the a
  • methods described herein comprise amplifying the target nucleic acid (e.g., via reverse transcription-loop-mediated isothermal amplification (RT-LAMP)).
  • reagents for the amplification reaction comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
  • the amplification primers are selected from SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, or SEQ ID NOS: 34-39.
  • determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) (e.g., associated with an L452R, E484K, E484Q, E484A, or N501Y mutation of the S-gene product) relative to a wild-type SARS-CoV-2.
  • the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98% or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 or any one of SEQ ID NOS: 22-27 or 40-42.
  • the reporter nucleic acid comprises a nucleotide sequence that is at least 75% or 100% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
  • the terms “individual,” “subject,” and “patient” are used interchangeably and include any member of the animal kingdom, including humans.
  • percent (%) sequence identity describes the number of matches (“hits”) of identical nucleotides/amino acids of two or more aligned nucleic acid or amino acid sequences as compared to the number of nucleotides or amino acid residues making up the overall length of the reference nucleic acid or amino acid sequences.
  • the percentage of amino acid residues or nucleotides that are the same may be determined, when the (sub)sequences are compared and aligned for maximum correspondence over a window of comparison, or over a designated region as measured using a sequence comparison algorithm as known in the art, or when manually aligned and visually inspected.
  • This definition also applies to the complement of any sequence to be aligned.
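  • The percent identity definition above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (equal length, with "-" marking gaps) and, following the definition, divides the number of identical positions by the ungapped length of the reference sequence. The function name and the gap convention are assumptions for illustration; no particular alignment algorithm is implied.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity of a pre-aligned query against a reference.

    Assumptions: both strings come from the same pairwise alignment,
    so they have equal length, and '-' denotes a gap.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    # Count positions where both sequences carry the same residue.
    matches = sum(
        q == r and r != "-"
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
    )
    # Denominator: overall length of the reference, excluding gaps.
    reference_length = sum(r != "-" for r in aligned_reference)
    if reference_length == 0:
        raise ValueError("reference sequence is empty")
    return 100.0 * matches / reference_length
```

  • Under these assumptions, a 10-nucleotide reference with one mismatch gives 90% identity, which would satisfy an "at least 80%" or "at least 90%" threshold but not an "at least 95%" threshold.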
  • amplification or “amplifying” is a process by which a nucleic acid molecule is enzymatically copied to generate a progeny population with the same sequence as the parental one.
  • the method provided herein includes repeating hybridizing a primer to the target nucleic acid, and extending a nucleic acid complementary to the strand of the target nucleic acid from the primer using a polymerase for one or more times, e.g., until a desired amount of amplification is achieved.
  • amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
  • the term ‘cleavage assay’ refers to an assay designed to visualize, quantitate or identify the cleavage activity of an effector protein.
  • the cleavage activity may be cis-cleavage activity.
  • the cleavage activity may be trans-cleavage activity.
  • the effector protein is an activated CRISPR effector protein.
  • the term ‘activated effector protein’ refers to an effector protein associated with a guide RNA bound to a target RNA, thereby forming an ‘activated’ complex capable of exhibiting trans- or cis-cleavage.
  • CRISPR-RNA or “crRNA” or “spacer” refers to an RNA molecule having a sequence with sufficient complementarity to a target nucleic acid sequence to direct sequence-specific binding of an RNA-targeting complex to the target RNA sequence.
  • crRNAs contain a sequence that mediates target recognition and a sequence that duplexes with a tracrRNA.
  • the crRNA and tracrRNA duplex are present as parts of a single larger guide RNA molecule.
  • the crRNA comprises a repeat region that interacts with the effector protein.
  • the repeat region may also be referred to as a “protein-binding segment.”
  • the repeat region is adjacent to the spacer region.
  • a guide nucleic acid that interacts with the effector protein may comprise a repeat region that is 5’ of the spacer region.
  • the spacer region of the guide nucleic acid may be complementary to (e.g., hybridize to) a target sequence of a target nucleic acid.
  • the spacer region is 15-28 linked nucleosides in length.
  • the spacer region is 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleosides in length. In some cases, the spacer region is 18-24 linked nucleosides in length. In some cases, the spacer region is at least 15 linked nucleosides in length. In some cases, the spacer region is at least 16, 18, 20, or 22 linked nucleosides in length. In some cases, the spacer region comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some cases, the spacer region is at least 17 linked nucleosides in length. In some cases, the spacer region is at least 18 linked nucleosides in length.
  • a positive “detectable signal” may be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical or other detection methods known in the art.
  • a first detectable signal may be generated by binding of the CRISPR effector protein complex to the target nucleic acid, indicating that the sample contains the target nucleic acid.
  • a detectable signal may be generated upon the cleavage event, the event comprising the cleavage of a detector nucleic acid by a Cas effector protein.
  • the detectable signal may be generated indirectly by the cleavage event.
  • the cleavage event is a trans-cleavage reaction or a cis-cleavage reaction.
  • effector protein refers to a polypeptide, or a fragment thereof, possessing enzymatic activity, and that is capable of binding to a target nucleic acid molecule with the support of a guide nucleic acid molecule.
  • the binding is sequence-specific.
  • the guide nucleic acid molecule is DNA or RNA.
  • the target nucleic acid molecule may be DNA or RNA.
  • effector protein refers to a protein that is capable of modifying a nucleic acid molecule (e.g., by cleavage, deamination, recombination).
  • enzymatic activities include, but are not limited to, nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
  • the term “guide nucleic acid” or “guide sequence” is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct binding of a nucleic acid-targeting complex to the target sequence.
  • the term “gRNA”, as used here refers to any RNA molecule that supports the targeting of an effector protein described here to a target nucleic acid.
  • gRNAs include, but are not limited to, crRNAs or crRNAs in combination with associated trans-activating RNAs (tracrRNAs). The latter can be independent RNAs or portions of sequences thereof can be fused into a single RNA using a linker.
  • the gRNA is designed to contain a chemical or biochemical modification.
  • a gRNA can contain one or more nucleotides.
  • non-naturally occurring or “engineered” are used interchangeably and indicate the involvement of the hand of man.
  • the terms, when referring to nucleic acid molecules or polypeptides mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which they are naturally associated in nature and as found in nature.
  • nuclease activity refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids; the term “endonuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bond within a polynucleotide chain.
  • Nucleotide refers to a base-sugar-phosphate compound. Nucleotides are the monomeric subunits of both types of nucleic acid polymers, RNA and DNA. “Nucleotide” refers to ribonucleoside triphosphates, rATP, rGTP, rUTP, and rCTP, and deoxyribonucleoside triphosphates, such as dATP, dGTP, dTTP, and dCTP. As used herein, a “nucleoside” refers to a base-sugar combination without a phosphate group.
  • Base refers to the nitrogen-containing base, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U).
  • a 2’ deoxyribonucleoside-5’-triphosphate refers to a base-sugar-phosphate compound, which has hydrogen at the 2’ position of the sugar, including, but not limited to, the four common deoxyribose-containing substrates (dATP, dCTP, dGTP, dTTP), and derivatives and analogs thereof.
  • a ribonucleoside-5’-triphosphate refers to a base-sugar-phosphate compound, which has a hydroxyl group at the 2’ position of the sugar, including, but not limited to, the four common ribose-containing substrates for an RNA polymerase (ATP, CTP, GTP, and UTP), and derivatives and analogs thereof.
  • the upper (sense) strand sequence is in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand.
  • the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the ‘reverse complement’ sequence or the ‘reverse complementary’ sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
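  • The strand conventions above (complementary, reverse, and reverse complement sequences) can be made concrete with a small Python sketch. The helper names are hypothetical, and the sketch assumes an unambiguous uppercase DNA alphabet (A, C, G, T); RNA or degenerate bases would need an extended mapping.

```python
# Assumption: plain DNA alphabet only; names are illustrative.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(upper_strand: str) -> str:
    # Lower-strand bases listed in the same direction as the upper strand.
    return upper_strand.translate(_COMPLEMENT)


def reverse(upper_strand: str) -> str:
    # Upper strand read from its 3'-end to its 5'-end.
    return upper_strand[::-1]


def reverse_complement(upper_strand: str) -> str:
    # Lower strand read from its own 5'-end to its 3'-end.
    return complement(upper_strand)[::-1]
```

  • For an upper strand written 5'-ATGC-3', these helpers give TACG as the complementary sequence, CGTA as the reverse sequence, and GCAT as the reverse complement.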
  • reporter is used interchangeably with “reporter nucleic acid,” “reporter molecule,” or “detector nucleic acid.”
  • reporter nucleic acid refers generally to an off-target nucleic acid molecule that is capable of providing a detectable signal upon cleavage by an ‘activated’ effector protein.
  • a reporter may comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), wherein the nucleic acid is capable of being cleaved by an effector protein (e.g., a CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
  • the effector proteins disclosed herein, activated upon hybridization of a guide nucleic acid to a target nucleic acid may cleave the reporter.
  • an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid.
  • the reporter nucleic acid is further attached to a moiety, chemical compound or other component that can be used to visualize, quantitate or identify the cleavage of the reporter molecule. Cleaving the “reporter” may be referred to herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid.”
  • reporters may comprise RNA. Reporters may comprise DNA. In some embodiments, reporters may be double-stranded. In some embodiments, reporters may be single-stranded. In some instances, reporters comprise a protein capable of generating a signal. A signal may be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. In some instances, the reporter comprises a detection moiety.
  • Suitable detectable labels and/or moieties that may provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.
  • the reporter comprises a detection moiety and a quenching moiety.
  • the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site.
  • the quenching moiety is a fluorescence quenching moiety.
  • the quenching moiety is 5' to the cleavage site and the detection moiety is 3' to the cleavage site. In some instances, the detection moiety is 5' to the cleavage site and the quenching moiety is 3' to the cleavage site. Sometimes the quenching moiety is at the 5' terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3' terminus of the nucleic acid of a reporter. In some instances, the detection moiety is at the 5' terminus of the nucleic acid of a reporter. In some instances, the quenching moiety is at the 3' terminus of the nucleic acid of a reporter.
  • the terms “trans-activating crRNA” or “tracrRNA” refer to a transactivating or transactivated RNA molecule or a fragment thereof which contains a sequence with sufficient complementarity to allow association with a crRNA molecule.
  • the tracrRNA can refer to an RNA molecule that provides a scaffold for binding to a CRISPR effector protein.
  • the tracrRNA forms a structure facilitating the binding of a CRISPR-associated or a CRISPR effector protein to a specific target nucleic acid.
  • at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are present as parts of a larger single guide RNA molecule.
  • sgRNA single guide RNA
  • single guide nucleic acid refers to a single nucleic acid system comprising: a first nucleotide sequence that binds non-covalently with an effector protein; and a second nucleotide sequence that hybridizes to a target nucleic acid.
  • the first nucleotide sequence is referred to as a handle sequence.
  • a handle sequence may comprise at least a portion of a tracrRNA sequence, at least a portion of a repeat sequence, or a combination thereof.
  • a sgRNA does not comprise a tracrRNA.
  • target sequence refers to a DNA or RNA sequence to which a DNA or RNA-targeting guide RNA is designed to have complementarity, where hybridization between a target sequence and a guide RNA promotes the association of the CRISPR effector protein with the target RNA.
  • tissue sample is any biological sample derived from a patient. This term includes, but is not limited to, biological fluids such as blood, serum, plasma, urine, cerebrospinal fluid, tears, saliva, lymph, dialysate, lavage fluid, semen, and other biological fluid samples. Cells and tissues of scientific origin are included as tissue samples. The term also includes cells or cells derived therefrom and their progeny, including cells in culture, cell supernatants, and cell lysates. This definition includes samples that have been manipulated in any way after they are obtained, such as treatment with reagents, solubilization, or enrichment of specific components such as polynucleotides or polypeptides. The term also includes derivatives and fractions of patient samples.
  • trans-cleavage activity also referred to as “collateral” or “transcollateral” cleavage may be non-specific cleavage of nearby single-stranded nucleic acid by an activated effector protein, such as trans cleavage of detector nucleic acids with a detection moiety.
  • trans cleavage assay refers to an assay designed to visualize, quantitate or identify the trans-collateral activity of an activated effector protein.
  • an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid, thereby inducing the formation of the nuclease-guide complex.
  • a “DETECTR®” assay is a trans cleavage assay.
  • compositions and systems comprising at least one of an engineered effector protein and an engineered guide nucleic acid, which may simply be referred to herein as an effector protein and a guide nucleic acid, respectively.
  • an engineered effector protein and an engineered guide nucleic acid refer to an effector protein and a guide nucleic acid, respectively, that are not found in nature.
  • systems and compositions comprise at least one non-naturally occurring component.
  • compositions and systems may comprise a guide nucleic acid, wherein the sequence of the guide nucleic acid is different or modified from that of a naturally-occurring guide nucleic acid.
  • compositions and systems comprise at least two components that do not naturally occur together.
  • compositions and systems may comprise a guide nucleic acid comprising a repeat region and a spacer region which do not naturally occur together.
  • compositions and systems may comprise a guide nucleic acid and a Cas protein that do not naturally occur together.
  • an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes Cas proteins and guide nucleic acids from cells or organisms that have not been genetically modified by a human or machine.
  • the guide nucleic acid comprises a non-natural nucleobase sequence.
  • the non-natural sequence is a nucleobase sequence that is not found in nature.
  • the non-natural sequence may comprise a portion of a naturally-occurring sequence, wherein the portion of the naturally-occurring sequence is not present in nature absent the remainder of the naturally-occurring sequence.
  • the guide nucleic acid comprises two naturally-occurring sequences arranged in an order or proximity that is not observed in nature.
  • compositions and systems comprise a ribonucleotide complex comprising a CRISPR/Cas effector protein and a guide nucleic acid that do not occur together in nature.
  • Engineered guide nucleic acids may comprise a first sequence and a second sequence that do not occur naturally together.
  • an engineered guide nucleic acid may comprise a sequence of a naturally-occurring repeat region and a spacer region that is complementary to a naturally-occurring eukaryotic sequence.
  • the engineered guide nucleic acid may comprise a sequence of a repeat region that occurs naturally in an organism and a spacer region that does not occur naturally in that organism.
  • An engineered guide nucleic acid may comprise a first sequence that occurs in a first organism and a second sequence that occurs in a second organism, wherein the first organism and the second organism are different.
  • the guide nucleic acid may comprise a third sequence at a 3’ or 5’ end of the guide nucleic acid, or between the first and second sequences of the guide nucleic acid.
  • an engineered guide nucleic acid may comprise a naturally occurring crRNA and tracrRNA coupled by a linker sequence.
  • an engineered guide nucleic acid may comprise at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence.
  • compositions and systems described herein comprise an engineered effector protein that is similar to a naturally occurring effector protein.
  • the engineered effector protein may lack a portion of the naturally occurring effector protein.
  • the effector protein may comprise a mutation relative to the naturally-occurring effector protein, wherein the mutation is not found in nature.
  • the effector protein may also comprise at least one additional amino acid relative to the naturally-occurring effector protein.
  • the nucleotide sequence encoding the effector protein is codon optimized relative to the naturally occurring sequence.
  • programmable nucleases and uses thereof, e.g., detection of target nucleic acids.
  • a programmable nuclease is capable of being activated when complexed with the guide nucleic acid and the target nucleic acid segment.
  • a programmable nuclease can be capable of being activated when complexed with a guide nucleic acid and the target sequence.
  • the programmable nuclease can be activated upon binding of the guide nucleic acid to its target nucleic acid and can non-specifically degrade a non-target nucleic acid in its environment.
  • the programmable nuclease has trans-cleavage activity once activated.
  • a programmable nuclease can be a Cas protein (also referred to, interchangeably, as a Cas nuclease or Cas effector protein).
  • a guide nucleic acid (e.g., crRNA) and Cas protein can form a CRISPR enzyme.
  • one or more programmable nucleases as disclosed herein can be activated to initiate trans-cleavage activity of a reporter (also referred to herein as a reporter molecule).
  • a programmable nuclease as disclosed herein can, in some cases, bind to a target sequence or target nucleic acid to initiate trans-cleavage of a reporter.
  • the programmable nuclease can be referred to as an RNA-activated programmable RNA nuclease.
  • the programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of an RNA reporter.
  • a programmable nuclease can be referred to herein as a DNA-activated programmable RNA nuclease.
  • a programmable nuclease as described herein can be activated by a target RNA or a target DNA.
  • a programmable nuclease e.g., a Cas enzyme
  • the programmable nuclease can bind to a target ssDNA which initiates trans-cleavage of RNA reporters.
  • a programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of a DNA reporter, and this programmable nuclease can be referred to as a DNA-activated programmable DNA nuclease.
  • the programmable nuclease can become activated after binding of a guide nucleic acid that is complexed with the programmable nuclease to a target nucleic acid, and the activated programmable nuclease can cleave the target nucleic acid, which can result in a trans-cleavage activity.
  • Trans-cleavage activity can be non-specific cleavage of nearby single-stranded nucleic acids by the activated programmable nuclease, such as trans-cleavage of reporter nucleic acids comprising a detection moiety.
  • the detection moiety can be released or separated from the reporter and can directly or indirectly generate a detectable signal.
  • the reporter and/or the detection moiety can be immobilized on a support medium.
  • the detection moiety is at least one of a fluorophore, a dye, a polypeptide, or a nucleic acid.
  • the detection moiety binds to a capture molecule on the support medium to be immobilized.
  • the detectable signal can be visualized on the support medium to assess the presence or concentration of one or more target nucleic acids associated with an ailment, such as a SNP associated with a disease, cancer, or genetic disorder.
  • a programmable nuclease is any enzyme that can be or has been designed, modified, or engineered by human contribution so that the enzyme targets or cleaves a nucleic acid in a sequence-specific manner.
  • Programmable nucleases can include, for example, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and/or RNA-guided nucleases such as the bacterial clustered regularly interspaced short palindromic repeat (CRISPR)-Cas (CRISPR-associated) nucleases or Cpf1.
  • Programmable nucleases can also include, for example, PfAgo and/or NgAgo.
  • Cas proteins are programmable nucleases used in the methods and systems disclosed herein.
  • Cas proteins can include any of the known Classes and Types of CRISPR/Cas enzymes.
  • Programmable nucleases disclosed herein include Class 1 Cas proteins, such as the Type I, Type IV, or Type III Cas proteins.
  • Programmable nucleases disclosed herein also include the Class 2 Cas proteins, such as the Type II, Type V, and Type VI Cas proteins.
  • Programmable nucleases included in the methods disclosed herein and methods of use thereof include Type V or Type VI Cas proteins.
  • the programmable nuclease is a Type V Cas protein.
  • a Type V Cas effector protein comprises a RuvC domain but lacks an HNH domain.
  • the RuvC domain of the Type V Cas effector protein comprises three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains).
  • the three RuvC subdomains are located within the C-terminal half of the Type V Cas effector protein.
  • none of the RuvC subdomains are located at the N terminus of the protein.
  • the RuvC subdomains are contiguous.
  • the RuvC subdomains are not contiguous with respect to the primary amino acid sequence of the Type V Cas protein, but form a RuvC domain once the protein is produced and folds. In some instances, there are zero to about 50 amino acids between the first and second RuvC subdomains. In some instances, there are zero to about 50 amino acids between the second and third RuvC subdomains.
  • the Cas effector is a Cas14 effector.
  • the Cas14 effector is a Cas14a, Cas14a1, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, Cas14h, or Cas14u effector.
  • the Cas effector is a CasPhi (also referred to herein as a CasΦ) effector.
  • the Cas effector is a Cas12 effector.
  • the Cas12 effector is a Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, or Cas12j effector.
  • the Type V Cas protein comprises a Cas14 protein.
  • Cas14 proteins can comprise a bilobed structure with distinct amino-terminal and carboxy-terminal domains.
  • the amino- and carboxy-terminal domains can be connected by a flexible linker.
  • the flexible linker can affect the relative conformations of the amino- and carboxyl-terminal domains.
  • the flexible linker can be short, for example less than 10 amino acids, less than 8 amino acids, less than 6 amino acids, less than 5 amino acids, or less than 4 amino acids in length.
  • the flexible linker can be sufficiently long to enable different conformations of the amino- and carboxy-terminal domains among two Cas14 proteins of a Cas14 dimer complex (e.g., the relative orientations of the amino- and carboxy-terminal domains differ between two Cas14 proteins of a Cas14 homodimer complex).
  • the linker domain can comprise a mutation which affects the relative conformations of the amino- and carboxyl-terminal domains.
  • the linker can comprise a mutation which affects Cas14 dimerization. For example, a linker mutation can enhance the stability of a Cas14 dimer.
  • the amino-terminal domain of a Cas14 protein comprises a wedge domain, a recognition domain, a zinc finger domain, or any combination thereof.
  • the wedge domain can comprise a multi-strand β-barrel structure.
  • a multi-strand β-barrel structure can comprise an oligonucleotide/oligosaccharide-binding fold that is structurally comparable to those of some Cas12 proteins.
  • the recognition domain and the zinc finger domain can each (individually or collectively) be inserted between β-barrel strands of the wedge domain.
  • the recognition domain can comprise a 4-α-helix structure, structurally comparable to but shorter than those found in some Cas12 proteins.
  • the recognition domain can comprise a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex.
  • a REC lobe can comprise a binding affinity for a PAM sequence in the target nucleic acid.
  • the amino-terminal can comprise a wedge domain, a recognition domain, and a zinc finger domain.
  • the carboxy-terminal can comprise a RuvC domain, a zinc finger domain, or any combination thereof.
  • the carboxy-terminal can comprise one RuvC and one zinc finger domain.
  • Cas14 proteins comprise a RuvC domain or a partial RuvC domain.
  • the RuvC domain can be defined by a single, contiguous sequence, or a set of partial RuvC domains that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein.
  • a partial RuvC domain does not have any substrate binding activity or catalytic activity on its own.
  • a Cas14 protein of the present disclosure can include multiple partial RuvC domains, which can combine to generate a RuvC domain with substrate binding or catalytic activity.
  • a Cas14 can include 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein but form a RuvC domain once the protein is produced and folds.
  • a Cas14 protein can comprise a linker loop connecting a carboxy terminal domain of the Cas14 protein with the amino terminal domain of the Cas14 protein, and where the carboxy terminal domain comprises one or more RuvC domains and the amino terminal domain comprises a recognition domain.
  • Cas14 proteins comprise a zinc finger domain.
  • a carboxy terminal domain of a Cas14 protein comprises a zinc finger domain.
  • an amino terminal domain of a Cas14 protein comprises a zinc finger domain.
  • the amino terminal domain comprises a wedge domain (e.g., a multi-β-barrel wedge structure), a zinc finger domain, or any combination thereof.
  • the carboxy terminal domain comprises the RuvC domains and a zinc finger domain, and the amino terminal domain comprises a recognition domain, a wedge domain, and a zinc finger domain.
  • the Type V Cas protein is a CasΦ protein.
  • a Cas protein can function as an endonuclease that catalyzes cleavage at a specific sequence in a target nucleic acid.
  • a programmable Cas nuclease can have a single active site in a RuvC domain that is capable of catalyzing pre-crRNA processing and nicking or cleaving of nucleic acids. This compact catalytic site can render the programmable Cas nuclease especially advantageous for genome engineering and new functionalities for genome manipulation.
  • the programmable nuclease is a Type VI Cas protein.
  • the Type VI Cas protein is a programmable Cas13 nuclease.
  • the general architecture of a Cas13 protein includes an N-terminal domain and two HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains separated by two helical domains.
  • the HEPN domains each comprise an R-X4-H motif. Shared features across Cas13 proteins include that upon binding of the crRNA of the guide nucleic acid to a target nucleic acid, the protein undergoes a conformational change to bring together the HEPN domains and form a catalytically active RNase.
  • programmable Cas13 nucleases also consistent with the present disclosure include Cas13 nucleases comprising mutations in the HEPN domain that enhance the Cas13 protein's cleavage efficiency or mutations that catalytically inactivate the HEPN domains.
  • Programmable Cas13 nucleases consistent with the present disclosure also include Cas13 nucleases comprising catalytic components.
  • the Cas effector is a Cas13 effector.
  • the Cas13 effector is a Cas13a, a Cas13b, a Cas13c, a Cas13d, or a Cas13e effector protein.
  • the programmable nuclease is Cas13. In some instances, the Cas13 is Cas13a, Cas13b, Cas13c, Cas13d, or Cas13e. In some cases, the programmable nuclease is Mad7 or Mad2. In some cases, the programmable nuclease is Cas12. In some embodiments, the Cas12 is Cas12a, Cas12b, Cas12c, Cas12d, or Cas12e. In some cases, the programmable nuclease is Csm1, Cas9, C2c4, C2c8, C2c5, C2c10, C2c9, or CasZ.
  • the Csm1 is also called smCms1, miCms1, obCms1, or suCms1.
  • Cas13a is called C2c2.
  • CasZ is called Cas14a, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, or Cas14h.
  • the programmable nuclease is a type V CRISPR-Cas system. In some cases, the programmable nuclease is a type VI CRISPR-Cas system.
  • the programmable nuclease is a type III CRISPR-Cas system. In some cases, the programmable nuclease is from at least one of Leptotrichia shahii (Lsh), Listeria seeligeri (Lse), Leptotrichia buccalis (Lbu), Leptotrichia wadei (Lwa), Rhodobacter capsulatus (Rca), Herbinix hemicellulosilytica (Hhe), Paludibacter propionicigenes (Ppr), Lachnospiraceae bacterium (Lba), [Eubacterium] rectale (Ere), Listeria newyorkensis (Lny), Clostridium aminophilum (Cam), Prevotella sp.
  • Psm Capnocytophaga canimorsus
  • Ca Lachnospiraceae bacterium
  • Bzo Bergeyella zoohelcum
  • Prevotella intermedia Pin
  • Prevotella buccae Pbu
  • Alistipes sp. Asp
  • Riemerella anatipestifer Ran
  • Prevotella aurantiaca Pau
  • Prevotella saccharolytica Psa
  • Pin2 Capnocytophaga canimorsus
  • Porphyromonas gulae Pgu
  • Prevotella sp Prevotella sp.
  • the Cas13 is at least one of LbuCas13a, LwaCas13a, LbaCas13a, HheCas13a, PprCas13a, EreCas13a, CamCas13a, or LshCas13a.
  • the trans-cleavage activity of the programmable nuclease can be activated when the guide nucleic acid is complexed with the target nucleic acid.
  • the target nucleic acid is RNA or DNA.
  • the programmable nuclease comprises a Cas12 protein, where the Cas12 enzyme binds and cleaves double stranded DNA and single stranded DNA.
  • the programmable nuclease comprises a Cas13 protein, where the Cas13 enzyme binds and cleaves single stranded RNA.
  • the programmable nuclease comprises a Cas14 protein, where the Cas14 enzyme binds and cleaves both double stranded DNA and single stranded DNA.
  • Table 2 provides illustrative amino acid sequences of programmable nucleases having trans-cleavage activity.
  • programmable nucleases described herein comprise an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117.
  • the programmable nuclease consists of an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117.
  • the programmable nuclease comprises at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 consecutive amino acids of any one of SEQ ID NOS: 45-117.
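The percent-identity and consecutive-residue criteria above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by an external tool (gaps written as `-`) and that identity is counted over columns where neither sequence has a gap; other denominators are also in common use, and the disclosure does not specify one here. The function names and the toy sequences are hypothetical and are not taken from SEQ ID NOS: 45-117.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-').

    Assumption: the inputs are pre-aligned and of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def shares_consecutive_run(candidate: str, reference: str, min_len: int) -> bool:
    """True if `candidate` contains at least `min_len` consecutive residues of `reference`."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    if min_len > len(reference):
        return False
    windows = (reference[i:i + min_len] for i in range(len(reference) - min_len + 1))
    return any(window in candidate for window in windows)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(round(percent_identity("MKTA-LV", "MKSAQLV"), 1))
    print(shares_consecutive_run("XXMKTAYY", "MKTALV", 4))
```

A candidate would then be compared against a chosen threshold (for example 65% or 95%) from the list above.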
  • effector proteins disclosed herein are engineered proteins.
  • Engineered proteins are not identical to a naturally-occurring protein.
  • Engineered proteins can provide enhanced nuclease or nickase activity as compared to a naturally occurring nuclease or nickase.
  • An engineered protein can comprise a modified form of a wild-type counterpart protein.
  • effector proteins comprise at least one amino acid change (e.g., deletion, insertion, or substitution) that enhances or reduces the nucleic acid-cleaving activity of the effector protein relative to the wild-type counterpart.
  • a programmable nuclease is thermostable.
  • known programmable nucleases e.g., Cas12 nucleases
  • a thermostable protein can have enzymatic activity, stability, or folding comparable to those at 37 °C.
  • the trans-cleavage activity (e.g., the maximum trans-cleavage rate as measured by fluorescent signal generation) of a programmable nuclease in a trans-cleavage assay at 40 °C, 45 °C, 50 °C, 55 °C, 60 °C, 65 °C, 70 °C, 75 °C, 80 °C, or more is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 1-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 30-fold, at least 35-
  • a guide nucleic acid refers to a nucleic acid that comprises a sequence that targets (e.g., is reverse complementary to) the sequence of a target nucleic acid.
  • a guide nucleic acid can be DNA, RNA, or a combination thereof (e.g., RNA with a thymine base), or include a chemically modified nucleobase or phosphate backbone.
  • Guide nucleic acids are often referred to as a “guide RNA” or “gRNA.”
  • a guide nucleic acid can comprise deoxyribonucleotides and/or modified nucleobases.
  • a guide nucleic acid is a nucleic acid molecule that binds to an effector protein (e.g., a Cas effector protein), thereby forming a ribonucleoprotein complex (RNP).
  • the engineered guide nucleic acid imparts activity or sequence selectivity to the effector protein.
  • a guide nucleic acid when complexed with an effector protein, brings the effector protein into proximity of a target nucleic acid molecule.
  • An engineered guide nucleic acid can comprise a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid.
  • the engineered guide nucleic acid comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein.
  • a guide nucleic acid can comprise or be coupled to a tracrRNA.
  • the tracrRNA can comprise deoxyribonucleosides in addition to ribonucleosides.
  • the tracrRNA can be separate from but form a complex with a crRNA.
  • at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide nucleic acid, also referred to as a single guide RNA (sgRNA).
  • a crRNA and tracrRNA function as two separate, unlinked molecules.
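As an illustration of a guide sequence that is reverse complementary to a target sequence, the following minimal Python sketch derives an RNA spacer from a DNA target strand. The function names are hypothetical, and the sketch covers only Watson-Crick pairing for unmodified A/C/G/T input; it does not model PAM requirements, repeat or tracrRNA portions, chemical modifications, or any effector-specific design rules.

```python
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def rna_spacer_for(target_dna: str) -> str:
    """RNA spacer (5' to 3') that is reverse complementary to `target_dna`.

    Assumption: `target_dna` is the strand the guide should hybridize to,
    written 5' to 3' with unmodified A/C/G/T bases only.
    """
    target = target_dna.upper()
    unexpected = set(target) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return reverse_complement(target).replace("T", "U")


if __name__ == "__main__":
    print(rna_spacer_for("AAGT"))
```

In a single guide format, a spacer produced this way would be joined to the effector-binding portion of the guide, which is not modeled here.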
  • a reporter can comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), where the nucleic acid is capable of being cleaved by a programmable nuclease (e.g., a Type V or VI CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
  • reporter signal refers to any readout from a reporter, such as a fluorescence intensity and/or a lateral flow readout.
  • Reporters can comprise RNA. Reporters can comprise DNA. Reporters can be double-stranded. Reporters can be single-stranded.
  • the systems and methods disclosed herein provide a Type V CRISPR/Cas protein and a reporter nucleic acid configured to undergo transcollateral cleavage by the Type V CRISPR/Cas protein.
  • Transcollateral cleavage of the reporter can generate a signal from the reporter or alter a signal from the reporter.
  • the signal is an optical signal, such as a fluorescence signal or absorbance band.
  • Transcollateral cleavage of the reporter can alter the wavelength, intensity, or polarization of the optical signal.
  • the reporter can comprise a fluorophore and a quencher, such that transcollateral cleavage of the reporter separates the fluorophore and the quencher thereby increasing a fluorescence signal from the fluorophore.
  • detection of reporter cleavage (directly or indirectly) to determine the presence of a target nucleic acid sequence is, in some embodiments, referred to as “DETECTR.”
  • a method of assaying for a target nucleic acid in a sample comprising contacting the target nucleic acid with a programmable nuclease, a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid, and a reporter nucleic acid, and assaying for a change in a signal, where the change in the signal is produced by or indicative of cleavage of the reporter nucleic acid.
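Assaying for a change in signal can be sketched in Python for a fluorophore-quencher reporter, where cleavage is expected to increase fluorescence. This is a minimal sketch under stated assumptions: the no-target background reading, the 2.0-fold threshold, and all function names are hypothetical and are not values from the disclosure. The second function estimates a maximum signal generation rate from a time course as the steepest slope between consecutive readings.

```python
from typing import Sequence


def reporter_cleaved(signal: float, background: float, min_fold: float = 2.0) -> bool:
    """True if `signal` is at least `min_fold` times the no-target `background`.

    Assumption: `min_fold` is an illustrative threshold, not an assay specification.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return signal / background >= min_fold


def max_rate(times: Sequence[float], signals: Sequence[float]) -> float:
    """Steepest slope between consecutive readings (signal units per time unit)."""
    if len(times) != len(signals) or len(times) < 2:
        raise ValueError("need at least two paired readings")
    rates = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        rates.append((signals[i] - signals[i - 1]) / dt)
    return max(rates)


if __name__ == "__main__":
    minutes = [0, 1, 2, 3]
    fluorescence = [100, 150, 400, 450]  # made-up readings
    print(max_rate(minutes, fluorescence))
    print(reporter_cleaved(fluorescence[-1], background=100))
```

A real assay would typically derive its threshold from replicate negative controls rather than from a fixed fold value.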
  • the reporter comprises a detection moiety.
  • the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter, where the first site is separated from the remainder of reporter upon cleavage at the cleavage site.
  • the detection moiety is 3’ to the cleavage site.
  • the detection moiety is 5’ to the cleavage site.
  • the detection moiety is at the 3’ terminus of the nucleic acid of a reporter.
  • the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
  • the reporter comprises a nucleic acid and a detection moiety.
  • a reporter is connected to a surface by a linkage.
  • a reporter comprises at least one of a nucleic acid, a chemical functionality, a detection moiety, a quenching moiety, or a combination thereof.
  • a reporter is configured for the detection moiety to remain immobilized to the surface and the quenching moiety to be released into solution upon cleavage of the reporter.
  • a reporter is configured for the quenching moiety to remain immobilized to the surface and for the detection moiety to be released into solution, upon cleavage of the reporter.
  • the detection moiety is at least one of a label, a polypeptide, a dendrimer, or a nucleic acid, or a combination thereof.
  • the reporter contains a label.
  • the label is FITC, DIG, TAMRA, Cy5, AF594, or Cy3.
  • the label comprises a dye or a nanoparticle configured to produce a signal.
  • the dye is a fluorescent dye.
  • the at least one chemical functionality comprises biotin.
  • the at least one chemical functionality is configured to be captured by a capture probe.
  • the at least one chemical functionality comprises biotin and the capture probe comprises anti-biotin, streptavidin, avidin or other molecule configured to bind with biotin.
  • the dye is the chemical functionality.
  • a capture probe comprises a molecule that is complementary to the chemical functionality.
  • the capture antibodies are anti-FITC, anti-DIG, anti-TAMRA, anti-Cy5, anti-AF594, or any other appropriate capture antibody capable of binding the detection moiety or conjugate.
  • the detection moiety is the chemical functionality.
  • reporters comprise a detection moiety capable of generating a signal.
  • a signal can be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
  • Suitable detectable labels and/or moieties that can provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
  • the reporter comprises a detection moiety and a quenching moiety.
  • the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, where the first site and the second site are separated by the cleavage site.
  • the quenching moiety is a fluorescence quenching moiety.
  • the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site.
  • the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site.
  • the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. In some embodiments, the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
  • Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP
  • Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, Xanthine Oxidase, firefly luciferase, and glucose oxidase (GO).
  • the detection moiety comprises an invertase.
  • the substrate of the invertase can be sucrose.
  • a DNS reagent can be included in the system to produce a colorimetric change when the invertase converts sucrose to glucose.
  • the reporter nucleic acid and invertase are conjugated using a heterobifunctional linker via sulfo-SMCC chemistry.
  • suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • the fluorophore can be an infrared fluorophore.
  • the fluorophore can emit fluorescence in the range of 500 nm to 720 nm.
  • the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or
  • systems comprise a quenching moiety.
  • a quenching moiety may be chosen based on its ability to quench the detection moiety.
  • a quenching moiety can be a non-fluorescent fluorescence quencher.
  • a quenching moiety may quench a detection moiety that emits fluorescence in the range of 500 nm to 720 nm.
  • the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher.
  • the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm.
  • the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
  • a quenching moiety may quench fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • a quenching moiety can be Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher.
  • a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • a quenching moiety can be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein can be from any commercially available source, can be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
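Choosing a quenching moiety by its ability to quench the detection moiety can be sketched as a simple wavelength-range lookup. In this minimal Python sketch the quencher names are placeholders, and the three ranges are borrowed from the example ranges recited above (450 nm to 750 nm, 500 nm to 650 nm, 550 nm to 650 nm); they are not specifications of any commercial quencher. Real selection would rely on vendor absorption data for the specific quencher and fluorophore.

```python
from typing import List, NamedTuple, Sequence


class Quencher(NamedTuple):
    name: str      # placeholder label, not a product name
    min_nm: float  # lower bound of the emission range it is assumed to quench
    max_nm: float  # upper bound of that range


def compatible_quenchers(emission_nm: float, quenchers: Sequence[Quencher]) -> List[str]:
    """Names of quenchers whose assumed range covers the fluorophore emission."""
    return [q.name for q in quenchers if q.min_nm <= emission_nm <= q.max_nm]


# Hypothetical panel built from the example ranges in the text.
PANEL = [
    Quencher("quencher_A", 450, 750),
    Quencher("quencher_B", 500, 650),
    Quencher("quencher_C", 550, 650),
]

if __name__ == "__main__":
    # A fluorophore emitting at about 665 nm, as mentioned in the text.
    print(compatible_quenchers(665, PANEL))
```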
  • the generation of the detectable signal from the release of the detection moiety indicates that cleavage by the programmable nucleases has occurred and that the sample contains the target nucleic acid.
  • the detection moiety comprises a fluorescent dye.
  • the detection moiety comprises a fluorescence resonance energy transfer (FRET) pair.
  • the detection moiety comprises an infrared (IR) dye.
  • the detection moiety comprises an ultraviolet (UV) dye.
  • the detection moiety comprises a protein.
  • the detection moiety comprises a biotin.
  • the detection moiety comprises at least one of avidin or streptavidin.
  • the detection moiety comprises a polysaccharide, a polymer, or a nanoparticle.
  • the detection moiety comprises a gold nanoparticle or a latex nanoparticle.
  • a detection moiety can be any moiety capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
  • a nucleic acid of a reporter, in some embodiments, is a protein-nucleic acid that is capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal upon cleavage of the nucleic acid. Often a calorimetric signal is heat produced after cleavage of the nucleic acids of a reporter.
  • a calorimetric signal is heat absorbed after cleavage of the nucleic acids of a reporter.
  • a potentiometric signal for example, is electrical potential produced after cleavage of the nucleic acids of a reporter.
  • An amperometric signal can be movement of electrons produced after the cleavage of nucleic acid of a reporter.
  • the signal is an optical signal, such as a colorimetric signal or a fluorescence signal.
  • An optical signal is, for example, a light output produced after the cleavage of the nucleic acids of a reporter.
  • an optical signal is a change in light absorbance between before and after the cleavage of nucleic acids of a reporter.
  • a piezo-electric signal is a change in mass between before and after the cleavage of the nucleic acid of a reporter.
  • Other methods of detection may also be used, such as optical imaging, surface plasmon resonance (SPR), and/or interferometric sensing.
  • the detectable signal is a colorimetric signal or a signal visible by eye.
  • the detectable signal is fluorescent, electrical, chemical, electrochemical, or magnetic.
  • a detectable signal (e.g., a first detectable signal) is generated by binding of the detection moiety to the capture molecule in the detection region, where the detectable signal indicates that the sample contained the target nucleic acid.
  • systems are capable of detecting more than one type of target nucleic acid, where the system comprises more than one type of guide nucleic acid and more than one type of reporter nucleic acid.
  • systems may be capable of distinguishing between two different target nucleic acids in a sample (e.g., wild-type or mutant when investigating SNPs as described herein).
  • the detectable signal is generated directly by the cleavage event. Alternatively, or in combination, the detectable signal is generated indirectly by the cleavage event. In some instances, the detectable signal is not a fluorescent signal. In some instances, the detectable signal is a colorimetric or color-based signal.
  • the detected target nucleic acid is identified based on its spatial location on the detection region of the support medium. In some cases, when two or more detectable signals are generated, a second detectable signal is generated in a location spatially distinct from that of a first detectable signal.
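Distinguishing a wild-type from a mutant target with two guide nucleic acids can be sketched as a two-signal call. This minimal Python sketch assumes one reading from a reaction with a wild-type-specific guide, one from a reaction with a mutant-specific guide, and a shared no-target background; the 2.0-fold threshold, the labels, and the function name are hypothetical. It also assumes no cross-reactivity between guides, which a real assay would have to characterize.

```python
def call_genotype(wt_signal: float, mut_signal: float,
                  background: float, min_fold: float = 2.0) -> str:
    """Call a genotype from wild-type-guide and mutant-guide reporter signals.

    Assumptions: signals are comparable readouts (e.g., fluorescence), and a
    reaction counts as positive when it reaches `min_fold` times `background`.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    wt_positive = wt_signal / background >= min_fold
    mut_positive = mut_signal / background >= min_fold
    if wt_positive and mut_positive:
        return "heterozygous"
    if wt_positive:
        return "homozygous wild-type"
    if mut_positive:
        return "homozygous mutant"
    return "no call"


if __name__ == "__main__":
    # Made-up readings for illustration.
    print(call_genotype(wt_signal=900, mut_signal=110, background=100))
    print(call_genotype(wt_signal=850, mut_signal=800, background=100))
```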
  • the reporter is an enzyme-nucleic acid.
  • the enzyme can be sterically hindered when present as in the enzyme-nucleic acid, but then functional upon cleavage from the nucleic acid by the programmable nuclease.
  • the enzyme is an enzyme that produces a reaction with an enzyme substrate.
  • An enzyme can be invertase.
  • the substrate of invertase is sucrose and DNS reagent.
  • the reporter is a substrate-nucleic acid.
  • the substrate is a substrate that produces a reaction with an enzyme. Release of the substrate upon cleavage by the programmable nuclease may free the substrate to react with the enzyme.
  • a reporter is attached to a solid support.
  • the solid support for example, can be a surface. A surface can be an electrode.
  • the solid support is a bead. Often the bead is a magnetic bead.
  • the detection moiety is liberated from the solid support and interacts with other mixtures.
  • the detection moiety is an enzyme, and upon cleavage of the nucleic acid of the enzyme-nucleic acid, the enzyme flows through a chamber into a mixture comprising the substrate. When the enzyme meets the enzyme substrate, a reaction occurs, such as a colorimetric reaction, which is then detected.
  • the detection moiety is an enzyme substrate, and upon cleavage of the nucleic acid of the enzyme substrate-nucleic acid, the enzyme substrate flows through a chamber into a mixture comprising the enzyme. When the enzyme substrate meets the enzyme, a reaction occurs, such as a calorimetric reaction, which is then detected.
  • the reporter comprises a nucleic acid conjugated to an affinity molecule which is in turn conjugated to the fluorophore (e.g., nucleic acid - affinity molecule - fluorophore) or the nucleic acid conjugated to the fluorophore which is in turn conjugated to the affinity molecule (e.g., nucleic acid - fluorophore - affinity molecule).
  • a linker conjugates the nucleic acid to the affinity molecule.
  • a linker conjugates the affinity molecule to the fluorophore.
  • a linker conjugates the nucleic acid to the fluorophore.
  • a linker can be any suitable linker known in the art.
  • the nucleic acid of the reporter can be directly conjugated to the affinity molecule and the affinity molecule can be directly conjugated to the fluorophore or the nucleic acid can be directly conjugated to the fluorophore and the fluorophore can be directly conjugated to the affinity molecule.
  • “directly conjugated” indicates that no intervening molecules, polypeptides, proteins, or other moieties are present between the two moieties directly conjugated to each other.
  • a reporter comprises a nucleic acid directly conjugated to an affinity molecule and an affinity molecule directly conjugated to a fluorophore.
  • the affinity molecule is biotin, avidin, streptavidin, or any similar molecule.
  • the reporter comprises a substrate-nucleic acid.
  • the substrate can be sequestered from its cognate enzyme when present as in the substrate-nucleic acid, but then is released from the nucleic acid upon cleavage, where the released substrate can contact the cognate enzyme to produce a detectable signal.
  • the substrate is sucrose and the cognate enzyme is invertase, and a DNS reagent can be used to monitor invertase activity.
  • a reporter can be a hybrid nucleic acid reporter.
  • a hybrid nucleic acid reporter comprises a nucleic acid with at least one deoxyribonucleotide and at least one ribonucleotide.
  • the nucleic acid of the hybrid nucleic acid reporter is of any length and/or has any mixture of DNAs and RNAs. For example, in some cases, longer stretches of DNA can be interrupted by a few ribonucleotides. Alternatively, longer stretches of RNA can be interrupted by a few deoxyribonucleotides. Alternatively, every other base in the nucleic acid can alternate between ribonucleotides and deoxyribonucleotides.
  • an advantage of a hybrid nucleic acid reporter is increased stability as compared to a pure RNA nucleic acid reporter.
  • a hybrid nucleic acid reporter can be more stable in solution, lyophilized, or vitrified as compared to a pure DNA or pure RNA reporter.
  • a reporter and/or guide nucleic acid comprises one or more modifications, e.g., a base modification, a backbone modification, a sugar modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability).
  • suitable modifications include modified nucleic acid backbones and non-natural internucleoside linkages. Other suitable modifications include nucleic acid mimetics.
  • the nucleic acids described herein can include one or more substituted sugar moieties.
  • the nucleic acids described herein can include nucleobase modifications or substitutions.
  • the nucleic acids described and referred to herein can comprise a plurality of base pairs.
  • a base pair refers to a biological unit comprising two nucleobases bound to each other by hydrogen bonds.
  • Nucleobases can comprise adenine, guanine, cytosine, thymine, and/or uracil.
  • the nucleic acids described and referred to herein can comprise different base pairs.
  • the nucleic acids described and referred to herein can comprise one or more modified base pairs. The one or more modified base pairs can be produced when one or more base pairs undergo a chemical modification leading to new bases.
  • the one or more modified base pairs can be, for example, Hypoxanthine, Inosine, Xanthine, Xanthosine, 7-Methylguanine, 7-Methylguanosine, 5,6-Dihydrouracil, Dihydrouridine, 5-Methylcytosine, 5-Methylcytidine, 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxylcytosine (5caC).
  • locus refers to one or more nucleotides that map to a reference sequence, such as a genome.
  • a locus refers to a position (e.g., a site) within a reference sequence, e.g., on a particular chromosome of a genome.
  • a locus refers to a single nucleotide position within a reference sequence, e.g., on a particular chromosome within a genome.
  • a locus refers to a group of nucleotide positions within a reference sequence.
  • a locus is defined by a mutation (e.g., substitution, insertion, deletion, inversion, or translocation) of consecutive nucleotides within a reference sequence.
  • a locus is defined by a gene, a sub-genic structure (e.g., a regulatory element, exon, intron, or combination thereof), or a predefined span of a chromosome.
  • a locus is defined by a biomarker and/or antimicrobial resistance gene that maps to the position of the respective locus.
  • a respective target locus comprises a plurality of alleles including an allele having a mutation that confers resistance to a microorganism against antimicrobial interventions that target the respective locus (e.g., antibacterial resistance, antiprotozoal resistance, antifungal resistance, antihelminthic resistance, and/or antiviral resistance).
  • a respective target locus comprises a genetic marker for the target locus that indicates a resistance to antimicrobial interventions.
  • examples of antimicrobial resistance markers (e.g., genes and/or amino acid residues) include, but are not limited to, the antimicrobial resistance markers listed in, e.g., Capela et al., 2019, “An Overview of Drug Resistance in Protozoal Diseases,” Int J Mol Sci.
  • the target nucleic acid is a single stranded nucleic acid.
  • the target nucleic acid is a double stranded nucleic acid and is prepared into single stranded nucleic acids before or upon contacting the programmable nuclease-based detection reagents (e.g., programmable nuclease, guide nucleic acid, and/or reporter).
  • the target nucleic acid is a double stranded nucleic acid.
  • the double stranded nucleic acid is DNA.
  • the target nucleic acid is an RNA.
  • Target nucleic acids include but are not limited to mRNA, rRNA, tRNA, non-coding RNA, long non-coding RNA, and microRNA (miRNA).
  • the target nucleic acid is complementary DNA (cDNA) synthesized from a single-stranded RNA template in a reaction catalyzed by a reverse transcriptase.
  • the target nucleic acid is single-stranded RNA (ssRNA) or mRNA.
  • the target nucleic acid is from a virus, a parasite, or a bacterium described herein.
  • the target nucleic acid comprises 5 to 100, 5 to 90, 5 to 80, 5 to 70, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 25, 5 to 20, 5 to 15, or 5 to 10 nucleotides in length. In some cases, the target nucleic acid comprises 10 to 90, 20 to 80, 30 to 70, or 40 to 60 nucleotides in length.
  • the target nucleic acid sequence can be from 10 to 95, from 20 to 95, from 30 to 95, from 40 to 95, from 50 to 95, from 60 to 95, from 10 to 75, from 20 to 75, from 30 to 75, from 40 to 75, from 50 to 75, from 5 to 50, from 15 to 50, from 25 to 50, from 35 to 50, or from 45 to 50 nucleotides in length.
  • the target nucleic acid comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length.
  • the target nucleic acid comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides in length.
  • the target nucleic acid can be reverse complementary to a guide nucleic acid. In some cases, at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
  • nucleotides of a guide nucleic acid can be reverse complementary to a target nucleic acid.
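The reverse-complementarity relationship between a guide nucleic acid and a target nucleic acid can be illustrated with a short sketch. The function names below are hypothetical and are not part of the disclosure; the sketch assumes unmodified bases, treats U as T for comparison, and counts only the longest contiguous stretch of the guide that is reverse complementary to the target.

```python
# Illustrative sketch only: names are hypothetical, bases are assumed unmodified.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence, written 5'->3' in the DNA alphabet."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def longest_reverse_complementary_run(guide: str, target: str) -> int:
    """Length of the longest contiguous run of guide nucleotides that is
    reverse complementary to a contiguous run of the target."""
    probe = reverse_complement(guide)          # what the guide would pair with
    target = target.upper().replace("U", "T")
    best = 0
    prev = [0] * (len(target) + 1)
    # Classic longest-common-substring dynamic programme over probe and target.
    for p in probe:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if p == t:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    run = longest_reverse_complementary_run("GCUUCAGG", "TTGACCTGAAGCTAGG")
    print(f"longest reverse complementary run: {run} nucleotides")
```

A run length computed this way could be compared against whichever minimum number of reverse complementary nucleotides a given embodiment requires.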
  • a target nucleic acid can be an amplified nucleic acid of interest.
  • the nucleic acid of interest can be any nucleic acid disclosed herein or from any sample as disclosed herein.
  • the nucleic acid of interest can be an RNA that is reverse transcribed before amplification.
  • the nucleic acid of interest is amplified, and then the amplicons are transcribed into RNA.
  • target nucleic acids activate a programmable nuclease to initiate sequence-independent cleavage of a nucleic acid-based reporter (e.g., a reporter comprising an RNA sequence, or a reporter comprising DNA and RNA).
  • a programmable nuclease of the present disclosure is activated by a target nucleic acid to cleave reporters having an RNA (also referred to herein as an “RNA reporter”).
  • the RNA reporter can comprise a single-stranded RNA labeled with a detection moiety or can be any RNA reporter as disclosed herein.
  • the target nucleic acid as described in the methods herein does not initially comprise a PAM sequence.
  • any target nucleic acid of interest can be generated using the methods described herein to comprise a PAM sequence, and thus be a PAM target nucleic acid.
  • a PAM target nucleic acid refers to a target nucleic acid that has been amplified to insert a PAM sequence that is recognized by a CRISPR/Cas system.
  • the target nucleic acid is in a cell.
  • the cell is a single-cell eukaryotic organism; a plant cell; an algal cell; a fungal cell; an animal cell; a cell of an invertebrate animal; a cell of a vertebrate animal such as a fish, amphibian, reptile, bird, or mammal; or a cell of a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, or a caprine.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell, a human cell, or a plant cell.
  • the target nucleic acid sequence comprises a nucleic acid sequence of a virus, a bacterium, or other pathogen responsible for a disease in a plant (e.g., a crop).
  • Methods and compositions of the disclosure can be used to treat or detect a disease in a plant.
  • the methods of the disclosure can be used to target a viral nucleic acid sequence in a plant.
  • a programmable nuclease of the disclosure cleaves the viral nucleic acid.
  • the target nucleic acid sequence comprises a nucleic acid sequence of a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop).
  • the target nucleic acid comprises RNA.
  • the target nucleic acid, in some cases, is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the plant (e.g., a crop).
  • the target nucleic acid is a portion of a nucleic acid from a genomic locus, or any NA amplicon, such as a reverse transcribed mRNA or a cDNA from a gene locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop).
  • a virus infecting the plant can be an RNA virus.
  • a virus infecting the plant can be a DNA virus.
  • TMV Tobacco mosaic virus
  • TSWV Tomato spotted wilt virus
  • CMV Cucumber mosaic virus
  • PVY Potato virus Y
  • CaMV Cauliflower mosaic virus
  • PPV Plum pox virus
  • BMV Brome mosaic virus
  • PVX Potato virus X
  • target nucleic acids comprise a mutation.
  • a sequence comprising a mutation is modified to a wild-type sequence with a composition, system or method described herein.
  • a sequence comprising a mutation is detected with a composition, system or method described herein.
  • the mutation can be a mutation of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
  • Non-limiting examples of mutations are insertion-deletion (indel), single nucleotide polymorphism (SNP), and frameshift mutations.
  • guide nucleic acids described herein hybridize to a region of the target nucleic acid comprising the mutation.
  • a target locus comprises a plurality of alleles including a first allele (e.g., a wild-type allele) and a second allele (e.g., a mutant allele), where the first allele and the second allele differ by a mutation in the nucleic acid sequence of the respective locus, and the respective guide nucleic acid for each respective allele is a nucleic acid that is reverse complementary to the respective allele.
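As a minimal sketch of allele-specific guide design, the snippet below derives one guide sequence per allele by taking the reverse complement of that allele. The allele sequences, the function names, and the choice to write the guide in the RNA alphabet are assumptions made for illustration; the disclosure does not prescribe this procedure or these sequences.

```python
# Illustrative sketch only: sequences and names are hypothetical.
_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def guide_for_allele(allele: str) -> str:
    """Return an RNA sequence (5'->3') reverse complementary to a DNA allele."""
    return allele.upper().translate(_RNA_COMPLEMENT)[::-1]


def mismatch_positions(a: str, b: str) -> list:
    """Zero-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two alleles of the same locus that differ by a single nucleotide (index 3).
alleles = {"wild_type": "GATTACAG", "mutant": "GATCACAG"}
guides = {name: guide_for_allele(seq) for name, seq in alleles.items()}

if __name__ == "__main__":
    for name, guide in guides.items():
        print(f"{name}: allele {alleles[name]} -> guide {guide}")
    print("guides differ at:", mismatch_positions(guides["wild_type"], guides["mutant"]))
```

Because each guide is fully complementary only to its own allele, the single mismatch against the other allele is what a discriminating assay would rely on; how strongly one mismatch affects nuclease activation is not modeled here.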
  • the mutation is located in a non-coding region or a coding region of a gene.
  • target nucleic acids comprise a mutation, where the mutation is a SNP.
  • the single nucleotide mutation or SNP can be associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken.
  • the SNP, in some embodiments, is associated with a phenotype that is altered relative to the wild-type phenotype.
  • the SNP can be a synonymous substitution or a nonsynonymous substitution.
  • the nonsynonymous substitution can be a missense substitution or a nonsense point mutation.
  • the synonymous substitution can be a silent substitution.
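The distinction between synonymous, missense, and nonsense substitutions can be sketched by translating the reference and variant codons. The snippet assumes the standard genetic code and a single-nucleotide change within one codon; the function name and return labels are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide substitution within one codon."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(r != a for r, a in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"          # silent: same amino acid (or stop)
    if alt_aa == "*":
        return "nonsense"            # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "nonsynonymous"       # stop lost; not subdivided in this sketch
    return "missense"                # one amino acid replaced by another


if __name__ == "__main__":
    for ref, alt in [("GAG", "GTG"), ("TAC", "TAA"), ("CTT", "CTC")]:
        print(ref, "->", alt, ":", classify_substitution(ref, alt))
```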
  • the mutation can be a deletion of one or more nucleotides.
  • the single nucleotide mutation, SNP, or deletion is associated with a disease such as cancer or a genetic disorder.
  • the mutation such as a single nucleotide mutation, a SNP, or a deletion, can be encoded in the sequence of a target nucleic acid from the germline of an organism or can be encoded in a target nucleic acid from a diseased cell, such as a cancer cell.
  • target nucleic acids comprise a mutation, where the mutation is a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
  • the mutation can be a deletion of about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 nucleotides.
  • the mutation can be a deletion of 1 to 5, 5 to 10, 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 to 55, 55 to 60, 60 to 65, 65 to 70, 70 to 75, 75 to 80, 80 to 85, 85 to 90, 90 to 95, 95 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 900 to 1000, 1 to 50, 1 to 100, 25 to 50, 25 to 100, 50 to 100, 100 to 500, 100 to 1000, or 500 to 1000 nucleotides.
  • single nucleotide polymorphism refers to the variation of a single nucleotide or nucleobase at a specific position in a nucleic acid sequence.
  • the single nucleotide or nucleobase variation is generally between the genomes of two members of the same species, or some other specific population.
  • a SNP occurs at a specific nucleic acid site in genomic DNA in which different alternative sequences, e.g., “alleles,” exist more frequently in certain members of a population.
  • a less frequent allele comprises the SNP and has an abundance of at least 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1%.
  • a SNP is any point mutation that is sufficiently present in a population (e.g., 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1% or more).
  • a SNP can be disease-causing, at least partially, or can be associated with a disease.
  • SNPs are known to skilled artisans and can be located in relevant published papers and genomic databases.
  • allelic frequency can be determined, thereby providing the fraction of all chromosomes that carry a particular allele of a gene within a specific population.
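Allelic frequency, described above as the fraction of all chromosomes in a population that carry a particular allele, can be computed from genotype counts. The sketch below assumes a diploid population and made-up genotype counts; the function names and the 1% default threshold (one of the values listed above) are illustrative choices.

```python
from typing import Mapping, Tuple


def allele_frequency(genotype_counts: Mapping[Tuple[str, str], int], allele: str) -> float:
    """Fraction of all chromosomes carrying `allele`, assuming diploid genotypes."""
    total_chromosomes = 2 * sum(genotype_counts.values())
    if total_chromosomes == 0:
        raise ValueError("no genotypes supplied")
    copies = sum(
        count * (first == allele) + count * (second == allele)
        for (first, second), count in genotype_counts.items()
    )
    return copies / total_chromosomes


def meets_abundance_threshold(frequency: float, threshold: float = 0.01) -> bool:
    """True if the allele is at least as abundant as the chosen threshold."""
    return frequency >= threshold


# Hypothetical counts for 100 individuals genotyped at one position.
counts = {("A", "A"): 90, ("A", "G"): 9, ("G", "G"): 1}
freq_g = allele_frequency(counts, "G")

if __name__ == "__main__":
    print(f"frequency of G: {freq_g:.3f}")
    print("meets 1% threshold:", meets_abundance_threshold(freq_g))
```

In this example the G allele is carried by 11 of 200 chromosomes, so it would satisfy any of the abundance thresholds listed above.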
  • wild-type refers to a segment or region of nucleic acid sequence, or fragment thereof, that is the universal form (e.g., present in at least 40%) within a population.
  • wild-type refers to a segment or region of nucleic acid sequence, or fragment thereof, lacking commonly known sequence variations or allelic variations which can be silent, causal, disease-associated, or disease-risk causing.
  • wild-type refers to a polypeptide or protein expressed by a naturally occurring organism, or a polypeptide or protein having the characteristics of a polypeptide or protein isolated from a naturally occurring organism, where the polypeptide or protein is relatively constant in (e.g., present in at least 40% of) a species population.
  • mutations are associated with a disease, that is the mutation in a subject indicates that the subject is susceptible to, or suffers from, a disease, disorder, or pathological state.
  • a mutation associated with a disease refers to a mutation which causes the disease, contributes to the development of the disease, or indicates the existence of the disease.
  • a mutation associated with a disease can also refer to any mutation which generates transcription or translation products at an abnormal level, or in an abnormal form, in cells affected by a disease relative to a control without the disease.
  • Nonlimiting examples of diseases associated with mutations are hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency (SCID, also known as “bubble boy syndrome”), Huntington’s disease, cystic fibrosis, and various cancers.
  • the systems and methods of the present disclosure can be used to detect one or more target sequences or nucleic acids in one or more samples.
  • the one or more samples can comprise one or more target sequences or nucleic acids for detection of an ailment, such as a disease, cancer, or genetic disorder, or genetic information, such as for phenotyping, genotyping, or determining ancestry and are compatible with the reagents and support mediums as described herein.
  • a sample can be taken from any place where a nucleic acid can be found.
  • Samples can be taken from an individual/human, a non-human animal, or a crop, or an environmental sample can be obtained to test for presence of a disease, virus, pathogen, cancer, genetic disorder, or any mutation or pathogen of interest.
  • a biological sample can be blood, serum, plasma, lung fluid, exhaled breath condensate, saliva, spit, urine, stool, feces, mucus, lymph fluid, peritoneal fluid, cerebrospinal fluid, amniotic fluid, breast milk, gastric secretions, bodily discharges, secretions from ulcers, pus, nasal secretions, sputum, pharyngeal exudates, urethral secretions/mucus, vaginal secretions/mucus, anal secretion/mucus, semen, tears, an exudate, an effusion, tissue fluid, interstitial fluid (e.g., tumor interstitial fluid), cyst fluid, tissue, or, in some instances, any combination thereof.
  • a sample can be an aspirate of a bodily fluid from an animal (e.g., human, animals, livestock, pet, etc.) or plant.
  • a tissue sample can be from any tissue that can be infected or affected by a pathogen (e.g., a wart, lung tissue, skin tissue, and the like).
  • a tissue sample (e.g., from animals, plants, or humans) can be dissociated or liquified prior to application to the detection system of the present disclosure.
  • a sample can be from a plant (e.g., a crop, a hydroponically grown crop or plant, and/or a house plant). Plant samples can include extracellular fluid from tissue (e.g., root, leaves, stem, trunk, etc.).
  • a sample can be taken from the environment immediately surrounding a plant, such as hydroponic fluid/ water, or soil.
  • a sample from an environment can be from soil, air, or water.
  • the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest.
  • the raw sample is applied to the detection system.
  • the sample is diluted with a buffer or a fluid or concentrated prior to application to the detection system.
  • the sample is contained in no more than about 200 nanoliters (nL). In some cases, the sample is contained in about 200 nL. In some cases, the sample is contained in a volume that is greater than about 200 nL and less than about 20 microliters (µL).
  • the sample is contained in no more than 20 µL. In some cases, the sample is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, or 500 µL, or any value from 1 µL to 500 µL.
  • the sample is contained in from 1 µL to 500 µL, from 10 µL to 500 µL, from 50 µL to 500 µL, from 100 µL to 500 µL, from 200 µL to 500 µL, from 300 µL to 500 µL, from 400 µL to 500 µL, from 1 µL to 200 µL, from 10 µL to 200 µL, from 50 µL to 200 µL, from 100 µL to 200 µL, from 1 µL to 100 µL, from 10 µL to 100 µL, from 50 µL to 100 µL, from 1 µL to 50 µL, from 10 µL to 50 µL, from 1 µL to 20 µL, from 10 µL to 20 µL, or from 1 µL to 10 µL.
  • the sample is contained in more than 500 µL.
  • the sample is taken from a single-cell eukaryotic organism; a plant or a plant cell; an algal cell; a fungal cell; an animal or an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal; a cell, tissue, fluid, or organ from a vertebrate animal such as fish, amphibian, reptile, bird, and mammal; a cell, tissue, fluid, or organ from a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, and a caprine.
  • the sample is taken from nematodes, protozoans, helminths, or malarial parasites.
  • the sample can comprise nucleic acids from a cell lysate from a eukaryotic cell, a mammalian cell, a human cell, a prokaryotic cell, or a plant cell.
  • the sample can comprise nucleic acids expressed from a cell.
  • the sample used for phenotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
  • the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a phenotypic trait.
  • the sample used for genotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
  • the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a genotype.
  • the sample used for ancestral testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
  • the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a geographic region of origin or ethnic group.
  • the sample can be used for identifying a disease status.
  • a sample is any sample described herein, and is obtained from a subject for use in identifying a disease status of a subject.
  • the disease can be a cancer or genetic disorder.
  • a method may comprise obtaining a serum sample from a subject; and identifying a disease status of the subject. Often, the disease status is prostate disease status.
  • the device can be configured for asymptomatic, pre-symptomatic, and/or symptomatic diagnostic applications, irrespective of immunity.
  • the device can be configured to perform one or more serological assays on a sample (e.g., a sample comprising blood).
  • the target sequence is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the sample.
  • the target sequence in some cases, is a portion of a nucleic acid from a sexually transmitted infection or a contagious disease, in the sample.
  • the target sequence in some cases, is a portion of a nucleic acid from an upper respiratory tract infection, a lower respiratory tract infection, or a contagious disease, in the sample.
  • the target sequence in some cases, is a portion of a nucleic acid from a hospital acquired infection or a contagious disease, in the sample.
  • the target sequence is a portion of a nucleic acid from sepsis, in the sample.
  • diseases can include but are not limited to respiratory viruses (e.g., SARS-CoV-2 (i.e., a virus that causes COVID-19), SARS-CoV-1, MERS-CoV, influenza, Adenovirus, Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Human Metapneumovirus (hMPV), Human Rhinovirus (HRVs A, B, C), Human Enterovirus, Influenza A, Influenza A/H1, Influenza A/H2, Influenza A/H3, Influenza A/H4, Influenza A/H5, Influenza A/H6, Influenza A/H7, Influenza A/H8, Influenza A/H9, Influenza A/H10, Influenza A/H11, Influenza A/H12, Influenza A/H13, Influenza A
  • viruses include human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
  • Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites.
  • Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms.
  • Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis.
  • pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii.
  • Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, Chlamydia pneumoniae, Chlamydia psittaci, and Candida albicans.
  • Pathogenic viruses include but are not limited to: respiratory viruses (e.g., adenoviruses, parainfluenza viruses, severe acute respiratory syndrome (SARS), coronavirus, MERS), gastrointestinal viruses (e.g., noroviruses, rotaviruses, some adenoviruses, astroviruses), exanthematous viruses (e.g., the virus that causes measles, the virus that causes rubella, the virus that causes chickenpox/shingles, the virus that causes roseola, the virus that causes smallpox, the virus that causes fifth disease, chikungunya virus infection); hepatic viral diseases (e.g., hepatitis A, B, C, D, E); cutaneous viral diseases (e.g., warts (including genital, anal), herpes (including oral, genital, anal), molluscum contagiosum); hemorrhagic viral diseases (e.g., Ebola, Lassa fever
  • Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Klebsiella pneumoniae, Acinetobacter baumannii, Bacillus anthracis, Bordetella pertussis, Burkholderia cepacia, Corynebacterium diphtheriae, Coxiella burnetii, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella longbeachae, Legionella pneumophila, Leptospira interrogans, Moraxella catarrhalis, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Neisseria elongata, Neisseria gonorrhoeae, Parechovirus, Pneumococcus, Pneumocystis jirovecii, Cryptoc
  • the target nucleic acid comprises a sequence from a virus or a bacterium or other agents responsible for a disease that can be found in the sample.
  • the target nucleic acid is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in at least one of: human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
  • Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans.
  • Pathogenic viruses include but are not limited to immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like.
  • Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Staphylococcus epidermidis, Legionella pneumophila, Streptococcus pyogenes, Streptococcus salivarius, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Hemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory
  • T. vaginalis varicella-zoster virus
  • hepatitis B virus hepatitis C virus
  • measles virus human adenovirus (type A, B, C, D, E, F, G)
  • human T-cell leukemia viruses Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus
  • SARS-CoV-2 Variants include Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, SARS-CoV-2 85Δ, SARS-CoV-2 T1001I, SARS-CoV-2 3675-3677Δ, SARS-CoV-2 P4715L, SARS-CoV-2 S5360L, SARS-CoV-2 69-70Δ, SARS-CoV-2 Tyr144fs, SARS-CoV-2 242-244Δ, SARS-CoV-2 Y453F, SARS-CoV-2 S477N, SARS-CoV-2 E848K, SARS-CoV-2 N501Y, SARS-CoV-2 D614G, SARS-CoV-2 P681R, SARS-CoV-2 P681H, SARS-CoV-2 L21F, SARS-CoV
  • the target sequence is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus of bacterium or other agents responsible for a disease in the sample comprising a mutation that confers resistance to a treatment, such as a single nucleotide mutation that confers resistance to antibiotic treatment.
  • the target sequence is a portion of a nucleic acid from a subject having cancer.
  • the cancer can be a solid cancer (tumor).
  • the cancer can be a blood cell cancer, including leukemias and lymphomas.
  • Non-limiting types of cancer that could be treated with such methods and compositions include colon cancer, rectal cancer, renal-cell carcinoma, liver cancer, bladder cancer, cancer of the kidney or ureter, lung cancer, cancer of the small intestine, esophageal cancer, melanoma, bone cancer, pancreatic cancer, skin cancer, brain cancer (e.g., glioblastoma), cancer of the head or neck, melanoma, uterine cancer, ovarian cancer, breast cancer, testicular cancer, cervical cancer, stomach cancer, Hodgkin’s Disease, non-Hodgkin’s lymphoma, thyroid cancer.
  • the cancer can be a leukemia, such as, by way of non-limiting example, acute myeloid (or myelogenous) leukemia (AML), chronic myeloid (or myelogenous) leukemia (CML), acute lymphocytic (or lymphoblastic) leukemia (ALL), and chronic lymphocytic leukemia (CLL).
  • the target sequence is a portion of a nucleic acid from a cancer cell.
  • a cancer cell can be a cell harboring one or more mutations that results in unchecked proliferation of the cancer cell. Such mutations are known in the art.
  • Non-limiting examples of antigens are ADRB3, AKAP-4, ALK, Androgen receptor, B7H3, BCMA, BORIS, BST2, CAIX, CD179a, CD123, CD171, CD19, CD20, CD22, CD24, CD30, CD300LF, CD33, CD38, CD44v6, CD72, CD79a, CD79b, CD97, CEA, CLDN6, CLEC12A, CLL-1, CS-1, CXORF61, CYP1B1, Cyclin B1, E7, EGFR, EGFRvIII, ELF2M, EMR2, EPCAM, ERBB2 (Her2/neu), ERG (TMPRSS2 ETS fusion gene), ETV6-AML, EphA2, Ephrin B2, FAP, FC&RL5, FLT3, Folate receptor alpha, Folate receptor beta, Fos-related antigen 1, Fucosyl GM1, GD2, GD3, GM3, GPC3, GPR20, GPRC5D, Glob
  • the target sequence is a portion of a nucleic acid from a control gene in a sample.
  • the control gene is an endogenous control.
  • the endogenous control can include human 18S rRNA, human GAPDH, human HPRT1, human GUSB, human RNase P, MS2 bacteriophage, or any other control sequence of interest within the sample.
  • the sample used for cancer testing or cancer risk testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
  • the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene with a mutation associated with cancer, from a gene whose overexpression is associated with cancer, a tumor suppressor gene, an oncogene, a checkpoint inhibitor gene, a gene associated with cellular growth, a gene associated with cellular metabolism, or a gene associated with cell cycle.
  • the target nucleic acid encodes for a cancer biomarker, such as a prostate cancer biomarker or a non-small cell lung cancer biomarker.
  • the assay can be used to detect “hotspots” in target nucleic acids that can be predictive of cancer, such as lung cancer or cervical cancer. In some cases, the cancer can be a cancer that is caused by a virus.
  • viruses that cause cancers in humans include Epstein-Barr virus (e.g., Burkitt’s lymphoma, Hodgkin’s Disease, and nasopharyngeal carcinoma); papillomavirus (e.g., cervical carcinoma, anal carcinoma, oropharyngeal carcinoma, penile carcinoma); hepatitis B and C viruses (e.g., hepatocellular carcinoma); human adult T-cell leukemia virus type 1 (HTLV-1) (e.g., T-cell leukemia); and Merkel cell polyomavirus (e.g., Merkel cell carcinoma).
  • the target nucleic acid is a portion of a nucleic acid that is associated with a blood fever.
  • the mutation is located in a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: ALK, APC, ATM, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTNNA1, DICER1, DIS3L2, EGFR, EPCAM, FH, FLCN, GATA2, GPC3, GREM1, HOXB13, HRAS, system, MAX, MEN1, MET, MITF, MLH1, MSH2, MSH3, M
  • the sample used for genetic disorder testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
  • the genetic disorder is hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency, or cystic fibrosis.
  • the target nucleic acid in some cases, is from a gene with a mutation associated with a genetic disorder, from a gene whose overexpression is associated with a genetic disorder, from a gene associated with abnormal cellular growth resulting in a genetic disorder, or from a gene associated with abnormal cellular metabolism resulting in a genetic disorder.
  • the target nucleic acid is a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed mRNA, a DNA amplicon of or a cDNA from a locus of at least one of: CFTR, FMR1, SMN1, ABCB11, ABCC8, ABCD1, ACAD9, ACADM, ACADVL, ACAT1, ACOX1, ACSF3, ADA, ADAMTS2, ADGRG1, AGA, AGL, AGPS, AGXT, AIRE, ALDH3A2, ALDOB, ALG6, ALMS1, ALPL, AMT, AQP2, ARG1, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCS1L, BLM, BSND, CAPN3, CBS, CDH23
  • target nucleic acids are amplified and/or comprise amplicons.
  • the methods described herein can comprise amplifying a target nucleic acid molecule, and/or amplifying a nucleic acid molecule in a sample to produce a target nucleic acid molecule.
  • Amplification can occur prior to or simultaneously with detection of a signal indicative of reporter cleavage.
  • amplification and detection can occur within the same reaction volume sequentially or simultaneously.
  • amplification and detection can occur sequentially in different reaction volumes.
  • amplification and detection can occur at different temperatures.
  • amplification and detection can occur at the same temperature.
  • amplifying can improve at least one of sensitivity, specificity, or accuracy of the detection of the target nucleic acid.
  • Amplification can be isothermal or can comprise thermocycling.
  • amplification comprises transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (TM
  • amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid.
  • amplification can be used to insert a protospacer adjacent motif (PAM) sequence into a target nucleic acid that lacks a PAM sequence.
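The effect of PAM insertion during amplification can be illustrated in silico. The following is a minimal sketch, not the disclosed method: it assumes the PAM lies immediately 5' of the protospacer and uses `TTTA` as a default motif (a Cas12a-style assumption), and it models the amplicon produced when a primer carrying mismatched bases overwrites the template at that position. The function name `introduce_pam` and the example sequence are hypothetical.

```python
def introduce_pam(template: str, protospacer_start: int, pam: str = "TTTA") -> str:
    """Return the amplicon sequence after a mismatched primer writes a PAM.

    Assumption: the PAM sits immediately 5' of the protospacer, so the
    bases just upstream of `protospacer_start` are overwritten by `pam`.
    """
    if protospacer_start < len(pam):
        raise ValueError("not enough upstream sequence to hold the PAM")
    if protospacer_start > len(template):
        raise ValueError("protospacer start lies outside the template")
    pam_start = protospacer_start - len(pam)
    # Bases before the PAM and the protospacer itself are left unchanged.
    return template[:pam_start] + pam + template[protospacer_start:]


# Hypothetical template lacking a PAM upstream of the protospacer at index 8.
amplicon = introduce_pam("GGCCGGCCACGTACGT", protospacer_start=8)
print(amplicon)
```

The amplicon keeps the template length and the protospacer sequence; only the bases directly upstream change, which is what a primer-encoded substitution would produce.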
  • amplification is used to increase the homogeneity of a target nucleic acid in a sample.
  • amplification can be used to remove a nucleic acid variation that is not of interest in the target nucleic acid sequence.
  • the term “subject” refers to any living or non-living organism including, but not limited to, a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal.
  • Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, fish, dolphin, whale and shark.
  • a subject is a male or female of any age (e.g., a man, a woman, or a child).
  • a reference sequence refers to a sequence of nucleotide bases.
  • a reference sequence is a reference genome.
  • a “genome” or “reference genome” can refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus that can be used to reference identified sequences from a subject. Exemplary reference genomes used for human subjects as well as many other organisms are provided in the on-line genome browser hosted by the National Center for Biotechnology Information (“NCBI”) or the University of California, Santa Cruz (UCSC).
  • a “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
  • a reference sequence or reference genome often is an assembled or partially assembled genomic sequence from an individual subject or from multiple subjects.
  • a reference genome is an assembled or partially assembled genomic sequence from one or more human subjects.
  • a reference genome is an assembled or partially assembled genomic sequence from one or more microorganisms of the same species.
  • the reference genome can be viewed as a representative example of a species’ set of genes.
  • a reference genome comprises sequences assigned to chromosomes.
  • Exemplary human reference genomes include but are not limited to NCBI build 34 (UCSC equivalent: hg16), NCBI build 35 (UCSC equivalent: hg17), NCBI build 36.1 (UCSC equivalent: hg18), GRCh37 (UCSC equivalent: hg19), and GRCh38 (UCSC equivalent: hg38).
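The assembly equivalences listed above can be captured in a small lookup table. This is an illustrative sketch only; the dictionary name `NCBI_TO_UCSC` and the helper `ucsc_name` are hypothetical and not part of the disclosure.

```python
# Human reference assemblies named above, keyed by NCBI/GRC name,
# mapped to the equivalent UCSC genome browser name.
NCBI_TO_UCSC = {
    "NCBI build 34": "hg16",
    "NCBI build 35": "hg17",
    "NCBI build 36.1": "hg18",
    "GRCh37": "hg19",
    "GRCh38": "hg38",
}


def ucsc_name(assembly: str) -> str:
    """Return the UCSC name for an assembly, or raise if it is not listed."""
    try:
        return NCBI_TO_UCSC[assembly]
    except KeyError:
        raise ValueError(f"unknown assembly: {assembly!r}") from None


print(ucsc_name("GRCh37"))
```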
  • FIG. 1 is a block diagram illustrating a system 100 for determining a mutation call for a target locus in a biological sample, in accordance with some implementations.
  • the device 100 in some implementations includes one or more central processing units (CPU(s)) 102 (also referred to as processors), one or more network interfaces 104, a user interface 106, a non-persistent memory 111, a persistent memory 112, and one or more communication buses 110 for interconnecting these components.
  • the one or more communication buses 110 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • the non-persistent memory 111 typically includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, ROM, EEPROM, flash memory, whereas the persistent memory 112 typically includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • the persistent memory 112 optionally includes one or more storage devices remotely located from the CPU(s) 102.
  • the persistent memory 112, and the non-volatile memory device(s) within the non-persistent memory 111, comprise non-transitory computer readable storage medium.
  • the non-persistent memory 111 or alternatively the non-transitory computer readable storage medium stores the following programs, modules and data structures, or a subset thereof, sometimes in conjunction with the persistent memory 112:
  • an optional operating system 116 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
  • a signal data store 120 comprising, for each respective well 122 in a plurality of wells, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, where: o the plurality of wells comprises a first set of wells 122 (e.g., 122-1-1, ... 122-K-1) representing a first allele for the target locus, o the plurality of wells further comprises a second set of wells 122 (e.g., 122-1-2, ...
  • each corresponding plurality of reporting signals 124 comprises, for each respective time point in a plurality of time points (e.g., P time points, where P is a positive integer), a respective reporting signal in the form of a corresponding discrete attribute value (e.g., 124-1-1-1, ... 124-1-1-P; 124-2-1-1, ...
  • each respective well 122 in the first set of wells and each respective well 122 in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample
  • the signal data store 120 further comprises a plurality of control wells 126 that are free of nucleic acid derived from the biological sample, including a first set of control wells 126-1 representing the first allele for the target locus and a second set of control wells 126-2 representing the second allele for the target locus;
  • a data analysis construct 130 for determining: o for each respective well 122 in the plurality of wells, a corresponding signal yield 132 (e.g., 132-1-1, 132-1-2) for the respective well 122 using the corresponding plurality of reporting signals across the plurality of time points, and o for each respective well 122 in the first set of wells (e.g., 122-1-1, ...
  • a respective candidate call identity 134 (e.g., 134-1) based on a comparison between a corresponding first signal yield (e.g., 132-1-1) for the respective well in the first set of wells and a corresponding second signal yield (e.g., 132-1-2) for a corresponding well in the second set of wells; and
  • a voting construct 140 for performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
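The flow from signal yields 132 to candidate call identities 134 to a voted mutation call can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the signal yield is taken as the rise from the first to the last time point, the candidate call is whichever allele shows the higher yield, ties give "no call", and the vote requires a strict majority. All function names and labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Assumed yield metric: signal rise from first to last time point."""
    if len(signals) < 2:
        raise ValueError("need at least two time points")
    return signals[-1] - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   first_label: str = "wild-type",
                   second_label: str = "mutant") -> str:
    """Compare paired wells; equal yields give 'no call' (assumption)."""
    if first_yield > second_yield:
        return first_label
    if second_yield > first_yield:
        return second_label
    return "no call"


def vote(calls: Sequence[str]) -> str:
    """Strict-majority vote across candidate calls (assumption)."""
    if not calls:
        raise ValueError("no candidate calls to vote on")
    label, count = Counter(calls).most_common(1)[0]
    return label if count > len(calls) / 2 else "no call"


def mutation_call(first_wells: List[Sequence[float]],
                  second_wells: List[Sequence[float]]) -> str:
    """Pair each first-allele well with its second-allele well, then vote."""
    if len(first_wells) != len(second_wells):
        raise ValueError("first and second well sets must be paired")
    calls = [candidate_call(signal_yield(a), signal_yield(b))
             for a, b in zip(first_wells, second_wells)]
    return vote(calls)


# Hypothetical readings: three paired replicates, three time points each.
first = [[1, 2, 3], [1, 1.5, 2], [1, 5, 9]]
second = [[1, 4, 8], [1, 3, 6], [1, 2, 3]]
print(mutation_call(first, second))
```

Other yield metrics (for example, a slope or an area under the curve) and other voting rules would slot into the same structure by swapping out `signal_yield` or `vote`.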
  • one or more of the above identified elements are stored in one or more of the previously mentioned memory devices and correspond to a set of instructions for performing a function described above.
  • the above identified modules, data, or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, data sets, or modules, and thus various subsets of these modules and data may be combined or otherwise re-arranged in various implementations.
  • the non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above.
  • one or more of the above identified elements is stored in a computer system, other than that of system 100, that is addressable by system 100 so that system 100 may retrieve all or a portion of such data when needed.
  • Although FIG. 1 depicts a “system 100,” the figure is intended more as a functional description of the various features which may be present in computer systems than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. Moreover, although FIG. 1 depicts certain data and modules in non-persistent memory 111, some or all of these data and modules may be in persistent memory 112.
  • While a system in accordance with the present disclosure has been disclosed with reference to FIG. 1, a method in accordance with the present disclosure is now detailed with reference to FIGS. 2A-2G.
  • the present disclosure provides a method 200 for determining a mutation call for a target locus in a biological sample.
  • the method is performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
  • the method includes obtaining a signal dataset 120 comprising, for each respective well 122 in a plurality of wells in a common plate, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset 120 represents a plurality of time points.
  • the plurality of wells 122 comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells 122 further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals 124 comprises, for each respective time point in the plurality of time points, a respective reporting signal 124 in the form of a corresponding discrete attribute value.
  • Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
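The layout of the signal dataset described above (wells grouped by allele, each carrying one reporting signal per time point) can be sketched as a small data structure. This is an illustrative assumption about representation only; the class names `Well` and `SignalDataset`, the allele labels "first" and "second", and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Well:
    allele: str           # "first" or "second": which allele the guides target
    replicate: int        # index of the well within its set
    signals: List[float]  # one reporting signal per time point


@dataclass
class SignalDataset:
    wells: List[Well]
    n_time_points: int

    def __post_init__(self) -> None:
        # Every well must report a value at every time point.
        for well in self.wells:
            if len(well.signals) != self.n_time_points:
                raise ValueError(
                    f"well {well.allele}-{well.replicate} has "
                    f"{len(well.signals)} signals, expected {self.n_time_points}"
                )
        # Both allele sets must be represented on the plate.
        if not {"first", "second"} <= {w.allele for w in self.wells}:
            raise ValueError("dataset needs wells for both alleles")

    def wells_for(self, allele: str) -> List[Well]:
        """Return the set of wells representing the given allele."""
        return [w for w in self.wells if w.allele == allele]


dataset = SignalDataset(
    wells=[Well("first", 1, [0.1, 0.5, 0.9]), Well("second", 1, [0.1, 0.2, 0.2])],
    n_time_points=3,
)
print(len(dataset.wells_for("first")))
```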
  • the biological sample is a clinical sample, a diagnostic sample, an environmental sample, a consumer quality sample, a food sample, a biological product sample, a microbial testing sample, a tumor sample, a forensic sample and/or a laboratory or hospital sample.
  • the biological sample is obtained from a human or an animal.
  • a biological sample is a sample from a patient undergoing a treatment.
  • the biological sample is collected from an environmental source, such as a field (e.g., an agricultural field), lake, river, creek, ocean, watershed, water tank, water reservoir, pool (e.g., swimming pool), pond, air vent, wall, roof, soil, plant, and/or other environmental source.
  • the biological sample is collected from an industrial source, such as a clean room (e.g., in manufacturing or research facilities), hospital, medical laboratory, pharmacy, pharmaceutical compounding center, food processing area, food production area, water or waste treatment facility, and/or food product.
  • the biological sample is an air sample, such as ambient air in a facility (e.g., a medical facility or other facility), exhaled or expectorated air from a subject, and/or aerosols, including any biological contaminants present therein (e.g., bacteria, fungi, viruses, and/or pollens).
  • the biological sample is a water sample, such as from dialysis systems in a medical facility (e.g., to detect waterborne pathogens of clinical significance and/or to determine the quality of water in a facility).
  • the biological sample is an environmental surface sample, such as before or after a sterilization or disinfecting process (e.g., to confirm the effectiveness of the sterilization or disinfecting procedure).
  • the biological sample is obtained from a subject, such as a human (e.g., a patient). In some embodiments, the biological sample is obtained from any tissue, organ or fluid from the subject. In some embodiments, a plurality of biological samples is obtained from the subject (e.g., a plurality of replicates and/or a plurality of samples including a healthy sample and a diseased sample). In some embodiments, the biological sample is a clinical sample. In some embodiments, the biological sample is a bodily fluid. In some embodiments, the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood. In some embodiments, the biological sample is obtained from a nasopharyngeal or oropharyngeal swab. In some embodiments, the biological sample is obtained from a sample repository.
  • the biological sample is obtained from a human with a disease condition (e.g., an infectious disease and/or a disease caused by a pathogenic microorganism).
  • the biological sample is obtained from a subject that is infected with a pathogen.
  • the pathogen is a virus, bacteria, fungus, or parasite.
  • the pathogen is the causative agent of influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, Clostridium difficile (C.
  • the pathogen is the causative agent of one or more viral respiratory diseases.
  • the pathogen is the causative agent of a coronavirus infection. Referring to Block 206, in some embodiments, the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the biological sample is any of the embodiments for samples described herein.
  • Other suitable embodiments of samples and/or subjects include those disclosed above (see, for example, the sections entitled “Target Nucleic Acids: Samples” and “Subjects,” above), and any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
  • the target locus is selected from the group consisting of a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation.
  • the target locus is selected from the group consisting of a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
  • RFLP restriction fragment length polymorphism
  • RAPD random amplified polymorphic DNA
  • AFLP amplified fragment length polymorphism
  • VNTR variable number tandem repeat
  • OP oligonucleotide polymorphism
  • SNP single nucleotide polymorphism
  • ASAP allele specific associated primer
  • ISTR inverse sequence-tagged repeat
  • IRAP
  • the target locus is selected from a database.
  • the target locus maps to a corresponding reference sequence (e.g., a reference genome), and the corresponding reference sequence is obtained from a nucleotide sequence database.
  • Suitable nucleotide sequence databases include global genome databases and/or microorganism-specific genome databases.
  • a reference sequence for a microorganism is obtained from NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database.
  • the target locus is a gene. In some embodiments, the target locus maps to all or a portion of a gene. In some embodiments, the target locus maps to all or a portion of a plurality of genes.
  • the target locus comprises DNA or RNA.
  • the target locus maps to all or a portion of a reference genome that consists essentially of RNA sequences.
  • the target locus maps to all or a portion of a reference genome that consists essentially of DNA sequences.
  • the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus is obtained from a transcription reaction of a DNA molecule for the target locus.
  • the transcription reaction generates one or more RNA transcripts that correspond to the target locus and, for each respective well in the plurality of wells in the common plate, the corresponding aliquot of nucleic acid derived from the biological sample includes the one or more RNA transcripts used for obtaining the respective plurality of reporting signals.
  • the target locus, nucleic acids, mutations, and/or reference sequences include any of the embodiments described herein.
  • Other suitable embodiments of target loci, nucleic acids, mutations, and/or reference sequences include those disclosed above (see, for example, the sections entitled “Target Loci,” “Target Nucleic Acids,” and “Reference Sequences,” above), as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
  • the target locus comprises a plurality of alleles, including a first allele and a second allele.
  • the first allele is wild-type
  • the second allele is mutant
  • the mutation call is wild-type or mutant.
  • the first allele is a first mutant
  • the second allele is a second mutant.
  • the mutation call is for the first mutant allele or the second mutant allele.
  • the first allele and the second allele are selected from a plurality of alleles for the respective target locus.
  • the plurality of alleles includes at least 2, at least 3, at least 4, at least 5, or at least 8 alleles. In some embodiments, the plurality of alleles includes no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 alleles. In some embodiments, the plurality of alleles is from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 alleles. In some embodiments, the plurality of alleles falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
  • the target locus comprises a plurality of alleles, including at least a wild-type allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a mutant allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a first mutant allele and a second mutant allele. In some embodiments, the target locus comprises one or more mutant alleles. For example, in some embodiments, the target locus comprises at least 2, at least 3, at least 4, at least 5, or at least 8 mutant alleles. In some embodiments, the target locus comprises no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 mutant alleles.
  • the target locus comprises from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 mutant alleles. In some embodiments, the target locus comprises a set of mutant alleles that falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
  • the mutation call is for a wild-type allele. In some embodiments, the mutation call is for a mutant allele. In some embodiments, the mutation call is for a respective mutant allele in a plurality of mutant alleles. In some embodiments, the mutation call is selected from the group consisting of a wild-type allele and a respective mutant allele in a plurality of mutant alleles.
  • the first plurality of guide nucleic acids that have the first allele of the target locus hybridize to the first allele. In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the first allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus hybridize to the second allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the second allele.
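The reverse-complement relationship between a guide sequence and its allele can be computed directly. The sketch below is illustrative only: the function name and the example allele sequence are hypothetical, and practical guide design involves further constraints (such as PAM placement and guide scaffold) that are not modeled here.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA or RNA sequence.

    With rna=True the result is written in the RNA alphabet (U instead of T),
    as would be the case for an RNA guide.
    """
    try:
        result = "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None
    return result.replace("T", "U") if rna else result


# Hypothetical six-base allele fragment, for illustration only.
allele = "AAGCTC"
print(reverse_complement(allele))            # DNA alphabet
print(reverse_complement(allele, rna=True))  # RNA alphabet
```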
  • a respective guide nucleic acid comprises any of the embodiments for guide nucleic acids disclosed herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
  • the corresponding aliquot of nucleic acid derived from the biological sample is RNA or DNA.
  • the corresponding aliquot of nucleic acid derived from the biological sample comprises nucleic acids obtained from within cells.
  • the corresponding aliquot of nucleic acid derived from the biological sample comprises cell-free nucleic acid molecules.
  • the one or more nucleic acid molecules comprises synthetic nucleic acid molecules.
  • Example 1 describes systems and methods for identifying mutation calls using synthetic nucleic acid molecules, in accordance with an embodiment of the present disclosure.
  • the method includes isolating the one or more nucleic acid molecules from the biological sample.
  • isolation of nucleic acid molecules includes removing one or more cells from the biological sample, subjecting the biological sample to a lysis step, and/or treating the biological sample in order to separate cellular nucleic acid molecules from cell-free nucleic acid molecules.
  • isolation of nucleic acid molecules includes isolation of cell-free nucleic acid molecules from a liquid biological sample (e.g., plasma and/or blood).
  • the method includes heat-inactivating the biological sample.
  • the method includes performing an amplification of one or more nucleic acid molecules derived from the biological sample. See, for instance, the section entitled, “Nucleic Acid Amplification,” below.
  • Other methods of obtaining nucleic acid from biological samples and/or preparing biological samples for the same are contemplated for use in the present disclosure, as will be apparent to one skilled in the art.
  • the plurality of wells 122 comprises at least 3, at least 5, at least 10, at least 20, at least 40, at least 50, at least 80, at least 100, at least 300, or at least 500 wells. In some embodiments, the plurality of wells comprises no more than 1000, no more than 500, no more than 200, no more than 80, no more than 50, or no more than 30 wells. In some embodiments, the plurality of wells is from 3 to 20, from 5 to 100, from 8 to 50, or from 10 to 500 wells. In some embodiments, the plurality of wells falls within another range starting no lower than 3 wells and ending no higher than 1000 wells.
  • the common plate is a multi-well plate.
  • the signal dataset 120 is obtained by a procedure comprising amplifying a first plurality of nucleic acids 304 derived from the biological sample, thereby generating a plurality of amplified nucleic acids 404.
  • the procedure further includes, for each respective well 122 in the first set of wells and each respective well in the second set of wells, partitioning, from the plurality of amplified nucleic acids 404, the respective corresponding aliquot of nucleic acid 410; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal 124.
  • a biological sample 302 is used to obtain a plurality of nucleic acids 304.
  • the nucleic acids 304 are extracted from the biological sample 302.
  • the signal dataset 120 is obtained by a procedure 306 such as a DETECTR reaction, which includes an optional amplification reaction 308 and a programmable nuclease-based assay 310.
  • the amplifying 308 includes any suitable method for amplification of nucleic acids known in the art, such as reverse transcriptase loop-mediated isothermal amplification (RT-LAMP).
  • the programmable nuclease-based assay 310 also includes any suitable detection reaction, such as a Cas nuclease-based reaction, in which recognition of a target nucleic acid in the sample nucleic acids 304 by a guide nucleic acid (e.g., gRNA) induces cleavage of a reporter (e.g., a nucleic acid probe) by a Cas nuclease.
  • the Cas nuclease is complexed with the guide nucleic acid. Reporting signals generated from cleavage of the reporter can be detected, measured, and used for mutation calling and variant identification 312, in accordance with methods of the present disclosure.
  • the procedure 306 is a DETECTR reaction. In some embodiments, the procedure 306 is SHERLOCK. In some embodiments, the signal dataset 120 is obtained by a procedure that does not comprise an amplification step.
  • the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR).
  • the amplifying is performed using transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (
  • isothermal amplification techniques employ a polymerase and a set of specialized primers designed to recognize distinct sequences in one or more target nucleic acids.
  • LAMP techniques typically perform amplification of target nucleic acid molecules at a constant temperature (e.g., 60-65 °C) using multiple inner and outer primers and a polymerase having strand displacement activity.
  • LAMP is initiated by an inner primer pair containing a nucleic acid sequence complementary to a portion of the sense and antisense strands of the target nucleic acid.
  • strand displacement synthesis primed by an outer primer pair can cause release of a single-stranded amplicon.
  • the single-stranded amplicon can serve as a template for further synthesis primed by a second inner and second outer primer that hybridize to the other end of the target nucleic acid and produce a stem-loop nucleic acid structure.
  • one inner primer hybridizes to the loop on the product and initiates displacement and target nucleic acid synthesis, yielding the original stem-loop product and a new stem-loop product with a stem twice as long.
  • the 3’ terminus of an amplicon loop structure serves as an initiation site for self-templating strand synthesis, yielding a hairpin-like amplicon that forms an additional loop structure to prime subsequent rounds of self-templated amplification.
  • the amplification continues with accumulation of many copies of the target nucleic acid.
  • the final products of the LAMP process are stem-loop nucleic acids with concatenated repeats of the target nucleic acid in cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of a target nucleic acid sequence in the same strand.
  • LAMP assays produce a detectable signal (e.g., fluorescence) during the amplification reaction.
  • the method comprises detecting and/or quantifying a detectable signal (e.g., fluorescence) produced during the LAMP assay. Any suitable method for detecting and quantifying fluorescence can be used.
  • the amplifying is performed using any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
  • the method further includes, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of amplification replicates 404 (e.g., 404-1, 404-2, 404-3), where the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate.
  • the method comprises generating aliquots 404 for amplification reactions, as illustrated in FIG. 4A.
  • the first plurality of nucleic acids is not partitioned into amplification replicates prior to the amplifying.
  • the plurality of amplification replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 amplification replicates. In some embodiments, the plurality of amplification replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 amplification replicates. In some embodiments, the plurality of amplification replicates is from 3 to 9, from 4 to 12, or from 10 to 25 amplification replicates. In some embodiments, the plurality of amplification replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates.
  • the method further includes applying an amplification threshold filter to each respective amplification replicate, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
  • the method further includes applying an amplification threshold filter to each respective biological sample in a plurality of biological samples, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective sample is removed from the plurality of samples.
  • the amplification threshold is established by performing a visual inspection of amplification reaction curves.
  • the amplification reaction curve is a receiver operating characteristic (ROC) curve.
  • the amplification threshold is determined by selecting an optimal Youden Index point such that the absolute difference between a sensitivity metric and a false positive rate calculated using reference amplification data is maximized.
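For illustration only, the Youden Index selection described above can be sketched as follows. The function name `youden_threshold`, the convention that a sample is called positive when its score is greater than or equal to the candidate threshold, and the example values are assumptions introduced for this sketch and are not taken from the disclosure.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) where J = |sensitivity - false positive rate| is largest.

    `scores` are reference amplification values and `labels` mark which
    reference samples truly amplified. A sample is predicted positive when
    its score is >= the candidate threshold (an assumption of this sketch).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative reference sample")

    best_threshold, best_j = None, -1.0
    # Every distinct observed score is tried as a candidate threshold.
    for candidate in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= candidate)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= candidate)
        sensitivity = tp / positives
        false_positive_rate = fp / negatives
        j = abs(sensitivity - false_positive_rate)
        if j > best_j:  # strict comparison keeps the lowest threshold among ties
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Hypothetical reference data: three non-amplified and three amplified samples.
example_scores = [0.2, 0.4, 0.9, 1.1, 1.3, 1.6]
example_labels = [False, False, False, True, True, True]
threshold, j_value = youden_threshold(example_scores, example_labels)
```

With the hypothetical data shown, the two classes are perfectly separated, so the selected threshold is the lowest score among the amplified samples.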
  • the amplification threshold is derived using positive and negative (e.g., no template) amplification controls from the amplification reaction (e.g., LAMP) of a given sample.
  • the amplification threshold includes a time threshold and/or a fluorescence rate threshold.
  • Positive amplification controls are deemed to represent an ideal sample, where the ideal sample exhibits a classic sigmoidal rise of fluorescence over time, and negative amplification controls are deemed to represent the background fluorescence.
  • an ideal sample is approximated as having fluorescence kinetics similar to that of a positive amplification control.
  • the amplification threshold is determined based on a mean value between the amplification controls within a common plate. This determination can be used to reduce the impact of background that can occur in amplification replicates.
  • the determination of the amplification threshold further includes collecting fluorescence values at the time threshold.
  • the time threshold can be used to exclude samples that amplify close to the endpoint, where the rise in signal is attributable mostly to amplification intermediates (e.g., LAMP intermediates) rather than to the sample itself.
  • the determination further includes assigning a score for each sample, where the score is calculated as a ratio of the fluorescence rate threshold to the fluorescence rate value at the time threshold for each sample.
  • when the ratio is less than a threshold ratio (e.g., 1), samples are deemed to have failed to reach the minimum fluorescence required to be called out as amplified and, when the ratio is greater than or equal to the threshold ratio (e.g., 1), samples are deemed to have amplified sufficiently.
  • the determination of the amplification threshold comprises performing an ROC analysis using the calculated score and/or a measured amplification metric (e.g., ground truth) of each sample in a plurality of samples, thereby identifying an exact score value for the respective sample.
  • the time threshold is from 5 minutes to 40 minutes. In some embodiments, the time threshold is a time point within the total duration of the amplification reaction that is at least 10%, at least 20%, or at least 30% into the total duration and is no more than 90%, no more than 80%, or no more than 70% into the total duration. For example, in some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is selected from a range of from 10 minutes to 30 minutes. In some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is 18 minutes.
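As a worked illustration of the fractional bounds above, the following sketch converts a reaction duration and a pair of fractions into the window from which a time threshold can be selected. The function names and the choice of 25% and 75% as bounds are assumptions of this sketch; the 40-minute duration and the 18-minute threshold come from the example in the text.

```python
from typing import Tuple


def time_threshold_window(duration_min: float, min_fraction: float, max_fraction: float) -> Tuple[float, float]:
    """Return the (earliest, latest) admissible time threshold, in minutes."""
    if not 0.0 <= min_fraction <= max_fraction <= 1.0:
        raise ValueError("fractions must satisfy 0 <= min_fraction <= max_fraction <= 1")
    return duration_min * min_fraction, duration_min * max_fraction


def is_admissible(time_threshold_min: float, window: Tuple[float, float]) -> bool:
    """Check whether a chosen time threshold lies inside the window (inclusive)."""
    earliest, latest = window
    return earliest <= time_threshold_min <= latest


# A 40-minute reaction with hypothetical bounds of 25% and 75% of the duration.
window = time_threshold_window(40, 0.25, 0.75)   # (10.0, 30.0)
eighteen_ok = is_admissible(18, window)          # True
```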
  • one or more amplification replicates in a plurality of amplification replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
  • the method further includes, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate 404 into a plurality of detection replicates 408, where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
  • the method comprises generating a respective plurality of aliquots 408 from a respective amplification reaction 404, as illustrated in FIGS. 4A-4B.
  • each respective aliquot 410 corresponds to a respective well 122 in the plurality of wells.
  • wells are grouped into a plurality of sets depending on the type of guide nucleic acid used to contact the corresponding aliquot of nucleic acid, as illustrated in FIGS. 4A-4B (e.g., a first set of wells comprises allele 1 (WT) guide sequences and a second set of wells comprises allele 2 (MUT) guide sequences).
  • each well in the first set of wells contains amplified nucleic acids that are interrogated by the allele 1 guide sequences (WT; set “A”) and each well in the second set of wells contains amplified nucleic acids that are interrogated by the allele 2 guide sequences (MUT; set “B”).
  • each respective detection replicate represents a respective well in the plurality of wells.
  • each respective detection replicate represents a pair of wells in a respective comparative programmable nuclease-based reaction.
  • a pair of wells includes (i) a first well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the first allele and (ii) a second well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the second allele.
  • a detection replicate includes, e.g., well 410-1-1-1-A (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and wild-type nucleic acid guide “A”) and well 410-1-1-1-B (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and mutant nucleic acid guide “B”).
  • the plurality of partitioned detection replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates is from 3 to 9, from 4 to 12, or from 10 to 25 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates. In some embodiments, nucleic acids in the biological sample, and/or a respective plurality of amplified nucleic acids derived therefrom, are not partitioned into a plurality of detection replicates prior to the contacting.
  • the cleaving the one or more reporters using the programmable nuclease is performed using a detection assay, such as a programmable nuclease-based step in a DETECTR assay. See, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
  • the method further comprises, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest).
  • nucleic acids in the biological sample are not partitioned into a plurality of replicates for each locus in a plurality of loci.
  • the method further comprises, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for a respective amplification replicate 404 into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). For instance, as illustrated in FIG. 4A, nucleic acids 304 are interrogated at each of a plurality of loci of interest 406.
  • the amplification replicates 404 are partitioned across a plurality of multi-well plates, where each respective plate corresponds to a respective target locus in the plurality of loci (e.g., 406-1, 406-2, 406-3).
  • the respective amplification replicate 404 is further partitioned into a plurality of detection replicates 408 (e.g., 408-1-1, 408-1-2, 408-1-3), where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
  • one or more detection replicates in a plurality of detection replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
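By way of a non-limiting sketch, the partitioning into plates (one per target locus), amplification replicates, detection replicates, and guide-specific wells can be enumerated and paired as follows. The identifier pattern mirrors the example well labels given above (such as 410-1-1-1-A); the function names, the default prefix, and the dictionary layout are assumptions introduced only for illustration.

```python
from itertools import product
from typing import Dict, List, Sequence


def well_ids(n_plates: int, n_amp_reps: int, n_det_reps: int,
             guides: Sequence[str] = ("A", "B"), prefix: str = "410") -> List[str]:
    """Enumerate identifiers of the form prefix-plate-ampRep-detRep-guide."""
    return [
        f"{prefix}-{plate}-{amp}-{det}-{guide}"
        for plate, amp, det in product(
            range(1, n_plates + 1), range(1, n_amp_reps + 1), range(1, n_det_reps + 1)
        )
        for guide in guides
    ]


def pair_wells(ids: Sequence[str]) -> Dict[str, Dict[str, str]]:
    """Group wells that share plate, amplification replicate, and detection replicate.

    Each group holds the guide "A" (first allele) well and the guide "B"
    (second allele) well of one comparative reaction.
    """
    pairs: Dict[str, Dict[str, str]] = {}
    for well_id in ids:
        stem, guide = well_id.rsplit("-", 1)
        pairs.setdefault(stem, {})[guide] = well_id
    return pairs


# Three loci (plates), three amplification replicates, three detection replicates.
layout = well_ids(3, 3, 3)
paired = pair_wells(layout)
```

Keying each pair on the shared plate, amplification replicate, and detection replicate keeps the first-allele and second-allele wells of one comparative reaction together for later comparison.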
  • the method comprises contacting the respective corresponding aliquot of nucleic acid with a plurality of programmable nucleases.
  • each respective programmable nuclease in the plurality of programmable nucleases is the same or different type of nuclease.
  • a respective programmable nuclease is a trans-cleaving programmable nuclease (e.g., capable of nonspecific cleavage of nucleic acids).
  • the programmable nuclease targets DNA (e.g., single-stranded DNA) or RNA.
  • the programmable nuclease is a Cas nuclease.
  • the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, CasPhi, and/or any subtypes or orthologs thereof.
  • the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
  • the programmable nuclease is any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
  • the programmable nuclease is any of the programmable nucleases described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
  • the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
  • a guide nucleic acid directs a programmable nuclease to a respective target locus for cleavage by recognizing the nucleic acid sequence of the target locus.
  • Recognition of the target locus can be engineered by designing guide nucleic acids that hybridize to (e.g., are reverse complementary to) all or a portion of the nucleic acid sequence of the target locus.
  • the corresponding guide nucleic acid is complexed to the programmable nuclease.
  • the first plurality of guide nucleic acids is RNA.
  • the first plurality of guide nucleic acids is gRNA or sgRNA.
  • the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus. In some embodiments, the first plurality of guide nucleic acids is reverse complementary to a wild-type sequence for the target locus.
  • the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the second plurality of guide nucleic acids that have the second allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
  • the second plurality of guide nucleic acids is RNA.
  • the second plurality of guide nucleic acids hybridizes to (e.g., is reverse complementary to) a mutant sequence for the target locus.
  • the guide nucleic acids comprise any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
  • each respective reporter in the one or more reporters is single-stranded DNA. In some embodiments, each respective reporter in the one or more reporters is single-stranded RNA.
  • each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
  • the one or more reporters are cleaved by the programmable nuclease when the first allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal.
  • the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
  • the one or more reporters are cleaved by the programmable nuclease when the second allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal.
  • the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
  • each respective reporting signal 124 is obtained from a detection moiety.
  • each respective reporting signal is a fluorescence emission by a fluorophore
  • the corresponding discrete attribute value is a fluorescence intensity.
  • each respective reporting signal is an intensity of light, and the corresponding discrete attribute value is measured in relative units (e.g., relative fluorescent units).
  • the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
  • the respective reporting signal 124 is a lateral flow readout.
  • the lateral flow readout is detected manually by visual inspection.
  • the lateral flow readout is detected using an image analysis algorithm (e.g., ImageJ, FIJI, etc.). Detection of reporting signals by lateral flow readout is further described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
  • the one or more reporters and/or the plurality of reporting signals, and methods of detection thereof comprise any of the embodiments described herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Reporters,” above.
  • the plurality of time points represented by the signal dataset comprises between 20 and 2000 time points. In some embodiments, the plurality of time points represented by the signal dataset is at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 time points. In some embodiments, the plurality of time points is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 time points. In some embodiments, the plurality of time points represented by the signal dataset is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000. In some embodiments, the plurality of time points represented by the signal dataset falls within another range starting no lower than 10 time points and ending no higher than 5000 time points.
  • the plurality of time points is obtained over a duration of between 30 seconds and 1 hour. In some embodiments, the plurality of time points is obtained over a duration of at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, or at least 1 hour. In some embodiments, the plurality of time points is obtained over a duration of no more than 2 hours, no more than 1 hour, no more than 30 minutes, or no more than 5 minutes. In some embodiments, the plurality of time points is obtained over a duration of from 30 seconds to 10 minutes, from 5 minutes to 1 hour, or from 20 minutes to 50 minutes. In some embodiments, the plurality of time points is obtained over a duration that falls within another range starting no lower than 30 seconds and ending no higher than 2 hours.
  • each respective time point corresponds to a respective measurement obtained during a detection assay, where the detection assay is performed over a duration of time.
  • each respective time point corresponds to a respective readout obtained from a plate reader, where each respective readout in a plurality of readouts is taken at intervals over the duration of the detection assay (e.g., every 30 seconds, every 1 second, every 0.5 seconds, etc.).
  • the plurality of reporting signals 124 for each respective well 122 in the plurality of wells comprises at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 reporting signals.
  • the plurality of reporting signals for each respective well is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 reporting signals.
  • the plurality of reporting signals for each respective well is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000 reporting signals.
  • the plurality of reporting signals for each respective well falls within another range starting no lower than 10 reporting signals and ending no higher than 5000 reporting signals.
  • the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample.
  • the plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each control well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each control well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
  • each respective control well is a negative control well and/or a no-template control well.
  • the signal dataset 120 further comprises, for each respective positive control well in a plurality of positive control wells in the common plate, a corresponding plurality of positive control reporting signals.
  • the plurality of positive control wells comprises a first set of positive control wells representing the first allele for the target locus, where each respective positive control well in the first set of positive control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the signal dataset further includes a second set of positive control wells representing the second allele for the target locus, where each respective positive control well in the second set of positive control wells comprises a second plurality of guide nucleic acids that have the second allele of the target locus.
  • each respective well in the first set of wells comprises one or more nucleic acid molecules corresponding to the first allele of the target locus
  • each respective well in the second set of wells comprises one or more nucleic acid molecules corresponding to the second allele of the target locus.
  • a respective guide nucleic acid in a respective plurality of guide nucleic acids in a well hybridizes to all or a portion of the nucleic acid sequence of the respective allele.
  • any of the embodiments disclosed herein for guide nucleic acids, reporting signals and methods of obtaining the same, alleles, target loci, and wells are contemplated for use with control wells and/or positive control wells, as will be apparent to one skilled in the art. See, e.g., the sections entitled, “Samples,” “Nucleic Acid Amplification,” “Programmable Nuclease-Based Assays,” and “Signal Dataset,” above.
  • the signal dataset is preprocessed for analysis.
  • the preprocessing comprises data cleaning, normalization, filtering, and/or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
  • the method further includes determining, for each respective well 122 in the plurality of wells, a corresponding signal yield 132 for the respective well using the corresponding plurality of reporting signals 124 for the respective well across the plurality of time points.
  • the determining signal yield further comprises, for each respective well 122 in the plurality of wells, normalizing the respective plurality of reporting signals 124 for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals 124 for the respective well, thereby obtaining the corresponding signal yield 132 for the respective well.
  • the maximum discrete attribute value is an endpoint fluorescence intensity
  • the minimum discrete attribute value is a minimum fluorescence intensity
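As a minimal sketch of the normalization described above, the signal yield for a well can be computed by dividing the maximum reporting signal in the time series by the minimum reporting signal. The function name `signal_yield`, the requirement that intensities be positive, and the example readings are assumptions of this sketch.

```python
from typing import Sequence


def signal_yield(reporting_signals: Sequence[float]) -> float:
    """Scale the maximum reporting signal of a well by its minimum reporting signal.

    `reporting_signals` holds one fluorescence intensity per time point.
    """
    if not reporting_signals:
        raise ValueError("at least one reporting signal is required")
    minimum = min(reporting_signals)
    if minimum <= 0:
        # Assumption of this sketch: raw intensities are strictly positive.
        raise ValueError("reporting signals must be positive to form a ratio")
    return max(reporting_signals) / minimum


# Hypothetical wells: one with a rising signal, one that stays at background.
rising_well = signal_yield([100.0, 120.0, 400.0, 800.0])   # 8.0
flat_well = signal_yield([100.0, 101.0, 100.0])            # close to 1
```

A well whose signal never leaves background therefore has a yield near 1, while a well with reporter cleavage has a yield well above 1.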
  • the signal yield threshold is obtained by determining, for each respective control well in a plurality of control wells in the common plate, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield.
  • the signal yield threshold is a respective control signal yield (e.g., the maximum and/or the minimum control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the respective control signal yield.
  • the signal yield threshold is a central tendency metric for the plurality of control signal yields (e.g., an average control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the central tendency metric.
  • each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well.
  • the maximum discrete attribute value and the minimum discrete attribute value will generally be similar, due to the lack of target nucleic acid derived from the biological sample. For instance, in the absence of target nucleic acids to which a corresponding guide nucleic acid can hybridize, a complexed programmable nuclease is unlikely to initiate reporter cleavage. As a result, reporting signals are expected to maintain background levels throughout the course of the detection assay.
  • each respective control signal yield in the plurality of control signal yields is 1 or about 1, and the respective well is deemed a no-call when the signal yield for the respective well is equal to 1 or about 1.
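For illustration, the signal yield threshold and the per-well no-call test described above can be sketched as follows. The function names, the `mode` argument, and the example control yields are assumptions of this sketch; the three modes correspond to the maximum control signal yield, the minimum control signal yield, and an average control signal yield.

```python
from statistics import mean
from typing import Sequence


def control_yield_threshold(control_yields: Sequence[float], mode: str = "max") -> float:
    """Derive a signal yield threshold from control-well signal yields."""
    if not control_yields:
        raise ValueError("at least one control signal yield is required")
    if mode == "max":
        return max(control_yields)
    if mode == "min":
        return min(control_yields)
    if mode == "mean":
        return mean(control_yields)
    raise ValueError(f"unknown mode: {mode!r}")


def is_no_call(well_yield: float, threshold: float) -> bool:
    """A well is a no-call when its yield is equal to or less than the threshold."""
    return well_yield <= threshold


# Hypothetical control wells that stay near background (yields close to 1).
controls = [1.0, 1.05, 1.1]
threshold_max = control_yield_threshold(controls, "max")
threshold_mean = control_yield_threshold(controls, "mean")
```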
  • the method further includes determining, for each respective well 122 in the first set of wells, a respective candidate call identity 134 based on a comparison between (i) a corresponding first signal yield 132 for the respective well in the first set of wells and (ii) a corresponding second signal yield 132 for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities 134.
  • the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call (e.g., wild-type, mutant, and/or no-call).
  • the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample.
  • the first well and the corresponding well are matched for comparison across detection replicates in order to determine candidate call identities.
  • the first well and the corresponding well are matched for comparison within a comparative programmable nuclease-based reaction.
  • a respective portion of nucleic acid is divided into two wells, where a first well is interrogated by a guide nucleic acid having the first allele sequence and a second well is interrogated by a guide nucleic acid having the second allele sequence. Relative reporting signals obtained from the first well and the second well are compared.
  • Comparative programmable nuclease-based reactions are further described in detail herein (see, e.g., the section entitled, “Programmable Nuclease-Based Assays,” above).
  • the comparative programmable nuclease-based reaction is a DETECTR reaction.
  • the respective portion of nucleic acid is an aliquot of nucleic acid.
  • the respective portion of nucleic acid is a plurality of nucleic acids derived from a biological sample.
  • FIG. 4B illustrates a comparative detection reaction including, for each respective pair of wells, a first well 410 interrogated by the wild-type guide nucleic acid “A” and a second well 410 interrogated by the mutant guide nucleic acid “B.”
  • the first well and the corresponding well are matched for comparison across amplification replicates in order to determine candidate call identities.
  • the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield.
  • the relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a percent change.
  • the relative intensity metric has the form $\log_2(F_y(\mathrm{FAl})) / \log_2(F_y(\mathrm{SAl}))$, where $F_y(\mathrm{FAl})$ is the signal yield corresponding to the first allele and $F_y(\mathrm{SAl})$ is the signal yield corresponding to the second allele.
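The ratio-of-binary-logarithms form of the relative intensity metric can be sketched as below. The function name and the handling of a flat second-allele well are assumptions of this sketch: because a signal yield of exactly 1 has a binary logarithm of 0, the sketch returns infinity when only the denominator is zero and NaN when both logarithms are zero, which the text does not specify.

```python
import math


def relative_intensity(first_allele_yield: float, second_allele_yield: float) -> float:
    """Ratio of log2(first-allele signal yield) to log2(second-allele signal yield)."""
    if first_allele_yield < 1.0 or second_allele_yield < 1.0:
        # A max/min signal yield of positive intensities cannot be below 1.
        raise ValueError("signal yields are expected to be >= 1")
    numerator = math.log2(first_allele_yield)
    denominator = math.log2(second_allele_yield)
    if denominator == 0.0:
        # Assumption: the second-allele well is perfectly flat (yield == 1).
        return math.inf if numerator > 0.0 else math.nan
    return numerator / denominator


# Hypothetical yields: first-allele well rises 8-fold, second-allele well 2-fold.
metric = relative_intensity(8.0, 2.0)   # log2(8) / log2(2) = 3.0
```

Values above 1 indicate that the first-allele well produced the stronger yield, and values below 1 indicate the opposite.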
  • when the relative intensity metric satisfies a first threshold criterion, the candidate call identity is first allele, and when the relative intensity metric satisfies a second threshold criterion, the candidate call identity is second allele.
  • the first threshold criterion and/or the second threshold criterion is based on a control relative intensity metric for a first control signal yield and a second control signal yield.
  • the control relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a fold change.
  • the first threshold criterion is satisfied when the relative intensity metric is greater than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than $\log_2(F_c(\mathrm{FAl})) / \log_2(F_c(\mathrm{SAl}))$, where:
  • $F_c(\mathrm{FAl})$ is the first control signal yield, and
  • $F_c(\mathrm{SAl})$ is the second control signal yield.
  • the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1.
  • the candidate call identity is first allele.
  • the second threshold criterion is satisfied when the relative intensity metric is less than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than $\log_2(F_c(\mathrm{FAl})) / \log_2(F_c(\mathrm{SAl}))$, where:
  • $F_c(\mathrm{FAl})$ is the first control signal yield, and
  • $F_c(\mathrm{SAl})$ is the second control signal yield.
  • the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1.
  • the candidate call identity is second allele.
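The threshold logic above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names are hypothetical, the threshold defaults to 1.0 (the value the text gives for the control ratio, "1 or about 1"), both signal yields are assumed to be strictly greater than 1 so that their binary logarithms are positive, and returning a no-call when the metric exactly equals the threshold is an assumption, since the text does not address that case.

```python
import math


def relative_intensity(first_yield: float, second_yield: float) -> float:
    """Ratio of binary logarithms: log2(Fy(FAl)) / log2(Fy(SAl))."""
    if first_yield <= 1 or second_yield <= 1:
        # Assumption: yields must exceed 1 so both logarithms are positive.
        raise ValueError("signal yields must be greater than 1")
    return math.log2(first_yield) / math.log2(second_yield)


def candidate_call(first_yield: float, second_yield: float,
                   threshold: float = 1.0) -> str:
    """Return a candidate call identity for one pair of matched wells."""
    metric = relative_intensity(first_yield, second_yield)
    if metric > threshold:   # first threshold criterion
        return "first allele"
    if metric < threshold:   # second threshold criterion
        return "second allele"
    return "no-call"         # assumption: equality is treated as a no-call


print(candidate_call(8.0, 2.0))
print(candidate_call(2.0, 8.0))
```

In practice the threshold would be derived from the control wells rather than fixed, as described for the control relative intensity metric.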
  • the method further comprises obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield.
  • the central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
  • the central tendency metric is calculated as
  • the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample.
  • the plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
  • the method further comprises determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate.
  • the method further includes evaluating whether a respective candidate call identity 134 in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
  • the evaluating comprises performing a comparison of (i) the relative intensity metric between the corresponding first and second signal yields and (ii) the plurality of control signal yields, where, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity 134 is deemed a no-call.
  • each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. As described above, in some embodiments, each respective control signal yield is 1 or about 1.
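As a rough sketch of the control-based no-call check described above, the following Python computes a control signal yield as the maximum reporting signal divided by the minimum across time points, then flags a no-call when a relative intensity metric falls inside the range spanned by the control yields. The function names are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Maximum reporting signal normalized by the minimum across time points."""
    lowest, highest = min(signals), max(signals)
    if lowest <= 0:
        # Assumption: reporting signals are positive (e.g. raw fluorescence).
        raise ValueError("reporting signals must be positive")
    return highest / lowest


def is_no_call(metric: float, control_yields: Sequence[float]) -> bool:
    """True when the metric lies within [min control yield, max control yield]."""
    return min(control_yields) <= metric <= max(control_yields)


# Two control wells free of sample nucleic acid: signals stay nearly flat,
# so each control yield is close to 1.
controls = [signal_yield([100, 105, 110]), signal_yield([200, 204, 202])]
print(is_no_call(1.05, controls))
print(is_no_call(3.0, controls))
```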
  • the method further includes obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield.
  • the method further comprises determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric.
  • the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity 134 is deemed a no-call.
  • the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields is less than a respective control central tendency threshold, the respective candidate call identity 134 is deemed a no-call.
  • the control central tendency threshold is a respective control central tendency metric in the plurality of control central tendency metrics (e.g., the maximum control central tendency metric).
  • the control central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
  • the control central tendency metric is a log average calculated as
  • Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells
  • Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
  • the method further includes performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
  • each respective candidate call identity for each respective well in the first set of wells corresponds to a respective biological sample in one or more biological samples
  • the voting procedure comprises assigning the respective candidate call identity as the mutation call for the target locus
  • the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by at least a majority of the first set of wells. In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by all of the wells in the first set of wells. In some embodiments, a respective mutation call is no-call when no candidate call identity is shared by at least a majority of the first set of wells.
  • each well in the first set of wells corresponds to a different respective replicate of an amplification reaction.
  • each well in the first set of wells corresponds to a different respective replicate of a detection assay (e.g., a programmable nuclease-based assay). Partitioning biological samples into replicates is illustrated in FIGS.
  • each well in the first set of wells corresponds to a different respective replicate of an amplification reaction, and a respective mutation call is no-call when one or more wells in the amplification reaction fails to satisfy an amplification threshold. See, e.g., the section entitled “Nucleic Acid Amplification,” above.
  • Other embodiments for voting procedures and/or concordance votes are contemplated, as will be apparent to one skilled in the art and as disclosed further herein.
  • the method further includes, prior to the performing the voting procedure, binning the first set of wells into a plurality of bins, where each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample.
  • the voting procedure further comprises (i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities 134 for the subset of wells in the respective bin, thereby generating a plurality of bin votes 414, and (ii) applying a second concordance vote across the plurality of bin votes 414, thereby obtaining the mutation call 416 for the target locus.
  • each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
  • the first allele is wild-type and the second allele is mutant.
  • the first allele is a first mutant, and the second allele is a second mutant.
  • a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin. For example, as illustrated in FIG. 4B, consider the case of three wells, in which at least two wells have a candidate call identity of wild-type. In some such embodiments, the first concordance vote generates a bin vote 414 that is wild-type. Similarly, as illustrated in FIG. 4C, where the subset of wells consists of three wells and at least two of the wells have a candidate call identity of mutant, then the first concordance vote generates a bin vote 414 that is mutant.
  • the respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the wells in the subset of wells for the respective bin.
  • a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
  • a respective bin vote is no-call when no candidate call identity is shared by at least a majority of the subset of wells for the respective bin. For instance, in some embodiments, when the candidate call identities for the subset of wells are equally distributed between first allele and second allele, then the respective bin vote is no-call.
  • a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call. For instance, as illustrated in FIG. 4C, where the subset of wells consists of three wells and one of the wells has a candidate call identity of no-call, then the first concordance vote generates a bin vote 414 for the respective bin that is no-call.
  • the no-call well is removed from the plurality of wells and the bin vote is performed using the candidate call identities of the remaining wells.
  • the second concordance vote generates a mutation call 416 based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins.
  • FIG. 4B illustrates a plurality of three bins, where at least two of the bins have a bin vote of wild-type, and where the second concordance vote generates a mutation call 416 for the target locus that is wild-type.
  • FIG. 4C illustrates a plurality of three bins, where at least two of the bins have a bin vote of mutant, and where the second concordance vote generates a mutation call 416 for the target locus that is mutant.
  • the second concordance vote generates a mutation call based on a common bin vote that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the bins in the plurality of bins. In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
  • the mutation call is no-call when no bin vote is shared by at least a majority of the plurality of bins. For instance, in some embodiments, when the bin votes for the plurality of bins are equally distributed between first allele and second allele, then the mutation call is no-call.
  • as illustrated in FIG. 4C, when a respective bin vote is no-call, the no-call bin is removed from the plurality of bin votes and the second concordance vote is performed with the remaining bins.
  • each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate. Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above).
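The two-tier voting procedure (a first concordance vote within each bin, then a second concordance vote across bin votes) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the `drop_no_calls` flag are hypothetical, and the flag simply switches between two of the embodiments described above (removing no-call entries before voting, or letting any no-call entry force a no-call). A strict majority (more than half) is assumed.

```python
from collections import Counter
from typing import List, Sequence

NO_CALL = "no-call"


def concordance_vote(calls: Sequence[str], drop_no_calls: bool = True) -> str:
    """Return the call shared by a strict majority of entries, else no-call."""
    if drop_no_calls:
        # Embodiment in which no-call entries are removed before voting.
        calls = [c for c in calls if c != NO_CALL]
    elif NO_CALL in calls:
        # Embodiment in which a single no-call entry yields a no-call vote.
        return NO_CALL
    if not calls:
        return NO_CALL
    label, count = Counter(calls).most_common(1)[0]
    return label if count > len(calls) / 2 else NO_CALL


def mutation_call(bins: List[Sequence[str]], drop_no_calls: bool = True) -> str:
    """First vote within each bin, then a second vote across the bin votes."""
    bin_votes = [concordance_vote(wells, drop_no_calls) for wells in bins]
    return concordance_vote(bin_votes, drop_no_calls)


bins = [
    ["wild-type", "wild-type", "mutant"],
    ["wild-type", "wild-type", "wild-type"],
    ["mutant", "mutant", "wild-type"],
]
print(mutation_call(bins))
```

Other concordance thresholds mentioned in the text (for example, at least 80% or unanimity) could be supported by replacing the hard-coded one-half cutoff with a parameter.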
  • the plurality of bins comprises between 2 and 10 bins. In some embodiments, the plurality of bins comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 bins. In some embodiments, the plurality of bins comprises no more than 30, no more than 20, no more than 10, or no more than 6 bins. In some embodiments, the plurality of bins is from 3 to 9, from 4 to 12, or from 10 to 25 bins. In some embodiments, the plurality of bins falls within another range starting no lower than 3 bins and ending no higher than 30 bins.
  • each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells.
  • each respective subset of wells comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 wells.
  • each respective subset of wells comprises no more than 30, no more than 20, no more than 10, or no more than 6 wells.
  • each respective subset of wells is from 3 to 9, from 4 to 12, or from 10 to 25 wells. In some embodiments, each respective subset of wells falls within another range starting no lower than 3 wells and ending no higher than 30 wells.
  • the systems and methods disclosed herein are used for comparison with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing). In some embodiments, the systems and methods disclosed herein are performed concurrently with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing).
  • the systems and methods for determining a mutation call disclosed herein are performed for each respective mutant allele in a plurality of mutant alleles (see, e.g., the section entitled “Samples,” above).
  • the method further includes performing an instance of the obtaining a mutation call for the target locus, for each respective mutant allele in the plurality of mutant alleles for the target locus.
  • the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, thereby obtaining a plurality of candidate mutation calls for the target locus.
  • the method further includes performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
  • the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
  • the target locus comprises at least a first allele that is a wild-type allele and a plurality of candidate second alleles, where each respective candidate second allele is a different mutant allele for the target locus.
  • the respective candidate mutation call is determined as the final mutation call for the target locus.
  • the first allele is a wild-type allele
  • each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus
  • the final mutation call for the target locus is a candidate mutation call that corresponds to a respective mutant allele.
  • the final mutation call for the target locus is the only candidate mutation call that corresponds to a respective mutant allele, where every other candidate mutation call in the plurality of candidate mutation calls corresponds to the wild-type allele.
  • a target allele comprises a wild-type allele and n mutant alleles, where n is a positive integer of 1 or greater (e.g., n is between 1 and 100).
  • n comparisons are performed between the wild-type allele and each of the n mutant alleles, in accordance with some embodiments of the present disclosure, thus obtaining a corresponding n candidate mutation calls.
  • one of the n comparisons yields a candidate mutation call that identifies the corresponding mutant allele, while every other comparison yields a candidate mutation call that identifies the wild-type allele.
  • the candidate mutation call that identifies the corresponding mutant allele is selected as the final mutation call, and the identity of the target locus is determined to be the corresponding mutant allele (e.g., SNP).
  • each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
      $\log_2(F_y(\mathrm{WT})) > \log_2(F_y(\mathrm{M1}))$: Wild Type
      $\log_2(F_y(\mathrm{WT})) > \log_2(F_y(\mathrm{M2}))$: Wild Type
      $\log_2(F_y(\mathrm{WT})) > \log_2(F_y(\mathrm{M3}))$: Wild Type
      $\log_2(F_y(\mathrm{WT})) < \log_2(F_y(\mathrm{M4}))$: Mutant (4)
  • mutant 4 is selected as the final mutation call.
  • the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
  • the first allele is a wild-type allele
  • each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus
  • two or more candidate mutation calls in the plurality of candidate mutation calls identify a corresponding two or more mutant alleles.
  • a comparison is performed between each respective pair of mutant alleles corresponding to candidate mutation calls to determine which mutant allele has the highest signal yield and thus is selected as the final mutation call.
  • the comparison is performed by repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, in accordance with the methods disclosed above, where the first allele is a first mutant allele in a respective pair of mutant alleles and the second allele is a second mutant allele in the respective pair of mutant alleles.
  • each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
      $\log_2(F_y(\mathrm{WT})) > \log_2(F_y(\mathrm{M1}))$: Wild Type
      $\log_2(F_y(\mathrm{WT})) < \log_2(F_y(\mathrm{M2}))$: Mutant (2)
      $\log_2(F_y(\mathrm{WT})) > \log_2(F_y(\mathrm{M3}))$: Wild Type
      $\log_2(F_y(\mathrm{WT})) < \log_2(F_y(\mathrm{M4}))$: Mutant (4)
  • Two of the mutant alleles (e.g., mutant 2 and mutant 4) are represented as candidate mutation calls. Comparison of the signal yields of the pair of mutant alleles represented as candidate mutation calls reveals that one of the mutant alleles has a higher signal yield than the other:
      $\log_2(F_y(\mathrm{M2})) < \log_2(F_y(\mathrm{M4}))$: Mutant (4)
  • the mutant allele (e.g., mutant 4) having the highest signal yield in the comparison is selected as the final mutation call.
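The selection of a final mutation call across several candidate mutant alleles can be sketched as below. This is a simplified illustration under stated assumptions: each allele is represented by a single summary signal yield rather than by the full per-well procedure and voting described above, a single wild-type yield is reused for every comparison, and the function name and dictionary layout are hypothetical.

```python
import math
from typing import Dict


def final_mutation_call(wt_yield: float, mutant_yields: Dict[str, float]) -> str:
    """Compare wild-type against each mutant, then resolve multiple mutant calls."""
    # A mutant is a candidate call when log2(Fy(WT)) < log2(Fy(Mn)).
    candidates = [
        name for name, y in mutant_yields.items()
        if math.log2(wt_yield) < math.log2(y)
    ]
    if not candidates:
        return "wild-type"
    # With two or more mutant candidates, the highest signal yield is selected.
    return max(candidates, key=lambda name: mutant_yields[name])


# Mirrors the second worked example: M2 and M4 both exceed wild-type, M4 is highest.
print(final_mutation_call(2.0, {"M1": 1.2, "M2": 3.0, "M3": 1.1, "M4": 5.0}))
```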
  • the signal dataset comprises, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
  • each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
  • a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • a voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
  • the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus.
  • the first allele is other than a wild-type allele.
  • the set of candidate second alleles consists of between 2 and 10, between 2 and 20, or between 2 and 100 candidate second alleles.
  • each respective target locus in the plurality of target loci comprises any of the embodiments disclosed herein for a first respective target locus, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof (see, e.g., the section entitled “Samples,” above), as will be apparent to one skilled in the art.
  • the plurality of target loci comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, or at least 200 target loci. In some embodiments, the plurality of target loci comprises no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 target loci. In some embodiments, the plurality of target loci is from 2 to 10, from 5 to 50, or from 10 to 100 target loci. In some embodiments, the plurality of target loci falls within another range starting no lower than 2 loci and ending no higher than 500 loci.
  • each respective target locus in the plurality of target loci maps to a reference sequence for a single organism.
  • a respective organism e.g., a virus
  • the systems and methods for determining a mutation call disclosed herein are performed for each sample in a plurality of samples.
  • FIG. 4A illustrates a first plurality of amplification replicates partitioned from a first sample and a second plurality of amplification replicates partitioned from a second sample.
  • FIG. 4B illustrates the determination of mutation calls, in accordance with an embodiment of the present disclosure, using candidate call identities obtained for the first sample, while FIG. 4C illustrates the determination of mutation calls using candidate call identities obtained for the second sample.
  • the plurality of samples comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, at least 200, or at least 500 samples. In some embodiments, the plurality of samples comprises no more than 1000, no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 samples. In some embodiments, the plurality of samples is from 2 to 20, from 5 to 100, or from 10 to 200 samples. In some embodiments, the plurality of samples falls within another range starting no lower than 2 samples and ending no higher than 1000 samples.
  • Suitable embodiments for performing the present systems and methods for a plurality of loci and/or for a plurality of samples including samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, and any characteristics or elements thereof, include any of the embodiments for samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, disclosed herein for a single target locus and/or for a single sample, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
  • one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
  • the method includes amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids.
  • a procedure is performed including (i) partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample, and (ii) contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters.
  • a signal dataset is obtained including, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells includes a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells further includes a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease.
  • a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
  • a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
  • Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
  • Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
  • a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
  • a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
  • Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
  • the signal dataset represents a plurality of time points.
  • the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
  • the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
  • Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
  • Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
  • a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
  • a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
  • a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
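The sequence described above (a per-well signal yield, a pairwise comparison between allele wells, and a vote across candidate calls) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the yield definition (maximum reading minus first reading), the `min_ratio` threshold, the strict-majority rule, and all function names are hypothetical choices made for the example.

```python
from collections import Counter
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Collapse one well's time course into a single yield.

    Assumption: yield = maximum reading minus the first reading.
    """
    return max(signals) - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   min_ratio: float = 2.0) -> str:
    """Compare a first-allele well with its paired second-allele well.

    min_ratio is a hypothetical threshold, not a value from the disclosure.
    """
    if first_yield <= 0 and second_yield <= 0:
        return "no_call"
    if first_yield >= min_ratio * max(second_yield, 1e-9):
        return "first_allele"
    if second_yield >= min_ratio * max(first_yield, 1e-9):
        return "second_allele"
    return "no_call"


def vote(calls: Sequence[str]) -> str:
    """Return the call held by a strict majority of wells, else 'no_call'."""
    top, count = Counter(calls).most_common(1)[0]
    return top if count > len(calls) / 2 else "no_call"


def mutation_call(first_wells: Sequence[Sequence[float]],
                  second_wells: Sequence[Sequence[float]]) -> str:
    """Pair wells by index, derive candidate calls, then vote."""
    calls = [
        candidate_call(signal_yield(a), signal_yield(b))
        for a, b in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

For example, with three replicate wells per allele, two clear first-allele comparisons outvote one ambiguous replicate, so `mutation_call` returns `"first_allele"`.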
  • compositions, systems and methods for assaying for a SARS-CoV-2 variant are consistent with the methods, and reagents disclosed herein.
  • the sample comprises a target nucleic acid which is a gene fragment of a SARS-CoV-2 variant.
  • the SARS-CoV-2 variant that is specifically targeted is Alpha strain/B.1.1.7 and Q lineages and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is Beta strain/B.1.351 and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is Gamma strain/P.1 and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is Epsilon strain/B.1.427 and B.1.429 lineages and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is Kappa strain/B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain/P.2 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Delta strain/B.1.617.2 and AY lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526 lineage and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is B.1.526.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.3 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Lambda strain/C.37 lineage and descendent lineages.
  • the SARS-CoV-2 variant that is specifically targeted is Eta strain/B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Iota strain/B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Mu strain/B.1.621 and B.1.621.1 lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Omicron strain/B.1.1.529 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is a lineage obtained from a reference database (see, for example, Rambaut et al., (2020) Nature Microbiology, doi:10.1038/s41564-020-0770-5; and Cov-Lineages, available on the Internet at cov-lineages.org/lineage_hst.html).
  • the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Spike (S) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Nucleocapsid (N) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Envelope (E) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Membrane glycoprotein (M) protein.
  • the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1a. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1b. In some instances, the target nucleic acid comprises a segment of a nucleic acid from the SARS-CoV-2 variant comprising the one or more mutations.
  • the sample is a biological sample from an individual.
  • a biological sample is a blood, serum, plasma, saliva, urine, mucosal sample, peritoneal sample, cerebrospinal fluid, gastric secretions, nasal secretions, sputum, pharyngeal exudates, urethral or vaginal secretions, an exudate, an effusion, or tissue sample.
  • the sample is a nasal swab.
  • the nasal swab is a nasopharyngeal swab.
  • the sample is a throat swab.
  • the throat swab is an oropharyngeal swab.
  • a tissue sample may be dissociated or liquified prior to application to the detection system of the disclosure.
  • a sample from an environment may be from soil, air, or water.
  • the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest.
  • the raw sample is applied to the detection system.
  • the sample is diluted with a buffer or a fluid, or concentrated, prior to application to the detection system, or is applied neat to the detection system. Sometimes, the sample is contained in no more than 20 µL.
  • the sample in some cases, is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, 500 µL, or any value from 1 µL to 500 µL. Sometimes, the sample is contained in more than 500 µL.
  • Some methods described herein can detect a target nucleic acid present in the sample in various concentrations or amounts.
  • the sample has at least 2 target nucleic acids.
  • the sample has at least 3, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 target nucleic acids.
  • the method detects target nucleic acid present at least at one copy per 10¹ non-target nucleic acids, 10² non-target nucleic acids, 10³ non-target nucleic acids, 10⁴ non-target nucleic acids, 10⁵ non-target nucleic acids, 10⁶ non-target nucleic acids, 10⁷ non-target nucleic acids, 10⁸ non-target nucleic acids, 10⁹ non-target nucleic acids, or 10¹⁰ non-target nucleic acids.
  • the target nucleic acid comprises a segment of an S (Spike) gene of a SARS-CoV-2 variant.
  • the segment of the S gene in some embodiments, comprises a mutation (e.g., an SNP) relative to a wild-type SARS-CoV-2 protein.
  • the mutation in some embodiments, is used, at least in part, to identify the SARS-CoV-2 variant (e.g., to distinguish it from other variants or from a wild-type SARS-CoV-2). Therefore, detection of the target nucleic acid comprising the mutation in a sample, in some embodiments, is used to identify the sample as comprising the SARS-CoV-2 variant.
  • Methods described herein comprise contacting the sample to a complex comprising a guide nucleic acid comprising a segment that is reverse complementary to a segment of the target nucleic acid and an effector protein that exhibits sequence independent cleavage upon forming a complex comprising the segment of the guide nucleic acid binding to the segment of the target nucleic acid (e.g., comprising a segment of a SARS-CoV-2 variant); and assaying for a signal indicating cleavage of a reporter, wherein the signal indicates a presence of the target nucleic acid in the sample and wherein absence of the signal indicates an absence of the target nucleic acid in the sample.
  • when a guide nucleic acid binds to a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant), the effector protein's trans cleavage activity can be initiated, and reporter nucleic acids can be cleaved, resulting in the detection of fluorescence indicative of, at least in part, the presence of the target nucleic acid.
  • the cleaving of the reporter nucleic acid using the effector protein in some embodiments, cleaves with an efficiency of 50% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples.
  • the cleavage efficiency is at least 40%, 50%, 60%, 70%, 80%, 90%, or 95% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples.
  • the methods described herein detect a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant) with an effector protein and a detector nucleic acid in a sample, where the sample is contacted with the reagents for a predetermined length of time sufficient for trans cleavage of the single-stranded detector nucleic acid.
  • Some methods described herein comprise a) contacting the sample to i) a detector nucleic acid; and ii) a composition comprising an effector protein and a non-naturally occurring guide nucleic acid having a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 (or any one of SEQ ID NOS: 22-27 or 40-42) that hybridizes to a segment of the target nucleic acid, wherein the effector protein cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the coronavirus target nucleic acid; and b) assaying for a change in a signal, wherein the change in the signal is produced by cleavage of the detector nucleic acid.
  • the detector nucleic acid comprises a nucleotide sequence that is at least 62.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
  • the detector nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
  • the detector nucleic acid comprises a nucleotide sequence that is at least 87.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
  • the non-naturally occurring guide nucleic acid has a nucleotide sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6, or any one of SEQ ID NOS: 22-27 or 40-42.
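The percent-identity thresholds recited above can be illustrated with a simple ungapped comparison. The sketch below assumes equal-length sequences and counts matching positions. The reference string is a hypothetical placeholder, since the SEQ ID NO sequences are not reproduced in this section. With an eight-nucleotide reference, 62.5%, 75%, and 87.5% would correspond to 5, 6, and 7 matching positions, but the length of SEQ ID NO: 7 is likewise an assumption here. Alignment-based identity, which handles gaps, is outside the scope of this sketch.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity: matching positions / reference length * 100.

    Assumption: both sequences have the same length and no alignment is needed.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    if len(candidate) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity(candidate: str, reference: str, threshold: float) -> bool:
    """True when the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 8-nt placeholder, NOT the sequence of any SEQ ID NO.
PLACEHOLDER_REFERENCE = "ACGTACGT"
```

For instance, `meets_identity("ACGTACGA", PLACEHOLDER_REFERENCE, 87.5)` is true because seven of eight positions match.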
  • the method further comprises assaying for a control sequence by contacting the control nucleic acid to a second detector nucleic acid and a composition comprising the programmable nuclease and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the control nucleic acid, wherein the programmable nuclease cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the control nucleic acid.
  • Contacting the sample with the effector protein and guide nucleic acid occurs at a temperature of at least about 25°C, at least about 30°C, at least about 35°C, at least about 40°C, at least about 50°C, or at least about 65°C. In some instances, the temperature is not greater than 80°C. In some instances, the temperature is about 25°C, about 30°C, about 35°C, about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C. In some instances, the temperature is about 25°C to about 45°C, about 35°C to about 55°C, or about 55°C to about 65°C.
  • methods of the disclosure detect a target nucleic acid (e.g., the target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant) in less than 60 minutes.
  • methods of the disclosure detect a target nucleic acid in less than about 120 minutes, less than about 110 minutes, less than about 100 minutes, less than about 90 minutes, less than about 80 minutes, less than about 70 minutes, less than about 60 minutes, less than about 55 minutes, less than about 50 minutes, less than about 45 minutes, less than about 40 minutes, less than about 35 minutes, less than about 30 minutes, less than about 25 minutes, less than about 20 minutes, less than about 15 minutes, less than about 10 minutes, less than about 5 minutes, less than about 4 minutes, less than about 3 minutes, less than about 2 minutes, or less than about 1 minute.
  • the methods of detecting are performed in less than 10 hours, less than 9 hours, less than 8 hours, less than 7 hours, less than 6 hours, less than 5 hours, less than 4 hours, less than 3 hours, less than 2 hours, less than 1 hour, less than 50 minutes, less than 45 minutes, less than 40 minutes, less than 35 minutes, less than 30 minutes, less than 25 minutes, less than 20 minutes, less than 15 minutes, less than 10 minutes, less than 9 minutes, less than 8 minutes, less than 7 minutes, less than 6 minutes, or less than 5 minutes. In some cases, methods of detecting are performed in about 5 minutes to about 10 hours, about 10 minutes to about 8 hours, about 15 minutes to about 6 hours, about 20 minutes to about 5 hours, about 30 minutes to about 2 hours, or about 45 minutes to about 1 hour.
  • the results from the completed assay can be detected and analyzed in various ways.
  • signal produced by the reaction is visible by eye, and the results can be read by the user.
  • the signal is visualized by an imaging device or other device depending on the type of signal.
  • the imaging device is a digital camera, such as a digital camera on a mobile device.
  • the mobile device in some embodiments, has a software program or a mobile application that can capture an image of the support medium, identify the assay being performed, detect the detection region and the detection spot, provide image properties of the detection spot, analyze the image properties of the detection spot, and provide a result.
  • the imaging device can capture fluorescence, ultraviolet (UV), infrared (IR), or visible wavelength signals.
  • the imaging device in some embodiments, has an excitation source to provide the excitation energy and captures the emitted signals.
  • the excitation source is a camera flash and optionally a filter.
  • the imaging device is used together with an imaging box that is placed over the support medium to create a dark room to improve imaging.
  • the imaging box can be a cardboard box that the imaging device can fit into before imaging.
  • the imaging box has optical lenses, mirrors, filters, or other optical elements to aid in generating a more focused excitation signal or to capture a more focused emission signal.
  • the imaging box and the imaging device are small, handheld, and portable to facilitate the transport and use of the assay in remote or low resource settings.
  • the assay described herein can be visualized and analyzed by a mobile application (app) or a software program.
  • Using the graphic user interface (GUI) of the app or program, in some embodiments, an individual takes an image of the support medium, including the detection region, barcode, reference color scale, and fiduciary markers on the housing, using a camera on a mobile device.
  • the program or app reads the barcode or identifiable label for the test type, locates the fiduciary marker to orient the sample, reads the detectable signals, compares against the reference color grid, and determines the presence or absence of the target nucleic acid, which indicates the presence of the gene, virus, or the agent responsible for the disease.
  • the mobile application in some embodiments, presents the results of the test to the individual.
  • the mobile application stores the test results in the mobile application.
  • the mobile application in some embodiments, communicates with a remote device and transfers the data of the test results.
  • the test results in some embodiments, are viewable remotely from the remote device by another individual, including a healthcare professional.
  • a remote user in some embodiments, accesses the results and uses the information to recommend action for treatment, intervention, or cleanup of an environment.
  • the method of the disclosure is used as an initial screen for the presence of an SNP before whole genome sequencing. In other cases, the method of the disclosure is used to replace whole genome sequencing. In other cases, the method of the disclosure is used to detect specific mutations associated with neutralizing antibody evasion before physicians make a clinical decision to use certain antibody drugs to treat the infection. In other cases, the method of the disclosure is used for contact tracing within communities.
  • Methods described herein comprise contacting a nasal swab or a throat swab from an individual.
  • the nasal swab is a nasopharyngeal swab.
  • the throat swab is an oropharyngeal swab.
  • the procedures to collect a nasal swab are well known in the relevant field. Briefly, a tapered swab, in some embodiments, is used. When the individual is tilted to a certain angle, the swab is inserted into the individual’s nasal cavity parallel to the palate until resistance is met at the turbinates.
  • the swab in some embodiments, is preserved in a sterile tube.
  • the procedures to collect a throat swab are also well known in the relevant field. Briefly, in some embodiments, a swab is inserted into the mouth of the individual, and touches both tonsillar pillars and the posterior oropharynx. In some embodiments, the swab is preserved in a sterile tube.
  • methods described herein comprise preparing samples, including lysing the sample.
  • lysing the sample comprises contacting the sample to a lysis buffer.
  • a lysis reaction in some embodiments, is performed at a range of temperatures. In some embodiments, a lysis reaction is performed at about room temperature. In some embodiments, a lysis reaction is performed at about 95°C. In some embodiments, a lysis reaction is performed at from 1 °C to 10 °C, from 4 °C to 8 °C, from 10 °C to 20 °C, from 15 °C to 25 °C, from 15 °C to 20 °C, from 18 °C to 25 °C, from 18 °C to 95 °C, from 20 °C to 37 °C, from 25 °C to 40 °C, from 35 °C to 45 °C, from 40 °C to 60 °C, from 50 °C to 70 °C, from 60 °C to 80 °C, from 70 °C to 90 °C, from 80 °C to 95 °C, or from 90 °C to 99 °C.
  • a lysis reaction is performed for about 5 minutes, about 15 minutes, or about 30 minutes. In some embodiments, a lysis reaction is performed for from 2 minutes to 5 minutes, from 3 minutes to 8 minutes, from 5 minutes to 15 minutes, from 10 minutes to 20 minutes, from 15 minutes to 25 minutes, from 20 minutes to 30 minutes, from 25 minutes to 35 minutes, from 30 minutes to 40 minutes, from 35 minutes to 45 minutes, from 40 minutes to 50 minutes, from 45 minutes to 55 minutes, from 50 minutes to 60 minutes, from 55 minutes to 65 minutes, from 60 minutes to 70 minutes, from 65 minutes to 75 minutes, from 70 minutes to 80 minutes, from 75 minutes to 85 minutes, or from 80 minutes to 90 minutes.
  • the lysis buffer disclosed herein comprises a viral lysis buffer.
  • a viral lysis buffer in some embodiments, lyses a coronavirus capsid in a viral sample (e.g., a sample collected from an individual suspected of having a coronavirus infection), releasing a viral genome.
  • the viral lysis buffer in some embodiments, is compatible with amplification (e.g., RT-LAMP amplification) of a target region of the viral genome.
  • the viral lysis buffer in some embodiments, is compatible with detection (e.g., a DETECTR reaction disclosed herein).
  • a sample in some embodiments, is prepared in a one-step sample preparation method comprising suspending the sample in a viral lysis buffer compatible with amplification, detection (e.g., a DETECTR reaction), or both.
  • a viral lysis buffer compatible with amplification (e.g., RT-LAMP amplification), detection (e.g., DETECTR), or both comprises a buffer (e.g., Tris-HCl, phosphate, or HEPES), a reducing agent (e.g., N-Acetyl Cysteine (NAC), Dithiothrei
  • a viral lysis buffer in some embodiments, comprises a buffer and a reducing agent, or a viral lysis buffer comprises a buffer and a chelating agent.
  • the viral lysis buffer in some embodiments, is formulated at a low pH.
  • the viral lysis buffer in some embodiments, is formulated at a pH of from about pH 4 to about pH 5.
  • the viral lysis buffer in some embodiments, is formulated at a pH of from about pH 4 to about pH 9.
  • the viral lysis buffer in some embodiments, further comprises a preservative (e.g., ProClin 150).
  • the viral lysis buffer in some embodiments, comprises an activator of the amplification reaction.
  • the buffer in some embodiments, comprises primers, dNTPs, or magnesium (e.g., MgSO4, MgCl2, or MgOAc), or a combination thereof, to activate the amplification reaction.
  • an activator (e.g., primers, dNTPs, or magnesium) is added to the buffer following lysis of the coronavirus to initiate the amplification reaction.
  • Methods described herein comprise RNA extraction.
  • the method described herein comprises obtaining a sample from a subject infected with or suspected to be infected with coronavirus.
  • the methods comprise extracting RNA from the sample.
  • RNA can be extracted from the sample with conventional RNA extraction methods.
  • RNA is extracted with the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek).
  • RNA is extracted with the HighPrep™ Viral DNA/RNA Kit (MagBio).
  • RNA is extracted with the AB MagPure Virus RNA Isolation Kit.
  • RNA is extracted with the GeneJET RNA Purification Kit (Thermo Fisher Scientific).
  • RNA in some embodiments, is automatically extracted on a liquid handling platform. In some instances, RNA is automatically extracted on the KingFisher Flex (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on Microlab® STAR (Hamilton). In some instances, RNA is automatically extracted on Microlab® NIMBUS (Hamilton). In some instances, RNA is automatically extracted on KingFisher® Duo Prime (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on MagMAX® Express-96 (Applied Biosystems). In some instances, RNA is automatically extracted on BioSprint® 96 (QIAGEN).
  • Methods described herein comprise amplifying a target nucleic acid for detection.
  • amplifying comprises changing the temperature of the amplification reaction, also known as thermal amplification (e.g., PCR).
  • amplifying is performed at essentially one temperature, also known as isothermal amplification.
  • amplifying comprises subjecting a target nucleic acid to an amplification reaction selected from transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
  • the target nucleic acid is amplified by LAMP. In some instances, the target nucleic acid is amplified by RT-LAMP.
  • LAMP primer sets can be designed to target distinct but nearby sequences around one or more mutations (e.g., SNP) that are indicative of, at least in part, a SARS-CoV-2 variant.
  • four primers comprising forward inner primer (FIP) with an overhang designed to form dumbbell structures, backward inner primer (BIP) with an overhang designed to form dumbbell structures, forward outer primer (F3) for displacing the FIP linked complementary strands from templates, and backward outer primer (B3) for displacing the BIP linked complementary strands from templates, are designed to target one region.
  • six primers comprising FIP, BIP, F3, B3, loop forward (LF), and loop backward (LB), are designed to target one region, wherein LF and LB are added for faster amplification.
  • one or more of the primers are degenerate primers.
  • a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify around an SNP that is indicative of, at least in part, a SARS-CoV-2 variant.
  • a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 452 of the S protein.
  • a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 484 of the S protein.
  • a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 501 of the S protein.
  • a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the S protein.
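To locate the region that a primer set would need to flank, an S protein amino acid position can be converted to nucleotide coordinates. The sketch below assumes the S coding sequence spans positions 21563 to 25384 (1-based) of the Wuhan-Hu-1 reference genome NC_045512.2. That reference and its coordinates are not stated in the text above and are supplied here for illustration. The `flank` parameter and the function names are likewise hypothetical and do not represent a primer design procedure from the disclosure.

```python
from typing import Tuple

# Assumption: S CDS coordinates (1-based, inclusive) in reference NC_045512.2.
S_GENE_START = 21563
S_GENE_END = 25384


def codon_coordinates(aa_position: int,
                      gene_start: int = S_GENE_START,
                      gene_end: int = S_GENE_END) -> Tuple[int, int]:
    """Return (first, last) genome positions of the codon for an amino acid."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = gene_start + 3 * (aa_position - 1)
    last = first + 2
    if last > gene_end:
        raise ValueError("position lies beyond the end of the coding sequence")
    return first, last


def candidate_window(aa_position: int, flank: int = 100) -> Tuple[int, int]:
    """Genome window around the codon; `flank` is a hypothetical margin."""
    first, last = codon_coordinates(aa_position)
    return max(1, first - flank), last + flank
```

Under these assumptions, amino acid positions 452, 484, and 501 map to codons starting at genome positions 22916, 23012, and 23063, respectively.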
  • a set of FIP, BIP, F3, B3, LF, and LB has primer sequences shown in Tables 3A and 3B herein.
  • Table 3A: LAMP Primer Sequences
  • compositions that, in some embodiments, further comprise reagents for amplification, the uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
  • reagents for amplifying a nucleic acid include polymerases, primers, and nucleotides.
  • Nucleic acid amplification of a target nucleic acid improves at least one of sensitivity, specificity, or accuracy of the assay in detecting the target nucleic acid.
  • amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
  • amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, in some embodiments, amplification is used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence.
  • the nucleic acid amplification reaction is carried out with reagents comprising amplification primers described herein, a DNA polymerase, dNTPs, and appropriate buffer.
  • the buffer comprises Tris-HCl, (NH4)2SO4, KCl, MgSO4, and Tween® 20.
  • the concentration of MgSO4 is 6 mM.
  • Amplifying takes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes. Sometimes, the nucleic acid amplification is performed for 1 to 60, 5 to 55, 10 to 50, 15 to 45, 20 to 40, or 25 to 35 minutes. Amplifying, in some embodiments, is performed at a temperature of around 20-45°C. Amplifying, in some embodiments, is performed at a temperature of less than about 20°C, less than about 25°C, less than about 30°C, less than about 35°C, less than about 37°C, less than about 40°C, or less than about 45°C.
  • the nucleic acid amplification reaction in some embodiments, is performed at a temperature of at least about 20°C, at least about 25°C, at least about 30°C, at least about 35°C, at least about 37°C, at least about 40°C, or at least about 45°C.
  • the amplification reaction is performed on at least 10, 100, 1,000, 5,000, 10,000, 15,000 or 10,000 copies of the target nucleic acid. In some cases, at least 10,000 copies of the target nucleic acid are input into the amplification reaction.
  • Methods described herein comprise reverse transcribing the coronavirus target nucleic acid, the amplification product, or a combination thereof.
  • the reverse transcribing comprises contacting the sample to reagents for reverse transcription.
  • the reagents for reverse transcription comprise a reverse transcriptase, an oligonucleotide primer, and dNTPs.
  • the reverse transcriptase described herein is WarmStart RTx reverse transcriptase.
  • the reverse transcriptase described herein is MultiScribe™ reverse transcriptase.
  • the reverse transcriptase described herein is QuantiTect reverse transcriptase. In one instance, the reverse transcriptase described herein is GoScript™ reverse transcriptase. In one instance, the reverse transcriptase described herein is UltraScript 2.0 reverse transcriptase.
  • the contacting the sample to reagents for reverse transcription occurs prior to the contacting the sample to the detector nucleic acid and the composition, prior to the contacting the sample to the reagents for amplification, or prior to both.
  • the contacting the sample to reagents for reverse transcription occurs concurrent to the contacting the sample to the detector nucleic acid and the composition, concurrent to the contacting the sample to the reagents for amplification, or concurrent to both.
  • Methods described herein comprise contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof.
  • Methods described herein further comprise assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof.
  • CRISPR/Cas enzymes are effector proteins used in the methods and systems disclosed herein.
  • CRISPR/Cas enzymes can include any of the known Classes and Types of CRISPR/Cas enzymes.
  • Effector proteins disclosed herein include Class 1 CRISPR/Cas enzymes, such as the Type I, Type IV, or Type III CRISPR/Cas enzymes.
  • Effector proteins disclosed herein also include the Class 2 CRISPR/Cas enzymes, such as the Type II, Type V, and Type VI CRISPR/Cas enzymes.
  • the Type V CRISPR/Cas enzyme is a Cas12 effector protein.
  • Type V CRISPR/Cas enzymes (e.g., Cas12 or Cas14) lack an HNH domain.
  • a Cas12 nuclease of the disclosure cleaves a nucleic acid via a single catalytic RuvC domain.
  • the RuvC domain is within a nuclease, or “NUC” lobe of the protein, and the Cas12 nucleases further comprise a recognition, or “REC” lobe.
  • the programmable Cas12 effector protein described herein is Cas12a.
  • the programmable Cas12 effector protein described herein is Cas12c.
  • the programmable Cas12 effector protein described herein is Cas12d.
  • the programmable Cas12 effector protein described herein is Cas12e.
  • the programmable Cas12 effector protein described herein is Cas12b. In some embodiments, the programmable Cas12 effector protein described herein is Cas12h. In some embodiments, the programmable Cas12 effector protein described herein is Cas12i. In some embodiments, the programmable Cas12 effector protein described herein is a small effector such as Cas12g.
  • compositions comprising an effector protein and a guide nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
  • the guide nucleic acid described herein targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid.
  • the target nucleic acid is near a neighboring PAM sequence that is recognizable by the effector proteins described herein.
  • the engineered guide RNA comprises a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid.
  • the engineered guide RNA comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. The tracrRNA, in some embodiments, hybridizes to a portion of the guide RNA that does not hybridize to the target nucleic acid.
  • compositions comprise a crRNA and tracrRNA that function together as two separate, unlinked molecules.
  • the guide nucleic acid targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant.
  • the target nucleic acid is from a SARS-CoV-2 variant and comprises a mutation relative to a wild-type SARS-CoV-2 or to a strain of SARS-CoV-2 that comprises no mutations at the same locus.
  • the mutation is a single nucleotide polymorphism (SNP).
  • the mutation e.g, an SNP
  • the target nucleic acid is an RNA, a DNA, or a synthetic nucleic acid.
  • the guide nucleic acid targets a target nucleic acid which comprises a segment of an S (Spike) gene of SARS-CoV-2.
  • the S protein plays a key role in the receptor recognition and cell membrane fusion process, which is essential for the infection and transmission capabilities of SARS-CoV-2.
  • the guide nucleic acid targets a target nucleic acid comprising certain SNP(s) in the S gene that encodes S protein.
  • the guide nucleic acid targets a portion of the S gene encoding around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the Spike protein of SARS-CoV-2 (see NCBI Reference Sequence: YP_009724390.1).
  • the guide nucleic acid is a guide nucleic acid represented in Table 4A or Table 4B.
  • the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a wild-type segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 1.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 1.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 1.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 22.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 22.
  • the guide nucleic acid described herein targets around position 484 in the Spike protein and hybridizes to a wild-type segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 2.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 2.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 2.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 2.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 23.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 23.
  • the guide nucleic acid described herein targets around position 501 in the Spike protein and hybridizes to a wild-type segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 3.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 3.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 3.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 3.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 24.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 24.
  • the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a mutant segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 4.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 4.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 4.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 4.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 25.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 25.
  • the guide nucleic acid described herein targets around position 484, and hybridizes to a mutant segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 5.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 5.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 5.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 5.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 26.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 26.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 40.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 40.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 41.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 41.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 41.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 41.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 41.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO:
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 42.
  • the guide nucleic acid described herein targets around position 501, and hybridizes to a mutant segment of sequence.
  • the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 6.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 6.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 6.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 6.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 27.
  • the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 27.
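The percent-identity thresholds recited above (60%, 70%, 80%, 90%, and 91% to 99%) can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of two equal-length sequences, which is a simplification; the disclosure does not specify an alignment method here. The function names and both example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch compares equal-length sequences only")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Count positions where the two sequences carry the same nucleotide.
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True when the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 20-nt sequences, NOT taken from the sequence listing.
REFERENCE = "ACGUACGUACGUACGUACGU"
CANDIDATE = "ACGUACGUACGUACGUACAA"  # differs at the last two positions

identity = percent_identity(CANDIDATE, REFERENCE)
```

Under these assumptions, a 20-nucleotide candidate with two mismatches satisfies an "at least 90% identical" embodiment but not an "at least 91% identical" one.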
  • the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by CasDx1, as described in the Example Section.
  • the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by LbCas12a, as described in the Example Section.
  • the guide nucleic acids described herein comprise UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end to be recognized by AsCas12a, as described in the Example Section.
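As a minimal sketch of how the 5’ sequences above might be combined with a target-specific segment, the following prepends the effector-specific sequence (SEQ ID NO: 126 or 127, as quoted in the text) to a spacer. The function name, the dictionary, and the spacer sequence are hypothetical illustrations; the spacer is not one of the guide sequences of Table 4A or Table 4B.

```python
# 5' sequences quoted in the text, keyed by the effector that recognizes them.
FIVE_PRIME_SEQUENCES = {
    "CasDx1": "UAAUUUCUACUAAGUGUAGAU",    # SEQ ID NO: 126
    "LbCas12a": "UAAUUUCUACUAAGUGUAGAU",  # SEQ ID NO: 126
    "AsCas12a": "UAAUUUCUACUCUUGUAGAU",   # SEQ ID NO: 127
}


def assemble_guide(effector: str, spacer: str) -> str:
    """Return the 5' effector-specific sequence followed by the spacer (RNA)."""
    if effector not in FIVE_PRIME_SEQUENCES:
        raise KeyError(f"no 5' sequence listed for effector {effector!r}")
    # Accept DNA-style input by converting T to U.
    rna_spacer = spacer.upper().replace("T", "U")
    if set(rna_spacer) - set("ACGU"):
        raise ValueError("spacer must contain only A, C, G, U (or T)")
    return FIVE_PRIME_SEQUENCES[effector] + rna_spacer


# Hypothetical spacer for illustration only.
EXAMPLE_SPACER = "GGCCTTAAGGCCTTAAGGCC"

lb_guide = assemble_guide("LbCas12a", EXAMPLE_SPACER)
as_guide = assemble_guide("AsCas12a", EXAMPLE_SPACER)
```

The same spacer yields guides that differ only in the 5’ portion, which reflects the text's statement that the 5’ sequence is what the respective effector protein recognizes.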
  • systems comprising an effector protein, a guide nucleic acid and a reporter nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
  • a reporter comprises a single-stranded nucleic acid and a detection moiety, wherein the single-stranded nucleic acid is capable of being cleaved by the activated effector protein, thereby generating a detectable signal.
  • the reporter comprises deoxyribonucleotides.
  • the reporter comprises ribonucleotides.
  • the reporter comprises at least one deoxyribonucleotide and at least one ribonucleotide.
  • the reporter comprises a single-stranded nucleic acid comprising at least one ribonucleotide residue at an internal position that functions as a cleavage site.
  • the reporter comprises a protein capable of generating a signal.
  • a signal, in some embodiments, is a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or a piezo-electric signal.
  • the reporter comprises a detection moiety. Suitable detectable labels and/or moieties contemplated for providing a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.
  • Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGF
  • Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, and glucose oxidase (GO).
  • the reporter comprises a detection moiety.
  • the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter, wherein the first site is separated from the remainder of reporter upon cleavage at the cleavage site.
  • the detection moiety is 3’ to the cleavage site.
  • the detection moiety is 5’ to the cleavage site.
  • the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
  • the reporter comprises a detection moiety and a quenching moiety.
  • the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site.
  • the quenching moiety is a fluorescence quenching moiety.
  • the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site.
  • the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site.
  • the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter.
  • the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
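The arrangement described above, in which the detection moiety and the quenching moiety sit on opposite sides of the cleavage site, can be sketched as a small model. This is an illustrative assumption-laden sketch: the class, the field names, the 0-based position convention, and the 8-nucleotide length are all hypothetical, and the model treats the signal as released exactly when cleavage leaves the two moieties on different fragments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reporter:
    length: int         # number of nucleotides in the single-stranded reporter
    detection_pos: int  # 0-based index of the nucleotide bearing the detection moiety
    quencher_pos: int   # 0-based index of the nucleotide bearing the quenching moiety


def signal_released(reporter: Reporter, cut_after: int) -> bool:
    """Cleavage occurs between nucleotide `cut_after` and `cut_after + 1`.

    Returns True when the detection and quenching moieties end up on
    different fragments, i.e. the cleavage site separates them.
    """
    if not 0 <= cut_after < reporter.length - 1:
        raise ValueError("cleavage site must lie between two nucleotides")
    detection_on_5prime_fragment = reporter.detection_pos <= cut_after
    quencher_on_5prime_fragment = reporter.quencher_pos <= cut_after
    return detection_on_5prime_fragment != quencher_on_5prime_fragment


# Quencher at the 5' terminus, detection moiety at the 3' terminus
# (hypothetical 8-nt reporter): any internal cut separates the two.
terminal = Reporter(length=8, detection_pos=7, quencher_pos=0)
results = [signal_released(terminal, cut) for cut in range(7)]

# Both moieties near the 5' end: a cut 3' of both does not separate them.
adjacent = Reporter(length=8, detection_pos=2, quencher_pos=0)
```

Placing the two moieties at opposite termini, as in several of the embodiments above, means that cleavage at any internal position separates them in this model.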
  • the reporter is a reporter represented in Table 5.
  • Table 5 Exemplary Single Stranded Detector Nucleic Acid
  • the detector nucleic acid has a sequence of SEQ ID NO: 7. In some cases, the detector nucleic acid comprises a sequence that is at least 87.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 75% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 62.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 50% identical to SEQ ID NO: 7.
  • Suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • the fluorophore may be an infrared fluorophore.
  • the fluorophore may emit fluorescence in the range of 500 nm and 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm.
  • the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm,
  • a quenching moiety, in some embodiments, is chosen based on its ability to quench the detection moiety.
  • a quenching moiety, in some embodiments, is a non-fluorescent fluorescence quencher.
  • a quenching moiety, in some embodiments, quenches a detection moiety that emits fluorescence in the range of 500 nm to 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher.
  • the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm.
  • the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
  • a quenching moiety quenches fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
  • a quenching moiety, in some embodiments, is Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher.
  • a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
  • a quenching moiety may be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein, in some embodiments, is from any commercially available source, may be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
  • the reporter is present in at least 1.5 fold, at least 2 fold, at least 3 fold, at least 4 fold, at least 5 fold, at least 6 fold, at least 7 fold, at least 8 fold, at least 9 fold, at least 10 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 30 fold, at least 40 fold, at least 50 fold, at least 60 fold, at least 70 fold, at least 80 fold, at least 90 fold, at least 100 fold, from 1.5 fold to 100 fold, from 2 fold to 10 fold, from 10 fold to 20 fold, from 20 fold to 30 fold, from 30 fold to 40 fold, from 40 fold to 50 fold, from 50 fold to 60 fold, from 60 fold to 70 fold, from 70 fold to 80 fold, from 80 fold to 90 fold, from 90 fold to 100 fold, from 1.5 fold to 10 fold, from 1.5 fold to 20 fold, from 10 fold to 40 fold, from 20 fold
  • Methods described herein comprise determining a variant call of a sample.
  • determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
  • determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) relative to a wild-type SARS-CoV-2.
  • the SARS-CoV-2 variant is any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants.
  • determining the variant call of Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 501 from N to Y (N501Y). In specific cases, determining the variant call of Alpha SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 501 of S protein.
  • determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 3 or 6. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 24 or 27.
  • determining the variant call of Beta SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y).
  • determining the variant call of Beta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein.
  • determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
  • determining the variant call of Gamma SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Gamma SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein.
  • determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
  • determining the variant call of Delta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
  • determining the variant call of Delta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
  • determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
  • determining the variant call of Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
  • determining the variant call of Epsilon SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
  • determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
  • determining the variant call of Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
  • determining the variant call of Kappa SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
  • determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
  • determining the variant call of Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to A (E484A). In specific cases, determining the variant call of Omicron SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein.
  • determining the variant call of Omicron SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 42.
  • determining the variant call of Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to K (E484K).
  • determining the variant call of Zeta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein.
  • determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 5. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23 or 26.
  • the data analysis pipeline to determine a variant call of a sample comprises 1) obtaining the maximum fluorescence value and the minimum fluorescence value from a well containing the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from wild-type SARS-CoV-2; 2) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 1); 3) obtaining the maximum fluorescence value and the minimum fluorescence value from a well with the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from a SARS-CoV-2 variant; and 4) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 3).
  • the non-naturally occurring guide nucleic acids are designed for detecting mutations L452R, E484K, E484Q, E484A, and N501Y.
  • the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 22, 23, 24, 25, 26, 27, 40, 41, or 42.
  • the data analysis pipeline to determine a variant calling of a sample comprises transforming the normalized signals described herein (e.g., fluorescence yield).
  • the transformation is applying a logarithmic value to normalized signal (e.g., fluorescence yield).
  • the transformation is applying a logarithmic value with a base of 2 to normalized signal (e.g., fluorescence yield). Therefore, log2 Fy (WT) and log2 Fy (M) are obtained.
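  • As a non-limiting illustration, the scaling and transformation steps above can be sketched in Python. The function names and the example traces are hypothetical and are not taken from the disclosure; the sketch only assumes that each well provides a series of positive fluorescence readings.

```python
import math

def fluorescence_yield(trace):
    """Scale a well's readings: maximum fluorescence divided by minimum fluorescence."""
    lowest = min(trace)
    if lowest <= 0:
        raise ValueError("minimum fluorescence must be positive to form a ratio")
    return max(trace) / lowest

def log2_yield(trace):
    """Apply the base-2 logarithm to the fluorescence yield (Fy) of a well."""
    return math.log2(fluorescence_yield(trace))

# Hypothetical readings from a wild-type-guide well and a mutant-guide well.
wt_trace = [100.0, 150.0, 400.0, 800.0]
mut_trace = [100.0, 110.0, 120.0, 200.0]

log2_fy_wt = log2_yield(wt_trace)   # log2(800 / 100)
log2_fy_m = log2_yield(mut_trace)   # log2(200 / 100)
```

  • Dividing by the well's own minimum removes differences in baseline fluorescence between wells, so that the two transformed values can be compared directly.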
  • an allele discrimination plot can be generated with the transformed signal (see FIGS. 4A-4C or FIG. 11D as non-limiting examples).
  • when a binary logarithmic value is applied to the scaled signals and the ratio of the WT and Mutant transformed values is plotted against the average of the WT and Mutant transformed values, a mean average (MA) plot can be generated (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
  • the data analysis pipeline to determine a variant calling of a sample comprises comparing the transformed signals.
  • the sample is determined as WT.
  • the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M)) → Wild Type.
  • the sample is determined as Mutant that is indicative of a variant.
  • the data analysis pipeline can be expressed as log2(Fy(WT)) < log2(Fy(M)) → Mutant.
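  • The comparison of the transformed signals can be sketched, by way of example only, as a small Python function. The name `call_genotype` and the "Undetermined" result for exactly equal values are assumptions made for illustration; the text above does not state how an exact tie is handled.

```python
import math

def call_genotype(fy_wt, fy_mut):
    """Compare log2-transformed fluorescence yields from the WT and mutant guides."""
    log_wt = math.log2(fy_wt)
    log_mut = math.log2(fy_mut)
    if log_wt > log_mut:
        return "Wild Type"
    if log_wt < log_mut:
        return "Mutant"
    # Assumption: equal values are reported separately rather than forced to a call.
    return "Undetermined"

print(call_genotype(8.0, 2.0))
print(call_genotype(2.0, 8.0))
```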
  • the signals of each mutant can be compared to the wild-type signal. If there exists a mutant, then among (n) comparisons for n mutants and one wild-type, one of the comparisons may yield a mutant call.
  • the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type; log2(Fy(WT)) > log2(Fy(M2)) → Wild Type; log2(Fy(WT)) > log2(Fy(M3)) → Wild Type; log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4).
  • the tie breaker analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type; log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2); log2(Fy(WT)) > log2(Fy(M3)) → Wild Type; log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4); log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4).
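  • One possible reading of the multi-mutant comparison and tie breaker is sketched below in Python. The function name, the dictionary layout, and the example yields are hypothetical; the sketch assumes that, when more than one mutant guide exceeds the wild-type guide, the mutant with the largest transformed yield is reported.

```python
import math

def call_with_tie_break(fy_wt, fy_mutants):
    """Return 'Wild Type' or the name of the mutant guide with the strongest signal."""
    log_wt = math.log2(fy_wt)
    log_mut = {name: math.log2(fy) for name, fy in fy_mutants.items()}
    # Mutant guides whose transformed yield exceeds that of the wild-type guide.
    candidates = {name: value for name, value in log_mut.items() if value > log_wt}
    if not candidates:
        return "Wild Type"
    # Tie breaker: several mutant calls are resolved in favor of the largest yield.
    return max(candidates, key=candidates.get)

# Hypothetical yields: M2 and M4 both exceed the wild-type yield of 4.0.
yields = {"M1": 2.0, "M2": 8.0, "M3": 3.0, "M4": 16.0}
result = call_with_tie_break(4.0, yields)
```

  • In this example the comparisons against M2 and M4 each yield a mutant call, and the final comparison between M2 and M4 selects M4, mirroring the expression above.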
  • a no-target control can be processed concurrently with the sample, and the maximum and the minimum fluorescence values can be scaled and transformed as described herein for a sample.
  • the logarithmic ratio between the scaled signals and the logarithmic mean of the scaled signals can be defined as Contrast and Size, respectively.
  • when the Contrast and the Size of the sample are within the lower and upper bounds of the NTC that is concurrently processed, the sample is determined as No Call.
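  • A minimal sketch of the Contrast, Size, and No Call check follows. It assumes the conventional MA-plot definitions, namely Contrast = log2(Fy(WT)) - log2(Fy(M)) and Size = the mean of the two log2 values; these formulas, the function names, and the example values are assumptions for illustration and are not spelled out in this passage.

```python
import math

def contrast_and_size(fy_wt, fy_mut):
    """Return (Contrast, Size) for one WT/mutant pair of fluorescence yields."""
    log_wt = math.log2(fy_wt)
    log_mut = math.log2(fy_mut)
    contrast = log_wt - log_mut      # assumed: log2 of the WT/M yield ratio
    size = (log_wt + log_mut) / 2    # assumed: mean of the log2 yields
    return contrast, size

def is_no_call(sample_point, ntc_points):
    """True when both Contrast and Size fall within the NTC lower and upper bounds."""
    contrasts = [c for c, _ in ntc_points]
    sizes = [s for _, s in ntc_points]
    contrast, size = sample_point
    in_contrast = min(contrasts) <= contrast <= max(contrasts)
    in_size = min(sizes) <= size <= max(sizes)
    return in_contrast and in_size

# Hypothetical no-target controls processed on the same plate as the samples.
ntc = [contrast_and_size(2.0, 1.0), contrast_and_size(1.0, 4.0)]
weak_sample = contrast_and_size(2.0, 2.0)
strong_sample = contrast_and_size(16.0, 2.0)
```

  • Here `weak_sample` lies inside the NTC bounds and would be reported as No Call, whereas `strong_sample` lies outside them and proceeds to a WT or Mutant determination.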
  • the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the allele discrimination plot described herein. In other instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the MA plot described herein (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
  • compositions, systems, and methods described herein, in some implementations, are multiplexed in a number of ways. These methods of multiplexing are, for example, consistent with methods and reagents disclosed herein for detection of a target nucleic acid within the sample.
  • Multiplexing can be spatial multiplexing, wherein multiple different target nucleic acids are detected at the same time but the reactions are spatially separated. Often, the multiple target nucleic acids are detected using the same programmable nuclease but different guide nucleic acids. The multiple target nucleic acids sometimes are detected using different programmable nucleases. Sometimes, multiplexing can be single reaction multiplexing, wherein multiple different target nucleic acids are detected in a single reaction volume. Often, a single population of programmable nucleases is used in single reaction multiplexing. Sometimes, at least two different programmable nucleases are used in single reaction multiplexing. For example, multiplexing can be enabled by immobilization of multiple categories of detector nucleic acids within a fluidic system, to enable detection of multiple target nucleic acids within a single sample.
  • Disclosed herein are kits, reagents, methods, and systems for use in detecting a SARS-CoV-2 variant.
  • kits include a package, carrier, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein.
  • Suitable containers include, for example, test wells, bottles, vials, and test tubes.
  • the containers are formed from a variety of materials such as glass, plastic, or polymers.
  • the kits or systems described herein contain packaging materials. Examples of packaging materials include, but are not limited to, pouches, blister packs, bottles, tubes, bags, containers, and any packaging material suitable for the intended mode of use.
  • a kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use.
  • a set of instructions will also typically be included.
  • a label is on or associated with the container.
  • a label is on a container when letters, numbers or other characters forming the label are attached, molded or etched into the container itself; a label is associated with a container when it is present within a receptacle or carrier that also holds the container, e.g., as a package insert.
  • a label is used to indicate that the contents are to be used for a specific therapeutic application. The label also indicates directions for use of the contents, such as in the methods described herein.
  • After packaging the formed product and wrapping or boxing to maintain a sterile barrier, the product may be terminally sterilized by heat sterilization, gas sterilization, gamma irradiation, or by electron beam sterilization. Alternatively, the product may be prepared and packaged by aseptic processing.
  • a signal dataset was generated using a wild-type sample containing the wild-type synthetic gene fragments and a mutant sample containing the mutant synthetic gene fragments, in which each sample was extracted for their respective nucleic acids and partitioned into replicates (e.g., three wild-type amplification replicates and three mutant amplification replicates).
  • Amplification replicates were amplified by reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), and each amplification replicate was further partitioned into replicate wells in a microtiter plate, for each of the three amino acid positions 452, 484, and 501 (see, for example, the schematic in FIGS. 4A-4B).
  • Replicate wells were processed using comparative programmable nuclease-based (DETECTR) reactions, in which amplified synthetic gene fragments were interrogated using a CasDx1 programmable nuclease, one or more reporters, and a guide sequence corresponding to either the wild-type sequence or the mutant sequence of the SARS-CoV-2 S-gene.
  • LAMP primer sets each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein.
  • Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl nucleic acid template, 5 µl of L452R primer set, 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye, and 14 µl of LAMP mastermix.
  • Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
  • the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix, 8 units of Bst 2.0 WarmStart DNA Polymerase, and 0.5 µl of WarmStart RTx Reverse Transcriptase. Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 30 seconds.
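  • As a simple arithmetic check, the component volumes listed for the multiplexed RT-LAMP reaction sum to the stated 50 microliter final volume. The Python sketch below records the recipe and scales the non-template components for a batch of reactions; the dictionary keys and the `scale_recipe` helper are hypothetical names used only for illustration.

```python
# Component volumes in microliters for one multiplexed RT-LAMP reaction.
COMPONENTS_UL = {
    "nucleic acid template": 8,
    "L452R primer set": 5,
    "E484K/N501Y primer set": 5,
    "nuclease-free water": 17,
    "SYTO-9 dye": 1,
    "LAMP mastermix": 14,
}

total_ul = sum(COMPONENTS_UL.values())

def scale_recipe(n_reactions):
    """Volumes of the shared (non-template) components for n reactions."""
    return {
        name: volume * n_reactions
        for name, volume in COMPONENTS_UL.items()
        if name != "nucleic acid template"
    }

batch = scale_recipe(10)
print(total_ul, batch["LAMP mastermix"])
```

  • The template is excluded from the batch because it differs for each sample; any pipetting overage would be an additional, laboratory-specific choice.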
  • Each sample was thus interrogated using a wild-type/mutant comparison in replicates of three, for each of the three amino acid positions, and for each of the three amplification replicates. Reporting signals for each well were obtained based on fluorescence intensities generated by cleavage of the reporter by the CasDx1 programmable nuclease upon recognition of the target sequence by the guide sequence.
  • the fluorescence intensities of each well were normalized by scaling the maximum fluorescence values to the minimum fluorescence values for the respective well to generate fluorescence yields. Scaled signals for wells interrogated with the wild-type guide sequence were then compared with scaled signals for wells interrogated with the mutant guide sequence, for each variant amino acid position per sample in the plate. Specifically, after scaling, a logarithmic transformation was applied to the generated fluorescence yields for a respective well, thus obtaining a logarithmic ratio calculated in the form of log2(Fy(FA1)) / log2(Fy(SA1)),
  • Fy(FA1) is the corresponding fluorescence yield for a first well interrogated with the wild-type guide sequence in the comparative assay
  • Fy(SA1) is the corresponding fluorescence yield for a second well of the same sample type interrogated with the mutant guide sequence in the comparative assay.
  • a well was deemed to be wild-type (e.g., contain an aliquot of the wild-type synthetic gene fragments) if the resulting ratio indicated a greater value for log2(Fy(FA1)) than for log2(Fy(SA1)), whereas a well was deemed to be mutant (e.g., contain an aliquot of the mutant synthetic gene fragments) if the resulting ratio indicated a lower value for log2(Fy(FA1)) than for log2(Fy(SA1)).
  • comparative assay replicates were designated as no-call if the logarithmic ratio between the WT and MUT fluorescence yields fell within the maximum fluorescence yield and minimum fluorescence yield generated for a plurality of control wells containing no target samples (NTCs). Comparative assay replicates were further designated as no-call if the logarithmic mean of the WT and MUT fluorescence yields fell within the range of the maximum and minimum logarithmic means for NTCs, where logarithmic means were calculated in the form of
  • FIGS. 5A, 5B, and 5C illustrate the identification of differentiated phenotypes WT and MUT (“M”) using the systems and methods disclosed herein.
  • FIG. 5A illustrates allele discrimination plots that show separation of the WT and MUT (“M”) signals, which are concordant with the genotype of the originating samples (WT, M, and NTC).
  • FIG. 5B illustrates further separation of the populations after application of a binary logarithmic value to each scaled signal in each well. These transformed values provide clear and improved differentiation for making mutation calls when the ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on a Mean Average (MA) plot.
  • gRNAs Guide RNAs
  • CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501.
  • CasDx1, LbCas12a, and AsCas12a were further evaluated with their cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
  • S-gene fragment for SARS-CoV-2, forward strand (SEQ ID NO: 118):
  • SEQ ID: 118 ggugguaauuauaauuaccuguauagauuguuuuaggaagucuaaucucaaaccuuuugagagagauauuucaacugaaa ucuaucaggccgguagcacaccuuguaaugguguugaagguuuuuaauuguuacuuuccuuuacaaucauaugguuucc aacccacuaaugguguugguuaccaacca
  • S-gene fragment for SARS-CoV-2, reverse strand (SEQ ID NO: 119):
  • SEQ ID: 119 ccaccauuuaauauuaauggacauaucuaacaauccuucagauuagaguuuggaaaacucucucuauaaaguugacuuua gauaguccggccaucguguggaacauuaccacaacuuccaaaauuaacaaugaaaggaaauguuaguauaccaaagguug ggugauuaccacaaccaaugguuggu
  • SEQ ID: 120 ccGguauagauuguuuagga
  • SEQ ID: 121 aauggacauaucuaacaaau
  • SEQ ID: 122 acauuaccacaaUuuccaaa
  • SEQ ID: 123 acauuaccacaacuuccaaa
  • SEQ ID: 124 aacccacuUaugguguuggu
  • Wild-type S-gene fragment subsequence spanning amino acid position 501 (SEQ ID NO: 125):
  • SEQ ID: 125 caacccacuaaugguguugg
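  • By way of illustration only, the single-nucleotide difference between two aligned subsequences of equal length can be located programmatically. The sketch below compares the sequences listed above as SEQ ID NO: 122 and SEQ ID NO: 123; the function name is hypothetical, and the reading of the uppercase letter as marking the variant base is an assumption, since the headings for these entries are not reproduced in this listing.

```python
def mismatches(seq_a, seq_b):
    """Return (index, base_a, base_b) for every position where two RNA strings differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length to compare position-wise")
    pairs = zip(seq_a.lower(), seq_b.lower())
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]

seq_122 = "acauuaccacaaUuuccaaa"  # as listed for SEQ ID NO: 122
seq_123 = "acauuaccacaacuuccaaa"  # as listed for SEQ ID NO: 123

diffs = mismatches(seq_122, seq_123)
print(diffs)
```

  • The comparison is case-insensitive, so the uppercase base is reported only if it actually differs from the corresponding base in the other sequence. Indices are zero-based.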
  • FIG. 7 illustrates the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow.
  • the sample (e.g., an RNA extraction) was used as an input to DETECTR, which was visualized by a fluorescence reader.
  • RNA encoding SARS-CoV-2 S-gene was amplified using an isothermal amplification method such as RT-LAMP.
  • Amplified samples were detected using a Cas12 programmable nuclease complexed with gRNAs directed to SARS-CoV-2 S-gene sequence.
  • the Cas12 programmable nuclease cleaved an ssDNA reporter nucleic acid upon complex formation with the target nucleic acid.
  • the results of the DETECTR assay were then compared with Whole Genome Sequencing (WGS) results.
  • Wild-type and mutant synthetic gene fragments were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM.
  • the LAMP primers used herein are shown in Table 3A and were synthesized by Eurofins Genomics.
  • the guide RNAs used herein are shown in Table 4A and were synthesized by Dharmacon or Synthego.
  • the reporter used herein is shown in Table 5 and was synthesized by IDT.
  • the synthetic gene fragments SEQ ID Nos. 20 and 21 were synthesized by Twist Biosciences.
  • NP/OP SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal
  • UDM universal transport media
  • VTM viral transport media
  • NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1 : 1 ratio.
  • the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl.
  • the TaqPath™ COVID-19 RT-PCR kit was used to determine the N gene cycle threshold values.
  • VBM SARS-CoV-2 variants being monitored
  • VOC variants of concern
  • VOI variants of interest
  • RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
  • the COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
  • LAMP primer sets each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A).
  • Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/).
  • Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
  • Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix.
  • Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
  • the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
  • CasDx1 (Mammoth Biosciences), LbCas12a (EnGen® Lba Cas12a, NEB) or AsCas12a (Alt-R® A.s. Cas12a, IDT) protein targeting the WT or MUT SNP at L452R, E484K, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a) for 30 min at 37°C.
  • gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end were used with both CasDx1 and LbCas12a, whereas gRNAs with an extra sequence of UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end were used with AsCas12a (see Table 4A).
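  • The pairing of each effector protein with its 5’ extra sequence can be illustrated with a short Python sketch that prepends the appropriate sequence to a spacer. The two 5’ sequences are those given above (SEQ ID NO: 126 and 127); the function name, the dictionary, and the example spacer are hypothetical placeholders and do not correspond to any guide in Table 4A.

```python
# 5' extra sequences from the text, keyed by the effector protein they are used with.
FIVE_PRIME_SEQUENCE = {
    "CasDx1": "UAAUUUCUACUAAGUGUAGAU",    # SEQ ID NO: 126
    "LbCas12a": "UAAUUUCUACUAAGUGUAGAU",  # SEQ ID NO: 126
    "AsCas12a": "UAAUUUCUACUCUUGUAGAU",   # SEQ ID NO: 127
}

def build_grna(spacer, effector):
    """Prepend the effector-specific 5' sequence to an RNA spacer."""
    if effector not in FIVE_PRIME_SEQUENCE:
        raise KeyError(f"no 5' sequence recorded for effector {effector!r}")
    return FIVE_PRIME_SEQUENCE[effector] + spacer.upper().replace("T", "U")

# Hypothetical 20-nucleotide spacer used only to show the concatenation.
example_spacer = "ACGUACGUACGUACGUACGU"
grna_lb = build_grna(example_spacer, "LbCas12a")
grna_as = build_grna(example_spacer, "AsCas12a")
```

  • Keeping the effector-to-sequence mapping in one table makes it harder to order a guide with the wrong 5’ sequence for a given nuclease.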
  • 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
  • cDNA Complementary DNA
  • SARS-CoV-2 primers version 3 according to the ARTIC protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021) and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)).
  • SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (NextSeq 550 or NovaSeq 6000) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512).
  • each well had a guide specific to the mutant or the wild-type SNP.
  • the comparison was important to assign a genotypic call to the sample.
  • the DETECTR® reactions across the plate were not comparable to each other.
  • the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which is defined as fluorescence yield.
  • the fluorescence yield can be compared across wells in a plate under the assumption that each well will have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aids in normalizing the signal and comparing replicates across the wells in the same plate.
  • Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
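  • The three general rules can be sketched as follows. Several points are assumptions made for illustration and are not stated in the rules themselves: the "minimum and maximum contrast for the plate" is taken from the NTC wells, the "size of the NTC" is taken as the largest NTC size, and samples that pass all three rules are called WT or MUT from the sign of the contrast. The function and argument names are hypothetical.

```python
def assign_call(is_ntc, contrast, size, ntc_contrasts, ntc_sizes):
    """Apply the general variant-calling rules to one sample at one SNP position."""
    # Rule 1: no-template controls are labeled NTC.
    if is_ntc:
        return "NTC"
    # Rule 2: contrast inside the NTC contrast range gives a NoCall (assumed range source).
    if min(ntc_contrasts) <= contrast <= max(ntc_contrasts):
        return "NoCall"
    # Rule 3: size below the NTC size gives a NoCall (assumed: largest NTC size).
    if size < max(ntc_sizes):
        return "NoCall"
    # Assumed final step: positive contrast favors WT, negative contrast favors MUT.
    if contrast > 0:
        return "WT"
    if contrast < 0:
        return "MUT"
    return "NoCall"

ntc_contrasts = [-0.5, 0.5]
ntc_sizes = [0.2, 0.4]
calls = [
    assign_call(True, 0.0, 0.0, ntc_contrasts, ntc_sizes),
    assign_call(False, 0.1, 3.0, ntc_contrasts, ntc_sizes),
    assign_call(False, 2.0, 0.3, ntc_contrasts, ntc_sizes),
    assign_call(False, 2.0, 3.0, ntc_contrasts, ntc_sizes),
    assign_call(False, -2.0, 3.0, ntc_contrasts, ntc_sizes),
]
```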
  • Further evaluation of CasDx1, LbCas12a and AsCas12a with their cognate gRNAs on synthetic gene fragments is illustrated in FIGS. 6B-E.
  • In FIGS. 6C-E, each figure corresponds to one of the three effector proteins.
  • the first column of each figure shows the fluorescent read out from the CRISPR assay performed on a wild-type fragment, with a wild-type guide versus a mutant guide.
  • the second column of each figure shows the fluorescent read out from the CRISPR assay performed on a mutant fragment, with a wild-type guide versus a mutant guide.
  • the third column of each figure shows the fluorescent read out from the CRISPR assay performed on a control, with a wild-type guide versus a mutant guide.
  • CasDx1 showed clear SNP differentiation between wild-type (WT) and mutant (MUT) sequences on all three S-gene variants (see FIG. 6C).
  • LbCas12a was capable of differentiating SNPs at positions 452 and 484, and AsCas12a was able to differentiate the SNP at position 452 (see FIGS. 6D-6E).
  • a redundant LAMP design was adopted for two reasons: first, this approach was shown to improve detection sensitivity in initial experiments; second, the goal was to increase assay robustness given the continual emergence of escape mutations in the spike RBD throughout the course of the pandemic (see Harvey et al., Nat Rev Microbiol 19, 409-424 (2021)).
  • the tested viral cultures included an ancestral SARS-CoV-2 lineage (WA-1) containing the wildtype spike protein (D614) targeted by the approved mRNA vaccines (BNT162b2 from Pfizer or mRNA-1273 from Moderna) (see K. S. Corbett et al., Nature 586, 567-571 (2020); F. P.
  • VBMs variants being monitored
  • VOCs variants of concern
  • VOIs variants of interest
  • Alpha B.1.1.7
  • Beta B.1.351
  • Gamma P.1
  • Epsilon B.1.427 and B.1.429
  • Kappa B.1.617.1
  • Zeta P.2 lineages
  • Heat-inactivated viral culture samples representing the seven SARS-CoV-2 lineages were quantified by digital droplet PCR across a 4-log dynamic range and used to evaluate the analytical sensitivity of the pre-amplification step.
  • RT-LAMP amplification was evaluated using six replicates from each viral culture. Consistent amplification was observed for all seven SARS-CoV-2 lineages with 10,000 copies of target input per reaction (200,000 copies/mL) (see FIG. 6G), which is comparable to the target input of more than 200,000 copies/mL viruses (less than 30 Ct value) required for sequencing workflows used in SARS-CoV-2 variant surveillance (see e.g., F.
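  • The two input figures quoted above are mutually consistent if the concentration is expressed relative to the 50 microliter final reaction volume; that basis is an assumption made here for illustration. A minimal Python sketch of the conversion, with a hypothetical function name:

```python
def copies_per_ml(copies_per_reaction, reaction_volume_ul):
    """Convert copies per reaction into copies per milliliter of reaction volume."""
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    # 1 mL = 1000 microliters, so scale the per-reaction count accordingly.
    return copies_per_reaction * 1000 / reaction_volume_ul

# 10,000 copies in an assumed 50 microliter reaction.
concentration = copies_per_ml(10_000, 50)
print(concentration)
```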
  • FIG. 6B shows a representative heatmap of the expected pattern of wild-type (WT) and mutational (MUT) calls for each of the SNPs resulting in L452R, E484K and N501Y. Specifically, for each combination of effector protein and either wild-type or mutant guide, FIG.
  • FIG. 6H shows the heatmap results for each reaction carried out in this assay, including for each combination of effector proteins, variant, and corresponding mutant or wild type guide.
  • the results of the fluorescent assay are also shown in the fluorescent read-out curves of FIGS. 6I-6K plotting raw fluorescence over time for the wild-type SARS-CoV-2 (Column 1), each variant (Columns 2-7) and controls (Columns 8-12).
  • CasDx1 correctly identified the wild-type (WT) and mutational (MUT) targets at positions 452, 484 and 501 in each LAMP-amplified, heat-inactivated viral culture (FIG. 6H and FIGS. 6I-6K).
  • LbCas12a was capable of differentiating WT from MUT at position 501 on LAMP-amplified viral cultures but showed much higher background for the WT target at position 452 and higher background for both WT and MUT targets at position 484 (FIG. 6H and FIGS. 6I-6K). Additionally, AsCas12a was able to differentiate WT from MUT targets at position 452, albeit with substantial background, but was unable to differentiate WT from MUT targets at positions 484 and 501 (FIG. 6H and FIGS. 6I-6K). From these data, it was concluded that CasDx1 provided more consistent and accurate calls for the L452R, E484K and N501Y mutations.
  • one sample (COVID-31) was designated a ‘no call’ at position 452 by viral WGS and thus lacked a comparator, two samples were designated a ‘no call’ due to flat WT and MUT curves (COVID-41 and COVID-73), four samples had similar WT and MUT curve amplitudes, suggesting a mixed population (COVID-03, COVID-56, COVID-61 and COVID-81) (see FIGS. 13A-13D), and four samples had SNP assignments discordant with those from viral WGS (COVID-12, COVID-13, COVID-20 and COVID-63) (see FIGS. 13A-13D).
  • the positive predictive agreement (PPA) between the DETECTR® assay and viral WGS at all three WT and MUT SNP positions was 100% (272 of 272, p < 2.2e-16 by Fisher’s Exact Test) (see Table 7).
  • the corresponding negative predictive agreement (NPA) was 91.4% as the E484Q mutation for two SNPs was incorrectly classified as WT. Nevertheless, the final viral lineage classification for the 91 samples after discrepancy testing showed 100% agreement with viral WGS (see FIGS.
  • a CRISPR based DETECTR® assay was developed for the detection of SARS-CoV-2 variants.
  • Three CRISPR-Cas12 enzymes were evaluated. Based on a head-to-head comparison of these enzymes, clear differences in performance were observed, with CasDx1 demonstrating the highest fidelity and reliably detecting all three of the targeted SNPs.
  • a data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 100% (272/272 total SNP calls) and 100% agreement with lineage classification compared to viral WGS.
  • Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations in coverage of circulating lineages and in the extent of clinical sample evaluation.
  • the miSHERLOCK variant assay uses LbCas12a (NEB) to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)).
  • the SHINEv2 assay uses LwaCas13a to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)).
  • the COVID-19 Variant DETECTR® assay uses CasDx1 to detect N501Y, E484K and L452R covering eleven lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Mu and Zeta), and 91 clinical samples representing seven out of the eleven lineages were tested with successful detection of all seven.
  • the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for the presence of a rare or novel variant (e.g., carrying both L452R and E484K or carrying all three SNPs) that could be reflexed to viral WGS.
  • the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts.
  • identification of specific mutations associated with neutralizing antibody evasion, such as E484K could inform patient care with regards to the use of monoclonal antibodies that remain effective in treating the infection.
  • the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance.
  • the newly emerging Omicron variant, containing at least 30 mutations in the spike protein and 11 mutations in the spike RBD region targeted by the assay, could be detected by increasing degeneracy in the LAMP primers and adding at least one gRNA to be able to distinguish this variant from the others.
  • a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
  • Example 3 SARS-CoV-2 SNP Calling: L452R, E484K, and N501Y
  • the disclosure provides methods for determining L452R, E484K, and N501Y versus wild-type calls within the spike gene of SARS-CoV-2 using an interpretive algorithm for SNP calling in conjunction with a CRISPR-Cas12 based assay.
  • the spike variants L452R, E484K, N501Y are described in Zhang et al., 2021, “Ten emerging SARS-CoV-2 spike variants exhibit variable infectivity, animal tropism, and antibody neutralization,” Commun Biol 4, 1196, which is hereby incorporated by reference.
  • a signal dataset was obtained that comprised, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting fluorescent signals for a respective one or more nucleic acid molecules in a biological sample that map to position 452, 484 or 501 of the spike protein arising from RT-LAMP amplification of the CRISPR-Cas12 based DETECTR® assay.
  • the plurality of wells comprised a first set of nine wells representing the wild type allele for position 452 of the spike protein. Each well in the first set of nine wells included a first plurality of guide nucleic acids that have the wild type allele for position 452 of the SARS-CoV-2 spike protein. [00546] The plurality of wells further comprised a second set of nine wells representing the L452R allele for the SARS-CoV-2 spike protein. Each well in the second set of nine wells included a second plurality of guide nucleic acids that have the L452R allele for the SARS- CoV-2 spike protein.
  • Each respective well in the first set of wells and each respective well in the second set of wells contained a corresponding aliquot of nucleic acid derived from a biological sample for which the determination of a L452R versus a wild-type call within the spike gene of SARS-CoV-2 was sought, as well as a Casl2 protein with trans-cutting activity.
  • Each corresponding plurality of reporting fluorescent signals in the signal dataset comprised, for each respective time point in a set of 30 time points, a respective fluorescent reporting signal in the form of a corresponding discrete attribute value arising from RT- LAMP amplification.
  • Each respective time point in the corresponding plurality of reporting signals represented a different 60 second time interval within a 30 minute monitoring time period of the RT-LAMP amplification.
  • the signal yield was the maximum signal observed across the 30 time points divided by the minimum signal observed across the 30 time points. This allowed signal yields from individual wells to be compared to each other.
  • the relative intensity metric has the form: $\log_2(F_y(\mathrm{WT})) / \log_2(F_y(\mathrm{M}))$, where:
  • $F_y(\mathrm{WT})$ was the first corresponding signal yield, representing the subject wild type allele
  • $F_y(\mathrm{M})$ was the second corresponding signal yield, representing the subject mutant allele of the SARS-CoV-2 spike protein
  • WT was the wild type allele for position 452 of the SARS-CoV-2 spike protein
  • M was the L452R allele for position 452 of the SARS-CoV-2 spike protein.
  • the first set of nine wells was binned into three bins, each bin consisting of three of the wells from the first set of nine wells. For each respective bin in the set of three bins, a respective first concordance vote across the candidate call identities for wells in the respective bin was made, thereby generating three bin votes, one for each bin.
  • the respective first concordance vote generated a respective bin vote based on a common candidate call identity that was shared by all of the wells in the subset of wells for the respective bin.
  • a respective bin vote was no-call when at least one well in the three wells in the respective bin had a candidate call identity of no-call.
  • a second concordance vote was made across the three bin votes for the first set of nine wells, thereby obtaining the mutation call that was one of L452R or wild type for position 452 of the SARS-CoV-2 spike protein for the biological sample.
  • the second concordance vote generated this mutation call based on a common bin vote that was shared by at least two of the three bins.
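The calling procedure of Example 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the max/min signal yield, the log2 ratio form of the relative intensity metric, the three bins of three wells, and the two-level concordance vote follow the text above, whereas the function names, the `lower` and `upper` cutoffs that turn the metric into a candidate call, and the treatment of disagreeing wells or a zero denominator as no-call are hypothetical choices made only for illustration.

```python
import math
from collections import Counter

NO_CALL = "no-call"

def signal_yield(signals):
    """Maximum signal divided by minimum signal across a well's time points."""
    return max(signals) / min(signals)

def relative_intensity(fy_wt, fy_m):
    """log2(Fy(WT)) / log2(Fy(M)); returns None when the denominator is zero."""
    denom = math.log2(fy_m)
    if denom == 0:
        return None
    return math.log2(fy_wt) / denom

def candidate_call(fy_wt, fy_m, mutant, lower=0.5, upper=2.0):
    """Map the metric to a call. The cutoffs are hypothetical, not from the text."""
    metric = relative_intensity(fy_wt, fy_m)
    if metric is None:
        return NO_CALL  # assumption: an undefined metric is treated as no-call
    if metric >= upper:
        return "WT"
    if metric <= lower:
        return mutant
    return NO_CALL

def bin_vote(calls):
    """Unanimous call within a bin; any no-call (or, by assumption, any
    disagreement) gives a no-call bin vote."""
    if NO_CALL in calls or len(set(calls)) != 1:
        return NO_CALL
    return calls[0]

def final_vote(bin_votes):
    """Bin vote shared by at least two of the three bins, else no-call."""
    vote, count = Counter(bin_votes).most_common(1)[0]
    return vote if count >= 2 else NO_CALL

def mutation_call(wt_wells, mut_wells, mutant="L452R", bin_size=3):
    """wt_wells and mut_wells are paired lists of per-well signal time series."""
    calls = [
        candidate_call(signal_yield(w), signal_yield(m), mutant)
        for w, m in zip(wt_wells, mut_wells)
    ]
    bins = [calls[i:i + bin_size] for i in range(0, len(calls), bin_size)]
    return final_vote([bin_vote(b) for b in bins])

# Toy data: nine wild-type-guide wells with yield 8, nine mutant-guide wells with yield 2.
wt = [[100, 200, 800]] * 9
mut = [[100, 150, 200]] * 9
print(mutation_call(wt, mut))  # prints "WT"
```

Keeping the per-well call, the bin vote, and the final vote as separate functions mirrors the order of the steps in the text and lets each rule be changed independently.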
  • Example 4 SARS-CoV-2 Omicron Variant Detection with High-Fidelity CRISPR-Cas12 Enzyme
  • CasDx1 (Mammoth Biosciences) was evaluated for activity on SARS-CoV-2 Omicron variants using methods substantially similar to those described in Examples 2 and 3.
  • gRNAs Guide RNAs
  • Guide RNAs (gRNAs) were screened with CasDx1 for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501.
  • Wild type and mutant synthetic gene fragments were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM.
  • the LAMP primers used herein are shown in Tables 3A and 3B and were synthesized by Eurofins Genomics.
  • the guide RNAs used herein are shown in Tables 4A and 4B and were synthesized by Dharmacon or Synthego.
  • the reporter used herein is shown in Table 5 and was synthesized by IDT.
  • the synthetic gene fragments SEQ ID Nos. 20, 21, 43, and 44 were synthesized by Twist Biosciences.
  • NP/OP SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal
  • UTM universal transport media
  • VTM viral transport media
  • NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio.
  • the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl.
  • the TaqPath™ COVID-19 RT-PCR kit was used to determine the N gene cycle threshold values.
  • VBM SARS-CoV-2 variants being monitored
  • VOC variants of concern
  • VOI variants of interest
  • RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
  • the COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
  • LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A).
  • Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/).
  • Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A.
  • RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix.
  • Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
  • the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
  • Degenerate multiplexed RT-LAMP was performed using a final reaction volume of 65 µl, which consisted of 9.6 µl RNA template, 10 µl of L452R degenerate primer set (Eurofins Genomics), 10 µl of E484(K/Q/A)/N501Y degenerate primer set, 14.1 µl of nuclease-free water, 1.3 µl of SYTO-9 dye (ThermoFisher Scientific), and 20 µl of LAMP mastermix.
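As a quick arithmetic check, the component volumes listed for the two RT-LAMP reactions can be summed and compared with the stated final volumes. The sketch below is illustrative only: the dictionary and function names are hypothetical, and all volumes are taken to be in microliters.

```python
# Component volumes (microliters) as listed for each reaction.
STANDARD_RT_LAMP = {
    "RNA template": 8.0,
    "L452R primer set": 5.0,
    "E484K/N501Y primer set": 5.0,
    "nuclease-free water": 17.0,
    "SYTO-9 dye": 1.0,
    "LAMP mastermix": 14.0,
}

DEGENERATE_RT_LAMP = {
    "RNA template": 9.6,
    "L452R degenerate primer set": 10.0,
    "E484(K/Q/A)/N501Y degenerate primer set": 10.0,
    "nuclease-free water": 14.1,
    "SYTO-9 dye": 1.3,
    "LAMP mastermix": 20.0,
}

def total_volume(components):
    """Sum the component volumes, rounded to 0.1 to absorb float error."""
    return round(sum(components.values()), 1)

print(total_volume(STANDARD_RT_LAMP))    # 50.0
print(total_volume(DEGENERATE_RT_LAMP))  # 65.0
```

Both totals match the stated final reaction volumes of 50 and 65.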
  • Two degenerate LAMP primer sets, each containing 6 primers modified from the original LAMP primer sets (see Table 3A) with degenerate nucleotides, were designed to capture the L452R, E484K, E484Q, E484A, and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3B).
  • S SARS-CoV-2 Spike
  • SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (MiSeq or NextSeq 550) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512).
  • each well had a guide specific to the mutant or the wild-type SNP.
  • the comparison was important to assign a genotypic call to the sample.
  • the DETECTR® reactions across the plate were not comparable to each other.
  • the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which was defined as fluorescence yield.
  • the fluorescence yield was compared across wells in a plate under the assumption that each well would have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aided in normalizing the signal and comparing replicates across the wells in the same plate.
  • the wild-type and mutant target guides on NTC generally do not show any change in intensity over time.
  • the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
  • Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
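The three general rules above can be expressed as a small decision function. This is a hedged sketch only: "contrast" and "size" are used exactly as named in the rules without further definition here, the parameter names and the `provisional_call` argument (the call a sample would otherwise receive) are hypothetical, and treating "between" as an exclusive comparison is an assumption.

```python
def assign_call(is_ntc, contrast, size, plate_min_contrast,
                plate_max_contrast, ntc_size, provisional_call):
    """Apply the three general variant-calling rules in order."""
    # Rule 1: no template controls are assigned NTC.
    if is_ntc:
        return "NTC"
    # Rule 2: contrast between the plate minimum and maximum gives NoCall.
    # (Exclusive bounds are an assumption; the text says only "between".)
    if plate_min_contrast < contrast < plate_max_contrast:
        return "NoCall"
    # Rule 3: a size lower than the NTC size on the plate gives NoCall.
    if size < ntc_size:
        return "NoCall"
    # Otherwise keep the call produced upstream (hypothetical input).
    return provisional_call

# Example with made-up numbers: contrast falls outside the plate window
# and size exceeds the NTC size, so the upstream call is kept.
print(assign_call(False, 3.2, 5.0, -1.0, 1.0, 1.1, "MUT"))  # prints "MUT"
```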
  • the 2×2 cross tables classify all three SNPs (at positions 452, 484, 501) across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D).
  • the data transformation and statistical analysis were done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
  • the TaqPath PCR assay with S-gene Target Failure has functioned as a screen that can be reflexed to sequencing to identify the Omicron variant
  • the SGTF assay alone cannot differentiate between Omicron BA.1 and Alpha and cannot identify emerging variants that lack the SGTF, such as the Omicron BA.2 sublineage.
  • the COVID-19 variant DETECTR® assay described in Example 2 was reconfigured for the identification of Omicron by targeting the E484A mutation, which alone differentiates Omicron from all other current VBM/VOI/VOC.
  • Guide RNAs were screened with CasDx1 for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501 (see FIG. 15A). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either E484K, E484Q, or E484A (see FIG. 15B). CasDx1 was further evaluated with its cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities, as illustrated in FIG. 15B.
  • the original LAMP primer set used in Example 2 (Table 3A) was tested against a degenerate LAMP primer set (Table 3B) which incorporated degenerate nucleotides within the LAMP primers, as it was suspected that the original LAMP primer set may not have sufficient sensitivity to amplify the targeted spike RBD region.
  • the degenerate LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K/Q/A, and N501Y mutations.
  • FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples.
  • Omicron clinical samples (1802), WT (1804) control RNA, and Alpha (1806) control RNAs were amplified using the original RT-LAMP primer set and degenerate RT-LAMP primer set.
  • the degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
  • MA mean average
  • sample COVID-112 was called an Omicron by DETECTR® based on its A484 SNP call, which was confirmed by WGS.
  • sample COVID-122 could not be amplified by RT-LAMP, also suggesting a loss in sample integrity. Following this discrepancy analysis, an overall SNP concordance of 94.7%, and 100% NPA was demonstrated for this set of 48 samples (Table 9).
  • Table 9 Overall SNP concordance values for the 484 SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay
  • a CRISPR based DETECTR® assay was developed for the detection of SARS-CoV-2 variants.
  • a data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 97.9% (373/381 total SNP calls when combined with Example 2) and 99.3% (138/139 when combined with Example 2) agreement with lineage classification compared to viral WGS.
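The two agreement figures follow directly from the stated counts. The short sketch below reproduces the arithmetic; the function name is hypothetical and the rounding to one decimal place is an assumption chosen to match the reported values.

```python
def percent_agreement(agree, total):
    """Percentage of concordant calls, rounded to one decimal place."""
    return round(100 * agree / total, 1)

print(percent_agreement(373, 381))  # 97.9 (overall SNP concordance)
print(percent_agreement(138, 139))  # 99.3 (lineage classification agreement)
```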
  • these findings show robust agreement between the COVID-19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization.
  • the COVID-19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID-19 variant diagnostics and surveillance.
  • Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations regarding coverage of circulating lineages, the extent of clinical sample evaluation, and/or assay complexity.
  • the miSHERLOCK variant assay uses LbCas12a (NEB) with RPA preamplification to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)).
  • the SHINEv2 assay uses LwaCas13a with RPA pre-amplification to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)).
  • the mCARMEN variant identification panel (VIP) uses 26 crRNA pairs with either the LwaCas13a or LbaCas13a and PCR pre-amplification to identify all current circulating lineages including Omicron; however, the VIP requires the Fluidigm Biomark HD system or similar, more complex instrumentation for streamlined execution (see Welch et al., medRxiv, (2021)).
  • the COVID-19 Variant DETECTR® assay disclosed herein in Examples 2 and 4 uses CasDx1 to detect N501Y, E484K/Q/A, and L452R covering all current circulating lineages and was tested on 139 clinical samples representing eight lineages (WA-1, Alpha, Gamma, Delta, Epsilon, Iota, Mu, Omicron). Furthermore, specific Omicron identification was accomplished using only the E484 WT and A484 MUT guides.
  • Example 4 shows that the choice of Cas enzyme may be important to maximize the accuracy of CRISPR-based diagnostic assays and may need to be tailored to the site that is being targeted.
  • the COVID-19 variant DETECTR® assay was capable of distinguishing the Alpha, Delta, Kappa and Omicron variants, but did not resolve the remaining VBMs or VOIs.
  • tracking of key mutations many of which are suspected to arise from convergent evolution, rather than tracking of variants, may be more important for surveillance as the pandemic continues.
  • the data analysis pipeline developed for CRISPR-based SNP calling described herein can readily incorporate additional targets and may offer a blueprint for automated interpretation of fluorescent signal patterns.
  • the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for circulating variants and/or a distinct pattern from a rare or novel variant by interrogating the key 452, 484, and 501 positions that could be reflexed to viral WGS.
  • the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts.
  • the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new preamplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance.
  • newly emerging variants may contain additional mutations that may not be captured with the resolution of the L452R, E484K/Q/A, and N501Y mutations described in Examples 2-4 or which may result in variable performance at a particular SNP compared to others, and could be detected by incorporation of additional gRNA(s) to provide specific and redundant coverage and/or improve identification of specific lineages.
  • incorporation of an additional N-gene target to the assay may improve the limit of detection of the DETECTR® assay, which currently relies entirely on a multiplexed and degenerate S-gene LAMP primer design, optionally to facilitate simultaneous detection and SNP/variant identification.
  • a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification may offer a faster and simpler alternative to sequencing and would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
  • first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
  • the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
  • the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.

Abstract

Systems and methods for determining a mutation call for a target locus in a sample are provided. A signal dataset is obtained comprising, for each well in a plurality of wells, reporting signals for nucleic acids in the sample that map to the locus, over a plurality of time points. Each well contains an aliquot of nucleic acid derived from the sample. A first set of wells includes guide nucleic acids having a first allele for the locus. A second set of wells includes guide nucleic acids having a second allele for the locus. For each well, signal yields are determined using the reporting signals, and candidate call identities are determined for the first set of wells based on a comparison of signal yields in the first and second set of wells. A voting procedure is performed across the candidate call identities, thus obtaining a mutation call for the locus.

Description

SYSTEMS AND METHODS FOR IDENTIFYING GENETIC PHENOTYPES USING PROGRAMMABLE NUCLEASES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 63/390,572, filed July 19, 2022, U.S. Provisional Patent Application Serial No. 63/305,934, filed February 2, 2022; U.S. Provisional Patent Application Serial No. 63/305,872, filed February 2, 2022; U.S. Provisional Patent Application Serial No. 63/283,930, filed November 29, 2021; and U.S. Provisional Patent Application Serial No. 63/283,936, filed November 29, 2021, each of which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] This specification describes technologies generally relating to detection and discrimination of genetic mutations in a biological sample.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[0003] Accompanying this filing is a Sequence Listing entitled, “SeqListing.xml” created on November 29, 2022 and having 190,717 bytes of data, machine formatted on IBM-PC, MS-Windows operating system. The sequence listing is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
[0004] There have been recurrent large-scale epidemics from novel emerging viruses, including SARS-CoV-2. Since the initial outbreak of SARS-CoV-2, multiple fast-spreading variants have emerged and have been observed to complicate the epidemic and to lead to various government restrictions after the initial relief, which resulted in more severe socioeconomic impacts than the original strain.
[0005] Tracking the evolution and spread of SARS-CoV-2 variants in the community is critical to inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the disease.
[0006] As a result, there is an urgent need to strengthen genomic surveillance to monitor the evolution of SARS-CoV-2 variants and to uncover possible different patterns of infection by variants of concern (e.g., changes in transmissibility, clinical patterns, or lethality). Virus whole-genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants, but can be limited by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation.
SUMMARY
[0007] Given the above background, improved systems and methods are needed for determining mutation calls for target loci in biological samples, particularly in the application of programmable nuclease-based (e.g., CRISPR-Cas-based) assays and/or amplification-based assays for recognition of mutated sequences, such as in SARS-CoV-2 variants. For example, in at least some instances, mutations in the spike protein, which binds to the human ACE2 receptor, can render the SARS-CoV-2 virus more infectious and/or more resistant to antibody neutralization, resulting in increased transmissibility and/or escape from immunity, whether vaccine-mediated or naturally acquired immunity. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the COVID-19 disease.
[0008] Advantageously, technical solutions (e.g., compositions, computing systems, methods, and non-transitory computer readable storage mediums) for addressing the above identified problems are provided in the present disclosure. The compositions, systems, and methods described herein satisfy the abovementioned needs, among others, and provide related advantages.
[0009] For instance, tracking the evolution and spread of pathogenic variants (e.g., SARS-CoV-2 variants) in the community can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. As described above, virus whole-genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants, but can be limited in at least some instances by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation. In some implementations, it is therefore advantageous to provide diagnostic assays based on clustered regularly interspaced short palindromic repeats (CRISPR) for rapid detection of variants (e.g., SARS-CoV-2 variants) in clinical samples. Some advantages of the assays, compositions, computing systems, methods, and non-transitory computer readable storage mediums described herein for use in laboratory and point of care settings include low cost, minimal instrumentation, and a sample-to-answer turnaround time of under 2 hours.
[0010] Systems and Methods for Determining Mutation Calls
[0011] Accordingly, one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors. The method includes obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[0012] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[0013] In some embodiments, the signal dataset is obtained by a procedure comprising amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids. For each respective well in the first set of wells and each respective well in the second set of wells, the method includes partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
[0014] In some embodiments, the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the process of obtaining, determining, determining, and performing, thereby obtaining a plurality of candidate mutation calls for the target locus, and performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus. In some such embodiments, the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
[0015] In some embodiments, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus. In some embodiments, when a sub-plurality of candidate mutation calls in the plurality of candidate mutations calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
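The resolution step described in paragraphs [0014] and [0015] can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the claimed procedure: the function and argument names are hypothetical, choosing the mutant allele with the highest signal yield is only one possible reading of "a comparison of signal yields for each pair of candidate second alleles", and the handling of the case in which no mutant allele is called is likewise assumed.

```python
def final_mutation_call(candidate_calls, mutant_yields, first_allele="WT"):
    """Resolve per-allele candidate mutation calls into one final call.

    candidate_calls: dict mapping each candidate second allele to the call
        obtained when it was compared against the first allele (the allele
        itself, the first allele, or "no-call").
    mutant_yields: dict mapping each candidate second allele to a
        representative signal yield (hypothetical summary value).
    """
    called = [allele for allele, call in candidate_calls.items() if call == allele]
    if len(called) == 1:
        # A single candidate second allele was called: it is the final call.
        return called[0]
    if len(called) > 1:
        # Several were called: compare their signal yields.
        # Assumption: the allele with the highest yield wins.
        return max(called, key=lambda allele: mutant_yields[allele])
    # No mutant allele called (assumed handling, not stated in the text).
    if all(call == first_allele for call in candidate_calls.values()):
        return first_allele
    return "no-call"

calls = {"E484K": "E484K", "E484Q": "WT", "E484A": "E484A"}
yields = {"E484K": 3.0, "E484Q": 1.1, "E484A": 9.0}
print(final_mutation_call(calls, yields))  # prints "E484A"
```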
[0016] Another aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample. The method comprises, using a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. In such embodiments, the signal dataset represents a plurality of time points. Moreover, the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. 
The method continues by determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points. The method further determines, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities. A voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele. In some such embodiments, the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some such embodiments, the first allele is other than a wild-type allele. In some such embodiments, the set of candidate second alleles consists of between 2 and 10 candidate second alleles. In some such embodiments the set of candidate second alleles consists of a single second allele.
[0017] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
[0018] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
[0019] Compositions, Systems, and Methods for Assaying SARS-CoV-2 Variants
[0020] Another aspect of the present disclosure provides a CRISPR-based COVID-19 variant DETECTR® assay (henceforth abbreviated as DETECTR assay) for the detection of SARS-CoV-2 mutations. The assay combines RT-LAMP pre-amplification with subsequent fluorescent detection using a CRISPR-Cas12 enzyme. In a comparative evaluation of multiple candidate Cas12 enzymes, robust assay performance was found with a CRISPR-Cas12 enzyme called CasDx1, which had high specificity in identifying key SNP mutations of functional relevance in the spike protein at amino acid positions 452, 484, and 501.
[0021] In various aspects, the disclosure provides a method of assaying for a SARS-CoV-2 variant in an individual, the method comprising: collecting a nasal swab or a throat swab from the individual; optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab; amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers; contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42; assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof; and determining an SNP call of the sample. In some embodiments, the nasal swab is a nasopharyngeal swab. In some embodiments, the throat swab is an oropharyngeal swab. In some embodiments, amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification. In some embodiments, the amplifying comprises loop-mediated amplification (LAMP). In some embodiments, the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof. In some embodiments, the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer. In some embodiments, the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39. 
In some embodiments, the amplifying comprises reverse transcription-LAMP. In some embodiments, the method further comprises lysing the sample. In some embodiments, lysing the sample comprises contacting the sample to a lysis buffer. In some embodiments, determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene. In some embodiments, the reference wild-type SARS-CoV-2 gene is from a SARS-CoV-2 Wuhan-Hu-1 sequence or the USA-WA1/2020 sequence. In some embodiments, the one or more S-gene mutations is a single nucleotide polymorphism (SNP). In some embodiments, the one or more S-gene mutations is associated with one or more Spike protein mutations. In some embodiments, the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof. In some embodiments, determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid. In some embodiments, the method comprises determining a variant call of the sample. In some embodiments, determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant. 
In some embodiments, the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant. In some embodiments, determining the variant call comprises determining one or more SNP calls of the sample. In some embodiments, determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation. In some embodiments, determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation. In some embodiments, determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation. In some embodiments, the effector protein is a Type V Cas effector protein. In some embodiments, the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein. 
In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some embodiments, the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
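The SNP-to-variant associations recited in paragraph [0021] can be read as a lookup table. The sketch below encodes only the associations stated in that paragraph; the table name, the function name, and the exact-match rule (a variant is reported only when the detected mutation set equals its listed signature) are assumptions made for illustration. Because several variants share a signature in this three-position panel, the function returns every variant consistent with the detected SNPs rather than a single lineage.

```python
# Spike mutation signatures as listed in paragraph [0021]; nothing beyond that list is encoded.
VARIANT_SIGNATURES = {
    "Alpha": {"N501Y"},
    "Beta": {"E484K", "N501Y"},
    "Mu": {"E484K", "N501Y"},
    "Gamma": {"E484K", "N501Y"},
    "Delta": {"L452R"},
    "Epsilon": {"L452R"},
    "Kappa": {"L452R"},
    "Omicron": {"E484A"},
    "Zeta": {"E484K"},
}


def consistent_variants(detected_mutations):
    """Return the variants whose listed signature equals the detected mutation set.

    Exact matching is a simplifying assumption; an empty set is reported as wild-type.
    """
    detected = set(detected_mutations)
    if not detected:
        return ["wild-type"]
    return sorted(name for name, signature in VARIANT_SIGNATURES.items()
                  if signature == detected)
```

For example, a sample with only an L452R call is consistent with Delta, Epsilon, and Kappa under this table, which is why a final lineage assignment may need additional loci or sequencing.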
[0022] Various embodiments of systems, methods and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the desirable attributes described herein. Without limiting the scope of the appended claims, some prominent features are described herein. After considering this discussion, and particularly after reading the section entitled “Detailed Description” one will understand how the features of various embodiments are used.
INCORPORATION BY REFERENCE
[0023] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entireties to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The implementations disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.
[0025] FIG. 1 is an example block diagram illustrating a computing device and related data structures used by the computing device in accordance with some implementations of the present disclosure.
[0026] FIGS. 2A, 2B, 2C, 2D, 2E, 2F, and 2G illustrate an example method in accordance with an embodiment of the present disclosure, in which optional steps are indicated by broken lines.
[0027] FIG. 3 illustrates an example schematic for a method of determining a mutation call for a target locus in a sample using programmable nucleases, in accordance with some embodiments of the present disclosure.
[0028] FIGS. 4A, 4B, and 4C collectively illustrate a method for determining a mutation call for a target locus in a sample, in accordance with some embodiments of the present disclosure.
[0029] FIGS. 5A, 5B, and 5C collectively illustrate identification of differentiated phenotypes called as wild-type, mutated, and no-call in a signal dataset, in accordance with an embodiment of the present disclosure. FIG. 5A provides an allele discrimination plot visualizing the signal yields obtained from a COVID Variant programmable nuclease-based assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other. FIG. 5B shows Mean Average (MA) plots of the COVID Variant programmable nuclease-based assay data on gene fragments to decrease ambiguity of the signal yields. A ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on the MA plot. FIG. 5C presents MA plots of the COVID Variant programmable nuclease-based assay on the gene fragments (n=30 WT; n=30 MUT for each SNP) and no template controls (n=33 WT; n=33 MUT for each SNP) used to validate the present systems and methods.
[0030] FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, 6H, 6I, 6J, and 6K illustrate design and workflow for a COVID-19 Variant DETECTR® assay, in accordance with some embodiments of the present disclosure. FIG. 6A shows a schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations. FIG. 6B shows heat map comparison of three different Cas12 enzymes tested using 10 nM PCR-amplified synthetic gene fragments (t=30 minutes). FIG. 6C shows raw fluorescence curves visualizing the SNP differentiation capability of CasDx1 using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6D shows raw fluorescence curves visualizing the SNP differentiation capability of LbCas12a (which may be referred to as “LbaCas12a”, “LbCas12”, and “LbCas12a” interchangeably) using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6E shows raw fluorescence curves visualizing the SNP differentiation capability of AsCas12a using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6F shows a schematic of multiplexed RT-LAMP primer design showing the SARS-CoV-2 S gene mutations and gRNA positions. FIG. 6G shows a dot plot showing the number (n = 6) of positive replicates across a 4-log dynamic range of the RT-LAMP products. FIG. 6H shows heat-map comparison of end-point fluorescence (t = 30 minutes) of three different Cas12 enzymes tested against heat-inactivated viral cultures. 
Replicates (n = 6) generated using RT-LAMP were pooled, and CRISPR-Cas12 reactions were then run in triplicate (n = 3). FIGS. 6I-6K show raw fluorescence curves from three Cas12 enzymes (CasDx1, LbCas12a, and AsCas12a) on eight heat-inactivated viral culture samples from various SARS-CoV-2 lineages, a no target control (RT-LAMP) and CasDx1 detection controls (WT, MUT and NTC). Experiments were performed using CasDx1 replicates (n = 3), with standard deviation of ±1.0SD.
[0031] FIG. 7 shows an example workflow comparison between the COVID Variant DETECTR® assay and SARS-CoV-2 Whole-Genome-Sequencing, in accordance with some embodiments of the present disclosure.
[0032] FIGS. 8A, 8B, 8C, 8D, 8E, 8F, 8G, and 8H illustrate example plots obtained using a DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling, as illustrated in FIGS. 4A-4C, in accordance with an embodiment of the present disclosure. FIG. 8A shows a key providing orientation for FIGS. 8B-8H relative to each other. FIGS. 8B-8H show raw fluorescence RT-LAMP curves for each clinical sample. The raw fluorescence RT-LAMP amplification curves for each of the clinical samples were analyzed in triplicate (n = 3 replicates). Each line is representative of the median ±1.0SD of the three RT-LAMP replicates for each sample. RT-LAMP replicates that passed quality control (QC) are represented with solid black lines and dark gray shading, and failed LAMP replicates are shown with solid gray lines and light gray shading. Only valid RT-LAMP replicates were used in subsequent data analysis.
[0033] FIGS. 9A, 9B, 9C, 9D, 9E, 9F, 9G, 9H, 9I, 9J, 9K, 9L, and 9M illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C and 8A-8H, in accordance with an embodiment of the present disclosure. FIG. 9A shows a key providing orientation for FIGS. 9B-9M relative to each other. FIGS. 9B-9M show raw fluorescence CasDx1 curves for each clinical sample amplified by RT-LAMP. Each clinical sample was amplified with RT-LAMP in triplicate, and the resulting amplicons were detected by CasDx1 in triplicate. The raw fluorescence curves show WT detection in thick black lines and MUT detection in thin gray lines. Each line is representative of the median ±1.0SD of the CasDx1 replicates (n = 3) for each WT and MUT guide for each of the RT-LAMP replicates (n = 3), represented by different patterned lines.
[0034] FIGS. 10A, 10B, 10C, 10D, 10E, and 10F illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C, 8A-8H, and 9A-9M, in accordance with an embodiment of the present disclosure. FIG. 10A shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other. FIG. 10B shows contrast-size or mean average (MA) plots of the COVID Variant DETECTR® assay data on gene fragments to decrease ambiguity of the scaled signals, where a ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on the MA plot. FIGS. 10C-10D collectively show three representative clinical samples of different SARS-CoV-2 lineages used in the workflow of the COVID-19 Variant DETECTR® assay. In FIG. 10C, raw fluorescence curves of each sample run in RT-LAMP amplification and subsequent triplicate DETECTR® reactions targeting both WT and MUT SNPs for L452(R), E484(K), and N501(Y) are shown. For raw fluorescence curves, WT detection is shown in black lines without asterisks and MUT detection is shown in black lines marked by asterisks. Assays were performed using RT-LAMP replicates (n = 3) and CasDx1 replicates (n = 3 per LAMP replicate), where shading around kinetic curves indicates standard deviation of ±1.0SD. FIG. 10D shows box plot visualization of the end point fluorescence in DETECTR® across each SNP for the three representative clinical samples shown in FIG. 10C. Calls were made for each SNP by evaluating the median values of the DETECTR® calls and overall calls through the LAMP replicates, and were given a designation of WT, MUT, or NoCall. Final calls were made on the lineage determined by each SNP. Non-shaded elements represent WT and shaded elements represent MUT. FIG. 10E shows MA plots of the COVID Variant DETECTR® assay on the gene fragments (n = 30 WT; n = 30 MUT for each SNP) and no template controls (n = 33 WT; n = 33 MUT for each SNP) used to develop the data analysis pipeline. FIG. 10F shows a schematic of a data analysis pipeline workflow describing the RT-LAMP QC and subsequent CasDx1 signal scaling. The scaled signals were compared across SNPs and the calls were made for each RT-LAMP replicate. The combined replicate calls defined the mutation call, which informed the final lineage classification.
[0035] FIGS. 11A, 11B, 11C, and 11D illustrate an example evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing, in accordance with an embodiment of the present disclosure. FIG. 11A shows determination of RT-LAMP threshold with a ROC curve. Thresholds for LAMP quality analysis were derived to determine which samples had amplified sufficiently. The exact score value for this qualitative QC metric was determined using a ROC analysis. FIG. 11B illustrates a heat map showing CasDx1 signal (n = 3) per every LAMP replicate (n = 3) for each SNP on every clinical sample reflecting samples prior to discordance testing. FIG. 11C shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on clinical samples. FIG. 11D shows MA plots, transformed onto M (log ratio) and A (mean average) scales, showing CasDx1 SNP detection replicates (n = 807) for each SARS-CoV-2 mutation across 91 clinical samples. WT, MUT, NoCall, and NTC detection is denoted by labeled circles.
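The M (log ratio) and A (mean average) scales mentioned for the MA plots can be illustrated with a short transform. The sketch below uses the conventional MA-plot definitions, M = log2(WT) - log2(MUT) and A = (log2(WT) + log2(MUT)) / 2; the base-2 logarithm, the function name, and the requirement that both scaled signals be strictly positive are assumptions, since the text specifies only a log ratio and a mean average.

```python
import math


def ma_transform(wt_signal, mut_signal):
    """Map a pair of scaled WT and MUT signals onto (M, A) coordinates.

    M is the log2 ratio of WT to MUT; A is the mean of the two log2 signals.
    Both inputs must be strictly positive (an assumption of this sketch).
    """
    if wt_signal <= 0 or mut_signal <= 0:
        raise ValueError("signals must be strictly positive for a log transform")
    log_wt = math.log2(wt_signal)
    log_mut = math.log2(mut_signal)
    return log_wt - log_mut, (log_wt + log_mut) / 2.0
```

On these axes, a well pair dominated by the WT guide lands at positive M, a pair dominated by the MUT guide lands at negative M, and pairs with low overall signal (for example, no template controls) sit at low A regardless of M.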
[0036] FIGS. 12A, 12B, 12C, 12D, 12E, 12F, 12G, 12H, 12I, 12J, 12K, 12L, and 12M show visualization of SNP calls by the data analysis pipeline illustrated in FIGS. 11A-11D. Box plots of all the clinical samples illustrate the spread of the scaled signals for each of the samples across the replicates in the experiment. SNP calls were made on each sample in agreement with the median values depicted on the box plot of the sample, which also provided an analytical confirmation of the DETECTR® results. WT detection is represented by shaded boxes and MUT detection is represented by non-shaded boxes.
[0037] FIGS. 13A, 13B, 13C, 13D, 13E, 13F, 13G, 13H, 13I, and 13J illustrate evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing as illustrated in FIGS. 11A-11D and 12A-12M, in accordance with an embodiment of the present disclosure. FIGS. 13A-13D collectively show raw fluorescence CasDx1 curves for the clinical samples with discordant DETECTR® and WGS results. WT detection is represented by black lines and MUT detection is represented by gray lines. Each line is representative of the median ±1.0SD of the CasDx1 replicates (n = 3) for each guide for each of the LAMP replicates (n = 3), and each RT-LAMP replicate is represented by different patterned lines. FIG. 13E shows visualization of the COVID Variant DETECTR® and SARS-CoV-2 WGS assays showing the alignment of final calls. Across all of the clinical samples in this cohort, 80 out of the 91 clinical sample COVID Variant DETECTR® assay calls were consistent with the SARS-CoV-2 WGS calls. FIG. 13F shows alignment of final mutation calls comparing the COVID-19 Variant DETECTR® and SARS-CoV-2 WGS assay results across 91 clinical samples after discordant samples (indicated by asterisk) were resolved. FIG. 13G shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS. FIGS. 13H-13I collectively show a summary of re-testing of discordant samples from the original clinical sample where nearly all SNP discrepancies are resolved. FIG. 13J shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS following resolution of discordant samples.
[0038] FIGS. 14A, 14B, 14C, and 14D collectively illustrate an overall results summary of final SNP calls by COVID-19 Variant DETECTR® assay and viral WGS, in accordance with an embodiment of the present disclosure. A summary table of the final SNP calls from the COVID-19 Variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay after discordant testing is shown. The table includes the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values from running an FDA EUA authorized SARS-CoV-2 RT-PCR assay, the Taqpath™ COVID-19 RT-PCR kit, are shown. Discordant samples were reflexed back for reprocessing (indicated by asterisks); COVID-63 was classified as a Delta variant by WGS despite its Q484 SNP call (indicated by marker †).
[0039] FIGS. 15A, 15B, and 15C illustrate specific detection of 484 mutations, which can enable rapid Omicron identification, in accordance with an embodiment of the present disclosure. FIG. 15A shows a schematic of Omicron mutations within the S-gene LAMP amplicon and relative position of 484-specific gRNAs and degenerate LAMP primers. FIG. 15B shows a heat map comparison of end-point fluorescence (t = 30 min) showing specific detection of 484-specific mutations (E, K, Q, A) on PCR-amplified synthetic gene fragments (n = 3). FIG. 15C shows alignment of final 484 mutation calls comparing the DETECTR® and SARS-CoV-2 WGS assay results across 36 clinical samples.
[0040] FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples, in accordance with an embodiment of the present disclosure. Omicron clinical samples (1802), WT (1804) control RNA, and Alpha (1806) control RNAs were amplified using the original RT-LAMP primer set and degenerate RT-LAMP primer set and plotted relative to negative control (1808). The degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
[0041] FIGS. 17A, 17B, 17C, and 17D show a summary of results of final SNP calls by the COVID Variant DETECTR® and WGS assays, in accordance with an embodiment of the present disclosure, with reference to Example 4 below. FIG. 17A shows a schematic of the workflow for determining the final variant calls. If the result was an A484, K484, or Q484, the final variant call was made. If the result was an E484, the sample was reflexed to DETECTR® analysis at the 452 and 501 positions to make the variant determination. FIG. 17B shows an interpretation table including the specific 484 SNPs. FIG. 17C shows a summary table of the final SNP calls from the DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay of Example 4 including the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values were obtained from the FDA EUA authorized S Taqpath™ COVID-19 RT-PCR kit. FIG. 17D shows a summary table of the five discordant samples from the DETECTR® assay and WGS after retesting. (NoCall = lack of data generated, N/A = assay not run).
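The two-stage decision described for FIG. 17A (call on position 484 first, and reflex to positions 452 and 501 only when the result is E484) can be sketched as a small function. The function name, the string labels, and the tuple return format are hypothetical, and the sketch deliberately stops at the SNP level because the interpretation table of FIG. 17B is not reproduced in this section.

```python
def variant_workflow(call_484, call_452=None, call_501=None):
    """Return (status, detail) for the 484-first workflow described for FIG. 17A.

    call_484 is one of "A484", "K484", "Q484", "E484", or any other value for no call.
    call_452 and call_501 are only consulted when call_484 is "E484".
    """
    if call_484 in {"A484", "K484", "Q484"}:
        # A mutant 484 result is sufficient for the final variant call.
        return "final", (call_484,)
    if call_484 == "E484":
        if call_452 is None or call_501 is None:
            # Wild-type at 484: the sample is reflexed to the 452 and 501 assays.
            return "reflex", ("452", "501")
        return "final", (call_484, call_452, call_501)
    return "no_call", ()
```

Running the 484 assay first means that samples with a mutant 484 result never consume reagents for the other two positions, which is the practical point of the reflex design.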
[0042] FIG. 18 shows a summary table of the clinical specimens described in Example 4 with no signal in either the RT-LAMP or the DETECTR® reactions, in accordance with an embodiment of the present disclosure. These samples were called “COVID Not Detected (ND)” by both the COVID-19 variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assays. Ct values were obtained from the FDA EUA authorized S Taqpath™ COVID-19 RT-PCR kit. (NoCall = lack of data generated, N/A = assay not run).
DETAILED DESCRIPTION
[0043] Introduction
[0044] Tracking the evolution and spread of pathogenic variants (e.g., SARS-CoV-2 variants) in the community can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. In particular, the ability to detect and discriminate genetic phenotypes such as wild-type and/or mutant sequences responsible for disease and infection facilitates a wide range of clinical and epidemiological applications including detection of infectious variants, monitoring the evolution and spread of pathogenic or intervention-resistant strains, and/or discovery of novel target sequences for intervention.
[0045] For example, the emergence of new SARS-CoV-2 variants threatens to substantially prolong the COVID-19 pandemic. SARS-CoV-2 variants, especially Variants of Concern (VOCs), have caused resurgent COVID-19 outbreaks worldwide, even in populations with a high proportion of vaccinated individuals. Mutations in the spike protein, which binds to the human ACE2 receptor, can render the virus more infectious and thereby more transmissible, and/or more capable of evading vaccine or naturally acquired immunity, leading to re-infection. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of monoclonal antibody or convalescent plasma therapies for the disease.
[0046] Traditional methods such as whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants but can be limited by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation. Moreover, conventional diagnostic methods such as quantitative reverse transcriptase PCR (qRT-PCR) can result in false negatives when the concentration of target nucleic acids is low. For instance, viral loads for SARS-CoV-2 can vary during the day and at different stages of infection, such that conventional methods may fail to provide sufficient sensitivity to detect target sequences. Given these deficiencies, there is a need for improved systems and methods for the identification of phenotypically relevant mutations at target loci of interest.
[0047] Accordingly, diagnostic assays based on Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) have been developed for rapid detection of target nucleic acids in biological samples, such as detection of SARS-CoV-2 in clinical samples. A few have obtained Emergency Use Authorization (EUA) by the US Food and Drug Administration (FDA). These assays can be performed at low cost with a sample-to-answer turnaround time of under 2 hours and thus can be suitable for use in point of care settings.
[0048] The systems and methods disclosed herein further overcome the limitations of conventional methods for identification of genetic phenotypes in biological samples by providing improved mutation calling for target loci using programmable nuclease-based and/or amplification-based assays. Advantageously, these systems and methods provide improved sensitivity and accuracy in identifying target nucleic acids due to the use of specific primers during amplification and/or highly precise programmable nucleases for detection of target sequences. Such identification can be used to determine, e.g., pathogenesis, transmission risk, treatment response, diagnosis, and/or epidemiology of pathogens and infectious diseases, as well as tumor DNA and/or cancer-related viruses. For instance, the systems and methods disclosed herein can be used to screen for rare or novel pathogenic variants (e.g., SARS-CoV-2 variants), alone or in conjunction with conventional sequencing methods. Because the sequencing capacity for most clinical and public health laboratories is limited, the systems and methods of the present disclosure would enable rapid detection of newly emerging variants or currently circulating variants that have acquired additional mutations. The information thus obtained could directly inform outbreak investigation and public health containment efforts, such as quarantine decisions. Thus, the systems and methods disclosed herein have potential as a rapid diagnostic test alternative to sequencing-based methods. Furthermore, identification of specific mutations associated with intervention resistance, such as neutralizing antibody evasion in SARS-CoV-2 variants, could guide the care of individual patients, for instance, with regard to the use of monoclonal antibodies that remain effective in treating the infection.
[0049] In particular, one aspect of the present disclosure provides systems and methods for detecting and determining mutation calls for a target locus in a biological sample, such as a single nucleotide polymorphism in a target sequence. A signal dataset is obtained comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, and the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals includes, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[0050] For each respective well in the plurality of wells, a corresponding signal yield is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. Furthermore, for each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure across the plurality of candidate call identities for the first set of wells is performed, thereby obtaining a mutation call for the target locus.
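The per-well steps of paragraph [0050] can be sketched end to end for the two-allele case. The disclosure does not define the signal yield in this passage, so the sketch assumes a baseline-subtracted area under the fluorescence curve (trapezoid rule); the function names, the ratio threshold `min_ratio`, the pairing of wells by replicate index, and the strict-majority vote are likewise assumptions made for illustration.

```python
from collections import Counter


def signal_yield(times, signals):
    """Baseline-subtracted area under a reporting-signal curve (trapezoid rule).

    The first reading is used as the baseline; this definition is an assumption.
    """
    if len(times) != len(signals) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    baseline = signals[0]
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += dt * ((signals[i] - baseline) + (signals[i - 1] - baseline)) / 2.0
    return area


def call_locus(times, first_wells, second_wells, min_ratio=2.0):
    """Compare paired first-allele and second-allele wells, then take a majority vote.

    first_wells and second_wells are lists of fluorescence curves, paired by index.
    """
    votes = []
    for first_curve, second_curve in zip(first_wells, second_wells):
        y1 = signal_yield(times, first_curve)
        y2 = signal_yield(times, second_curve)
        if y1 > 0 and y1 >= min_ratio * y2:
            votes.append("first")
        elif y2 > 0 and y2 >= min_ratio * y1:
            votes.append("second")
        else:
            votes.append("NoCall")
    if not votes:
        return "NoCall"
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority of replicate comparisons to agree.
    return label if count > len(votes) / 2 else "NoCall"
```

An area-based yield uses every time point, so a single noisy end-point reading has less influence than it would in an end-point-only comparison; other yield definitions (for example, maximum slope) would slot into the same structure.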
[0051] In some embodiments, the present disclosure provides various systems and methods for assaying for and detecting single nucleotide polymorphisms (SNPs) in a target sequence. In particular, the various systems and methods disclosed herein use a programmable nuclease complexed with a non-naturally occurring (e.g., engineered) guide nucleic acid sequence to detect the presence or absence of, and/or quantify the amount of, a target sequence having one or more SNPs. In some instances, the various systems and methods disclosed herein are used to distinguish or discriminate between sequences having different mutations or variations therein (and/or between SNP-containing sequences and wild-type sequences). In some embodiments, the SNP-containing target nucleic acids are amplified and/or comprise amplicons. Amplifying SNP-containing target nucleic acids can use, for example, reverse transcription (RT) and/or isothermal amplification (e.g., loop-mediated amplification (LAMP)) or thermal amplification (e.g., polymerase chain reaction (PCR)) of RNA or DNA (e.g., RNA or DNA extracted from a patient sample). Accordingly, disclosed herein is a programmable nuclease-based assay for detection and discrimination between SNPs in a target sequence.
[0052] In exemplary embodiments, the present disclosure is described in relation to systems and methods for coronavirus variant detection or discrimination in a sample. However, one of ordinary skill in the art will appreciate that this is not intended to be limiting and the compositions and methods disclosed herein may be used to detect and/or determine mutations (e.g., SNPs) in other target sequences of interest. For example, the compositions and methods described herein may be used to detect mutations (e.g., SNPs) associated with other viruses or strains or variants thereof, bacteria or strains or variants thereof, diseases, disorders, and/or genetic traits or susceptibilities of interest.
[0053] In an exemplary aspect, the present disclosure provides various systems and methods of use thereof for assaying for and detecting mutations or variations of interest or concern in a segment of a Spike (S) gene of a coronavirus in a sample. In some embodiments, the coronavirus is SARS-CoV-2 (also known as 2019 novel coronavirus, Wuhan coronavirus, or 2019-nCoV), 229E (alpha coronavirus), NL63 (alpha coronavirus), OC43 (beta coronavirus), HKU1 (beta coronavirus), MERS-CoV, or SARS-CoV. In some embodiments, the coronavirus is a variant of SARS-CoV-2, particularly the alpha variant (also referred to herein as the United Kingdom (UK) variant) known as 20B/501Y.V1, VOC 202012/01, or B.1.1.7 lineage; the beta variant (also referred to herein as the South African variant) known as 20C/501Y.V2 or B.1.351 lineage; the delta variant known as B.1.617.2; or the gamma variant known as P.1. The terms “2019-nCoV,” “SARS-CoV-2,” and “COVID-19” may be used interchangeably herein. Exemplary variants of concern or interest are shown in Table 1. The genetic characteristics of these variants are discussed in Leung et al., Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020, Euro Surveill. 2021;26(1) and in Tegally et al., Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa, MedRxiv 2020.12.21, each of which is hereby incorporated herein by reference in its entirety. In some embodiments, as illustrated in Example 1, below, with reference to FIGS. 5A-5C, the systems and methods disclosed herein specifically target and assay for a segment of an S-gene of the SARS-CoV-2 coronavirus. In some embodiments, the systems and methods disclosed herein are used to detect the presence or absence of the segment of the S-gene of the SARS-CoV-2 in a patient sample.
In some implementations, a patient is diagnosed with COVID-19 if the presence of SARS-CoV-2 is detected in a sample from the patient. In some embodiments, the assays disclosed herein provide single nucleotide target specificity, enabling specific detection of a single coronavirus.
[0054] In some embodiments, candidate call identities are obtained using DETECTR assays, such as those described herein and in Example 1, below. See, for example, Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety. In some implementations, DETECTR assays disclosed herein use amplification of samples (e.g., RT and/or isothermal amplification (e.g., LAMP) of RNA (e.g., RNA extracted from a patient sample) and/or PCR), followed by Cas12 detection of predefined target loci (e.g., coronavirus sequences), followed by cleavage of a reporter molecule to detect the presence of nucleic acids in the sample having the target loci (e.g., a viral sequence). For instance, in some embodiments, a DETECTR assay targets the E (envelope) genes or N (nucleoprotein) genes of a coronavirus (e.g., SARS-CoV-2). In some cases, a DETECTR assay targets the S-gene of a coronavirus (e.g., SARS-CoV-2) or coronavirus variant. Isothermal amplification can also be performed to amplify one or more regions of the S gene.
[0055] Table 1: Genetic Changes Characterizing the SARS-CoV-2 Variants of Concern and Variants of Interest
[Table 1 is reproduced as images in the original publication: Figure imgf000021_0001 and Figure imgf000022_0001]
[0056] Any of the regions of the Spike gene comprising the groups of mutations detailed in Table 1 may be selected as a target.
[0057] Accordingly, another aspect of the present disclosure provides various compositions and methods of use thereof for assaying for and detecting a SARS-CoV-2 variant in a sample. In some implementations, the various compositions, methods, and reagents disclosed herein use an effector protein complexed with a guide nucleic acid sequence to distinguish among SARS-CoV-2 variants or between a SARS-CoV-2 variant and a wild-type SARS-CoV-2. In some embodiments, the variant of SARS-CoV-2 (also known as 2019 novel coronavirus or 2019-nCoV) is one of the variants known as Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), Omicron (B.1.1.529), Zeta (P.2), and Delta (B.1.617.2) and lineages thereof. In some embodiments, the assays disclosed herein target the S (spike) gene of the SARS-CoV-2 variant.
[0058] Disclosed herein is a method of detecting a SARS-CoV-2 variant in an individual. In some embodiments, such method comprises a) collecting a nasal swab or a throat swab from the individual; b) optionally extracting a target nucleic acid from the nasal swab or the throat swab; c) amplifying the target nucleic acid to produce an amplification product; d) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof; e) assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof; and f) determining a variant call of the sample. In some embodiments, methods described herein comprise amplifying the target nucleic acid (e.g., via reverse transcription-loop-mediated isothermal amplification (RT-LAMP)). In some embodiments, reagents for the amplification reaction comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer. In some embodiments, the amplification primers are selected from SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, or SEQ ID NOS: 34-39. In some embodiments, determining the variant call of the SARS-CoV-2 variant (e.g., any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants) comprises detecting one or more S-gene mutation(s) (e.g., associated with a L452R, E484K, E484Q, E484A, and N501Y mutation of the S-gene product) relative to a wild-type SARS-CoV-2.
In some embodiments, the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98% or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 or any one of SEQ ID NO: 22-27 or 40-42. In some embodiments, the reporter nucleic acid comprises a nucleotide sequence that is at least 75% or 100% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
[0059] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
[0060] Definitions.
[0061] As used herein, the terms “individual,” “subject,” and “patient” are used interchangeably and include any member of the animal kingdom, including humans.
[0062] As used herein, the term “percent (%) sequence identity” describes the number of matches (“hits”) of identical nucleotides/amino acids of two or more aligned nucleic acid or amino acid sequences as compared to the number of nucleotides or amino acid residues making up the overall length of the reference nucleic acid or amino acid sequences. In other terms, using an alignment, for two or more sequences or sub-sequences the percentage of amino acid residues or nucleotides that are the same (e.g., 95% identity) may be determined, when the (sub)sequences are compared and aligned for maximum correspondence over a window of comparison, or over a designated region as measured using a sequence comparison algorithm as known in the art, or when manually aligned and visually inspected. This definition also applies to the complement of any sequence to be aligned.
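By way of non-limiting illustration, the percent sequence identity defined above can be computed for two sequences that have already been aligned. This is a minimal sketch under stated assumptions: the inputs are pre-aligned strings of equal length in which "-" marks a gap, and the denominator is the number of residues in the reference sequence, following the definition above. The function name and the toy sequences are hypothetical and are not sequences of the present disclosure.

```python
def percent_identity(reference_aligned: str, query_aligned: str) -> float:
    """Percent identity of a query to a reference, given a pairwise alignment.

    Both strings must have the same length; '-' marks an alignment gap.
    Matches are counted column by column and divided by the number of
    residues (non-gap characters) in the reference sequence.
    """
    if len(reference_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for c in reference_aligned if c != "-")
    if reference_length == 0:
        raise ValueError("reference sequence contains no residues")
    matches = sum(
        1
        for r, q in zip(reference_aligned.upper(), query_aligned.upper())
        if r == q and r != "-"
    )
    return 100.0 * matches / reference_length


# Toy example: one mismatch in ten aligned positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

The alignment itself (for example, over a window of comparison using a sequence comparison algorithm) is outside the scope of this sketch and would be produced separately.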
[0063] In general terms, “amplification” or “amplifying” is a process by which a nucleic acid molecule is enzymatically copied to generate a progeny population with the same sequence as the parental one. For example, in some embodiments, the method provided herein includes repeatedly hybridizing a primer to the target nucleic acid and extending, from the primer, a nucleic acid complementary to the strand of the target nucleic acid using a polymerase, one or more times, e.g., until a desired amount of amplification is achieved. In some instances, amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
[0064] In general, the term ‘cleavage assay’ refers to an assay designed to visualize, quantitate or identify the cleavage activity of an effector protein. In some cases, the cleavage activity may be cis-cleavage activity. In some cases, the cleavage activity may be trans-cleavage activity. In some cases, the effector protein is an activated CRISPR effector protein. The term ‘activated effector protein’ refers to an effector protein associated with a guide RNA bound to a target RNA, thereby forming an ‘activated’ complex capable of exhibiting trans- or cis-cleavage.
[0065] The term “CRISPR-RNA” or “crRNA” or “spacer” refers to an RNA molecule having a sequence with sufficient complementarity to a target nucleic acid sequence to direct sequence-specific binding of an RNA-targeting complex to the target RNA sequence. In some embodiments, crRNAs contain a sequence that mediates target recognition and a sequence that duplexes with a tracrRNA. In some embodiments, the crRNA and tracrRNA duplex are present as parts of a single larger guide RNA molecule.
[0066] In some cases, the crRNA comprises a repeat region that interacts with the effector protein. The repeat region may also be referred to as a “protein-binding segment.” Typically, the repeat region is adjacent to the spacer region. For example, a guide nucleic acid that interacts with the effector protein may comprise a repeat region that is 5’ of the spacer region. The spacer region of the guide nucleic acid may be complementary to (e.g., hybridize to) a target sequence of a target nucleic acid. In some cases, the spacer region is 15-28 linked nucleosides in length. In some cases, the spacer region is 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleosides in length. In some cases, the spacer region is 18-24 linked nucleosides in length. In some cases, the spacer region is at least 15 linked nucleosides in length. In some cases, the spacer region is at least 16, 18, 20, or 22 linked nucleosides in length. In some cases, the spacer region comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some cases, the spacer region is at least 17 linked nucleosides in length. In some cases, the spacer region is at least 18 linked nucleosides in length.
[0067] A positive “detectable signal” may be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical or other detection methods known in the art. In some embodiments, a first detectable signal may be generated by binding of the CRISPR effector protein complex to the target nucleic acid, indicating that the sample contains the target nucleic acid. In some embodiments, a detectable signal may be generated upon the cleavage event, the event comprising the cleavage of a detector nucleic acid by a Cas effector protein. Alternatively, or in combination, the detectable signal may be generated indirectly by the cleavage event. In some embodiments, the cleavage event is a trans-cleavage reaction or a cis-cleavage reaction.
[0068] As used herein, the terms “effector protein”, “CRISPR effector”, “effector”, “CRISPR-associated protein” or “CRISPR enzyme” refer to a polypeptide, or a fragment thereof, possessing enzymatic activity, and that is capable of binding to a target nucleic acid molecule with the support of a guide nucleic acid molecule. In some embodiments, the binding is sequence-specific. In some embodiments, the guide nucleic acid molecule is DNA or RNA. In some cases, the target nucleic acid molecule may be DNA or RNA. In some cases, the term “effector protein” refers to a protein that is capable of modifying a nucleic acid molecule (e.g., by cleavage, deamination, recombination). In some embodiments, such enzymatic activities include, but are not limited to, nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
[0069] As used herein, the term “guide nucleic acid” or “guide sequence” is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct binding of a nucleic acid-targeting complex to the target sequence. In different embodiments, the term “gRNA”, as used here, refers to any RNA molecule that supports the targeting of an effector protein described here to a target nucleic acid. For example, “gRNAs” include, but are not limited to, crRNAs or crRNAs in combination with associated trans-activating RNAs (tracrRNAs). The latter can be independent RNAs or portions of sequences thereof can be fused into a single RNA using a linker. In different embodiments, the gRNA is designed to contain a chemical or biochemical modification. In some embodiments, a gRNA can contain one or more nucleotides. The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which they are naturally associated in nature and as found in nature.
[0070] As used herein, the term “nuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids; the term “endonuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bond within a polynucleotide chain.
[0071] “Nucleotide” refers to a base-sugar-phosphate compound. Nucleotides are the monomeric subunits of both types of nucleic acid polymers, RNA and DNA. “Nucleotide” refers to ribonucleoside triphosphates, such as rATP, rGTP, rUTP, and rCTP, and deoxyribonucleoside triphosphates, such as dATP, dGTP, dTTP, and dCTP. As used herein, a “nucleoside” refers to a base-sugar combination without a phosphate group. “Base” refers to the nitrogen-containing base, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). In general, a 2’-deoxyribonucleoside-5’-triphosphate (dNTP) refers to a base-sugar-phosphate compound, which has hydrogen at the 2’ position of the sugar, including, but not limited to, the four common deoxyribose-containing substrates (dATP, dCTP, dGTP, dTTP), and derivatives and analogs thereof. In general, a ribonucleoside-5’-triphosphate (rNTP) refers to a base-sugar-phosphate compound, which has a hydroxyl group at the 2’ position of the sugar, including, but not limited to, the four common ribose-containing substrates for an RNA polymerase (ATP, CTP, GTP and UTP), and derivatives and analogs thereof. Each nucleotide in a double stranded DNA or RNA molecule is paired with its Watson-Crick counterpart called its complementary nucleotide. In a double stranded DNA or RNA sequence, the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand. Following the same logic, the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the ‘reverse complement’ sequence or the ‘reverse complementary’ sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
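By way of non-limiting illustration, the complementary, reverse, and reverse complement sequences described above can be derived from an upper (sense) strand written in the 5'- to 3'-direction. This is a minimal sketch that assumes a DNA sequence containing only the bases A, C, G, and T; the function names and the example sequence are hypothetical.

```python
# Watson-Crick pairing for DNA bases (upper- and lower-case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(upper_strand: str) -> str:
    """Lower strand read in the same direction as the upper strand."""
    return upper_strand.translate(_DNA_COMPLEMENT)


def reverse(upper_strand: str) -> str:
    """Upper strand read from its 3'-end to its 5'-end."""
    return upper_strand[::-1]


def reverse_complement(upper_strand: str) -> str:
    """Lower strand read from its 5'-end to its 3'-end."""
    return complement(upper_strand)[::-1]


upper = "ATGCCG"  # upper (sense) strand, 5' to 3'
print(complement(upper))          # TACGGC
print(reverse(upper))             # GCCGTA
print(reverse_complement(upper))  # CGGCAT
```

For an RNA sequence, the same approach applies with uracil (U) in place of thymine (T) in the pairing table.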
[0072] As used herein, “reporter” is used interchangeably with “reporter nucleic acid,” “reporter molecule,” or “detector nucleic acid.” As used herein, the term “reporter” or “reporter nucleic acid” refers generally to an off-target nucleic acid molecule that is capable of providing a detectable signal upon cleavage by an ‘activated’ effector protein. By way of non-limiting and illustrative example, a reporter may comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), wherein the nucleic acid is capable of being cleaved by an effector protein (e.g., a CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
[0073] The effector proteins disclosed herein, activated upon hybridization of a guide nucleic acid to a target nucleic acid, may cleave the reporter. As used herein, an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid. In various embodiments, the reporter nucleic acid is further attached to a moiety, chemical compound or other component that can be used to visualize, quantitate or identify the cleavage of the reporter molecule. Cleaving the “reporter” may be referred to herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid.”
[0074] In some embodiments, reporters may comprise RNA. Reporters may comprise DNA. In some embodiments, reporters may be double-stranded. In some embodiments, reporters may be single-stranded. In some instances, reporters comprise a protein capable of generating a signal. A signal may be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. In some instances, the reporter comprises a detection moiety. Suitable detectable labels and/or moieties that may provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like. In some instances, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site. Sometimes the quenching moiety is a fluorescence quenching moiety. In some instances, the quenching moiety is 5' to the cleavage site and the detection moiety is 3' to the cleavage site. In some instances, the detection moiety is 5' to the cleavage site and the quenching moiety is 3' to the cleavage site. Sometimes the quenching moiety is at the 5' terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3' terminus of the nucleic acid of a reporter. In some instances, the detection moiety is at the 5' terminus of the nucleic acid of a reporter. In some instances, the quenching moiety is at the 3' terminus of the nucleic acid of a reporter.
[0075] As used herein, the terms “trans-activating crRNA” or “tracrRNA” refer to a transactivating or transactivated RNA molecule or a fragment thereof which contains a sequence with sufficient complementarity to allow association with a crRNA molecule. In some embodiments, the tracrRNA can refer to an RNA molecule that provides a scaffold for binding to a CRISPR effector protein. In some embodiments, the tracrRNA forms a structure facilitating the binding of a CRISPR-associated or a CRISPR effector protein to a specific target nucleic acid. In some embodiments, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are present as parts of a larger single guide RNA molecule.
[0076] The terms, “sgRNA”, “single guide RNA” or “single guide nucleic acid,” as used herein refer to a single nucleic acid system comprising: a first nucleotide sequence that binds non-covalently with an effector protein; and a second nucleotide sequence that hybridizes to a target nucleic acid. In some embodiments, the first nucleotide sequence is referred to as a handle sequence. A handle sequence may comprise at least a portion of a tracrRNA sequence, at least a portion of a repeat sequence, or a combination thereof. A sgRNA does not comprise a tracrRNA.
[0077] In the context of formation of a DNA or RNA-targeting complex, “target sequence” refers to a DNA or RNA sequence to which a DNA or RNA-targeting guide RNA is designed to have complementarity, where hybridization between a target sequence and a guide RNA promotes the association of the CRISPR effector protein with the target RNA.
[0078] As used herein, a “tissue sample” is any biological sample derived from a patient. This term includes, but is not limited to, biological fluids such as blood, serum, plasma, urine, cerebrospinal fluid, tears, saliva, lymph, dialysate, lavage fluid, semen, and other biological fluid samples. Cells and tissues of scientific origin are included as tissue samples. The term also includes cells or cells derived therefrom and their progeny, including cells in culture, cell supernatants, and cell lysates. This definition includes samples that have been manipulated in any way after they are obtained, such as treatment with reagents, solubilization, or enrichment of specific components such as polynucleotides or polypeptides. The term also includes derivatives and fractions of patient samples.
[0079] As used herein, “trans-cleavage activity,” also referred to as “collateral” or “transcollateral” cleavage, may be non-specific cleavage of nearby single-stranded nucleic acid by an activated effector protein, such as trans cleavage of detector nucleic acids with a detection moiety.
[0080] In general, the term “trans cleavage assay” refers to an assay designed to visualize, quantitate or identify the trans-collateral activity of an activated effector protein. As used herein, an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid, thereby inducing the formation of the nuclease-guide complex. A “DETECTR®” assay, as used herein, is a trans cleavage assay.
[0081] Disclosed herein are non-naturally occurring compositions and systems comprising at least one of an engineered effector protein and an engineered guide nucleic acid, which may simply be referred to herein as an effector protein and a guide nucleic acid, respectively. In general, an engineered effector protein and an engineered guide nucleic acid refer to an effector protein and a guide nucleic acid, respectively, that are not found in nature. In some instances, systems and compositions comprise at least one non-naturally occurring component. For example, compositions and systems may comprise a guide nucleic acid, wherein the sequence of the guide nucleic acid is different or modified from that of a naturally-occurring guide nucleic acid. In some instances, compositions and systems comprise at least two components that do not naturally occur together. For example, compositions and systems may comprise a guide nucleic acid comprising a repeat region and a spacer region which do not naturally occur together. Also, by way of example, compositions and systems may comprise a guide nucleic acid and a Cas protein that do not naturally occur together. Conversely, and for clarity, an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes Cas proteins and guide nucleic acids from cells or organisms that have not been genetically modified by a human or machine.
[0082] In some instances, the guide nucleic acid comprises a non-natural nucleobase sequence. In some instances, the non-natural sequence is a nucleobase sequence that is not found in nature. The non-natural sequence may comprise a portion of a naturally-occurring sequence, wherein the portion of the naturally-occurring sequence is not present in nature absent the remainder of the naturally-occurring sequence. In some instances, the guide nucleic acid comprises two naturally-occurring sequences arranged in an order or proximity that is not observed in nature. In some instances, compositions and systems comprise a ribonucleotide complex comprising a CRISPR/Cas effector protein and a guide nucleic acid that do not occur together in nature. Engineered guide nucleic acids may comprise a first sequence and a second sequence that do not occur naturally together. For example, an engineered guide nucleic acid may comprise a sequence of a naturally-occurring repeat region and a spacer region that is complementary to a naturally-occurring eukaryotic sequence. The engineered guide nucleic acid may comprise a sequence of a repeat region that occurs naturally in an organism and a spacer region that does not occur naturally in that organism. An engineered guide nucleic acid may comprise a first sequence that occurs in a first organism and a second sequence that occurs in a second organism, wherein the first organism and the second organism are different. The guide nucleic acid may comprise a third sequence at a 3’ or 5’ end of the guide nucleic acid, or between the first and second sequences of the guide nucleic acid. For example, an engineered guide nucleic acid may comprise a naturally occurring crRNA and tracrRNA coupled by a linker sequence. Alternatively, an engineered guide nucleic acid may comprise at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence.
[0083] In some instances, compositions and systems described herein comprise an engineered effector protein that is similar to a naturally occurring effector protein. The engineered effector protein may lack a portion of the naturally occurring effector protein. The effector protein may comprise a mutation relative to the naturally-occurring effector protein, wherein the mutation is not found in nature. The effector protein may also comprise at least one additional amino acid relative to the naturally-occurring effector protein. In certain embodiments, the nucleotide sequence encoding the effector protein is codon optimized relative to the naturally occurring sequence.
[0084] Programmable Nucleases.
[0085] Disclosed herein are programmable nucleases and uses thereof, e.g., detection of target nucleic acids. In some cases, a programmable nuclease is capable of being activated when complexed with the guide nucleic acid and the target nucleic acid segment. A programmable nuclease can be capable of being activated when complexed with a guide nucleic acid and the target sequence. The programmable nuclease can be activated upon binding of the guide nucleic acid to its target nucleic acid and can non-specifically degrade a non-target nucleic acid in its environment. The programmable nuclease has trans-cleavage activity once activated. A programmable nuclease can be a Cas protein (also referred to, interchangeably, as a Cas nuclease or Cas effector protein). A guide nucleic acid (e.g., crRNA) and Cas protein can form a CRISPR enzyme.
[0086] In some embodiments, one or more programmable nucleases as disclosed herein can be activated to initiate trans-cleavage activity of a reporter (also referred to herein as a reporter molecule). A programmable nuclease as disclosed herein can, in some cases, bind to a target sequence or target nucleic acid to initiate trans-cleavage of a reporter. The programmable nuclease can be referred to as an RNA-activated programmable RNA nuclease. In some instances, the programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of an RNA reporter. Such a programmable nuclease can be referred to herein as a DNA-activated programmable RNA nuclease. In some cases, a programmable nuclease as described herein can be activated by a target RNA or a target DNA. For example, a programmable nuclease, e.g., a Cas enzyme, can be activated by a target RNA nucleic acid or a target DNA nucleic acid to cleave RNA reporters. In some embodiments, the programmable nuclease can bind to a target ssDNA which initiates trans-cleavage of RNA reporters. In some instances, a programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of a DNA reporter, and this programmable nuclease can be referred to as a DNA-activated programmable DNA nuclease.
[0087] The programmable nuclease can become activated after binding of a guide nucleic acid that is complexed with the programmable nuclease with a target nucleic acid, and the activated programmable nuclease can cleave the target nucleic acid, which can result in a trans-cleavage activity. Trans-cleavage activity can be non-specific cleavage of nearby single-stranded nucleic acids by the activated programmable nuclease, such as trans-cleavage of reporter nucleic acids comprising a detection moiety. Once the reporter is cleaved by the activated programmable nuclease, the detection moiety can be released or separated from the reporter and can directly or indirectly generate a detectable signal. The reporter and/or the detection moiety can be immobilized on a support medium. Often the detection moiety is at least one of a fluorophore, a dye, a polypeptide, or a nucleic acid. In some embodiments, the detection moiety binds to a capture molecule on the support medium to be immobilized. The detectable signal can be visualized on the support medium to assess the presence or concentration of one or more target nucleic acids associated with an ailment, such as a SNP associated with a disease, cancer, or genetic disorder.
[0088] In some embodiments, a programmable nuclease is any enzyme that can be or has been designed, modified, or engineered by human contribution so that the enzyme targets or cleaves a nucleic acid in a sequence-specific manner. Programmable nucleases can include, for example, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and/or RNA-guided nucleases such as the bacterial clustered regularly interspaced short palindromic repeat (CRISPR)-Cas (CRISPR-associated) nucleases or Cpf1. Programmable nucleases can also include, for example, PfAgo and/or NgAgo.
[0089] Several programmable nucleases are consistent with the methods and devices of the present disclosure. For example, Cas proteins are programmable nucleases used in the methods and systems disclosed herein. Cas proteins can include any of the known Classes and Types of CRISPR/Cas enzymes. Programmable nucleases disclosed herein include Class 1 Cas proteins, such as the Type I, Type IV, or Type III Cas proteins. Programmable nucleases disclosed herein also include the Class 2 Cas proteins, such as the Type II, Type V, and Type VI Cas proteins. Programmable nucleases included in the methods disclosed herein and methods of use thereof include Type V or Type VI Cas proteins.
[0090] In some instances, the programmable nuclease is a Type V Cas protein. In general, a Type V Cas effector protein comprises a RuvC domain but lacks an HNH domain. In most instances, the RuvC domain of the Type V Cas effector protein comprises three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains). In some instances, the three RuvC subdomains are located within the C-terminal half of the Type V Cas effector protein. In some instances, none of the RuvC subdomains are located at the N terminus of the protein. In some instances, the RuvC subdomains are contiguous. In some instances, the RuvC subdomains are not contiguous with respect to the primary amino acid sequence of the Type V Cas protein, but form a RuvC domain once the protein is produced and folds. In some instances, there are zero to about 50 amino acids between the first and second RuvC subdomains. In some instances, there are zero to about 50 amino acids between the second and third RuvC subdomains. In some instances, the Cas effector is a Cas14 effector. In some instances, the Cas14 effector is a Cas14a, Cas14a1, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, Cas14h, or Cas14u effector. In some instances, the Cas effector is a CasPhi (also referred to herein as a CasΦ) effector. In some instances, the Cas effector is a Cas12 effector. In some instances, the Cas12 effector is a Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, or Cas12j effector.
[0091] In some instances, the Type V Cas protein comprises a Cas14 protein. Cas14 proteins can comprise a bilobed structure with distinct amino-terminal and carboxy-terminal domains. The amino- and carboxy-terminal domains can be connected by a flexible linker. The flexible linker can affect the relative conformations of the amino- and carboxy-terminal domains. The flexible linker can be short, for example less than 10 amino acids, less than 8 amino acids, less than 6 amino acids, less than 5 amino acids, or less than 4 amino acids in length. The flexible linker can be sufficiently long to enable different conformations of the amino- and carboxy-terminal domains among two Cas14 proteins of a Cas14 dimer complex (e.g., the relative orientations of the amino- and carboxy-terminal domains differ between two Cas14 proteins of a Cas14 homodimer complex). The linker domain can comprise a mutation which affects the relative conformations of the amino- and carboxy-terminal domains. The linker can comprise a mutation which affects Cas14 dimerization. For example, a linker mutation can enhance the stability of a Cas14 dimer.
[0092] In some instances, the amino-terminal domain of a Cas14 protein comprises a wedge domain, a recognition domain, a zinc finger domain, or any combination thereof. The wedge domain can comprise a multi-strand β-barrel structure. A multi-strand β-barrel structure can comprise an oligonucleotide/oligosaccharide-binding fold that is structurally comparable to those of some Cas12 proteins. The recognition domain and the zinc finger domain can each (individually or collectively) be inserted between β-barrel strands of the wedge domain. The recognition domain can comprise a 4-α-helix structure, structurally comparable to but shorter than those found in some Cas12 proteins. The recognition domain can comprise a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex. In some cases, a REC lobe can comprise a binding affinity for a PAM sequence in the target nucleic acid. The amino-terminal domain can comprise a wedge domain, a recognition domain, and a zinc finger domain. The carboxy-terminal domain can comprise a RuvC domain, a zinc finger domain, or any combination thereof. The carboxy-terminal domain can comprise one RuvC domain and one zinc finger domain.
[0093] In some embodiments, Cas14 proteins comprise a RuvC domain or a partial RuvC domain. The RuvC domain can be defined by a single, contiguous sequence, or a set of partial RuvC domains that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein. In some instances, a partial RuvC domain does not have any substrate binding activity or catalytic activity on its own. A Cas14 protein of the present disclosure can include multiple partial RuvC domains, which can combine to generate a RuvC domain with substrate binding or catalytic activity. For example, a Cas14 can include 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein but form a RuvC domain once the protein is produced and folds. A Cas14 protein can comprise a linker loop connecting a carboxy terminal domain of the Cas14 protein with the amino terminal domain of the Cas14 protein, where the carboxy terminal domain comprises one or more RuvC domains and the amino terminal domain comprises a recognition domain.
[0094] In some embodiments, Cas14 proteins comprise a zinc finger domain. In some instances, a carboxy terminal domain of a Cas14 protein comprises a zinc finger domain. In some instances, an amino terminal domain of a Cas14 protein comprises a zinc finger domain. In some instances, the amino terminal domain comprises a wedge domain (e.g., a multi-β-barrel wedge structure), a zinc finger domain, or any combination thereof. In some cases, the carboxy terminal domain comprises the RuvC domains and a zinc finger domain, and the amino terminal domain comprises a recognition domain, a wedge domain, and a zinc finger domain.
[0095] In some instances, the Type V Cas protein is a CasΦ protein. A CasΦ protein can function as an endonuclease that catalyzes cleavage at a specific sequence in a target nucleic acid. A programmable CasΦ nuclease can have a single active site in a RuvC domain that is capable of catalyzing pre-crRNA processing and nicking or cleaving of nucleic acids. This compact catalytic site can render the programmable CasΦ nuclease especially advantageous for genome engineering and new functionalities for genome manipulation.
[0096] In some instances, the programmable nuclease is a Type VI Cas protein. In some embodiments, the Type VI Cas protein is a programmable Cas13 nuclease. The general architecture of a Cas13 protein includes an N-terminal domain and two HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains separated by two helical domains. The HEPN domains each comprise an R-X4-H motif. Shared features across Cas13 proteins include that upon binding of the crRNA of the guide nucleic acid to a target nucleic acid, the protein undergoes a conformational change to bring together the HEPN domains and form a catalytically active RNase. Thus, two activatable HEPN domains are characteristic of a programmable Cas13 nuclease of the present disclosure. However, programmable Cas13 nucleases also consistent with the present disclosure include Cas13 nucleases comprising mutations in the HEPN domain that enhance the Cas13 protein's cleavage efficiency or mutations that catalytically inactivate the HEPN domains. Programmable Cas13 nucleases consistent with the present disclosure also include Cas13 nucleases comprising catalytic components. In some instances, the Cas effector is a Cas13 effector. In some instances, the Cas13 effector is a Cas13a, a Cas13b, a Cas13c, a Cas13d, or a Cas13e effector protein.
[0097] In some cases, the programmable nuclease is Cas13. In some instances, the Cas13 is Cas13a, Cas13b, Cas13c, Cas13d, or Cas13e. In some cases, the programmable nuclease is Mad7 or Mad2. In some cases, the programmable nuclease is Cas12. In some embodiments, the Cas12 is Cas12a, Cas12b, Cas12c, Cas12d, or Cas12e. In some cases, the programmable nuclease is Csm1, Cas9, C2c4, C2c8, C2c5, C2c10, C2c9, or CasZ. In some embodiments, the Csm1 is also called smCms1, miCms1, obCms1, or suCms1. In some embodiments, Cas13a is called C2c2. In some embodiments, CasZ is called Cas14a, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, or Cas14h. In some embodiments, the programmable nuclease is a type V CRISPR-Cas system. In some cases, the programmable nuclease is a type VI CRISPR-Cas system. In some embodiments, the programmable nuclease is a type III CRISPR-Cas system. In some cases, the programmable nuclease is from at least one of Leptotrichia shahii (Lsh), Listeria seeligeri (Lse), Leptotrichia buccalis (Lbu), Leptotrichia wadei (Lwa), Rhodobacter capsulatus (Rca), Herbinix hemicellulosilytica (Hhe), Paludibacter propionicigenes (Ppr), Lachnospiraceae bacterium (Lba), [Eubacterium] rectale (Ere), Listeria newyorkensis (Lny), Clostridium aminophilum (Cam), Prevotella sp. (Psm), Capnocytophaga canimorsus (Cca), Lachnospiraceae bacterium (Lba), Bergeyella zoohelcum (Bzo), Prevotella intermedia (Pin), Prevotella buccae (Pbu), Alistipes sp. (Asp), Riemerella anatipestifer (Ran), Prevotella aurantiaca (Pau), Prevotella saccharolytica (Psa), Prevotella intermedia (Pin2), Capnocytophaga canimorsus (Cca), Porphyromonas gulae (Pgu), Prevotella sp. (Psp), Porphyromonas gingivalis (Pig), Prevotella intermedia (Pin3), Enterococcus italicus (Ei), Lactobacillus salivarius (Ls), or Thermus thermophilus (Tt). In some embodiments, the Cas13 is at least one of LbuCas13a, LwaCas13a, LbaCas13a, HheCas13a, PprCas13a, EreCas13a, CamCas13a, or LshCas13a.
The trans-cleavage activity of the programmable nuclease can be activated when the guide nucleic acid is complexed with the target nucleic acid. In some embodiments, the target nucleic acid is RNA or DNA.
[0098] In some embodiments, the programmable nuclease comprises a Cas12 protein, where the Cas12 enzyme binds and cleaves double stranded DNA and single stranded DNA. In some embodiments, the programmable nuclease comprises a Cas13 protein, where the Cas13 enzyme binds and cleaves single stranded RNA. In some embodiments, the programmable nuclease comprises a Cas14 protein, where the Cas14 enzyme binds and cleaves both double stranded DNA and single stranded DNA.
[0099] Table 2 provides illustrative amino acid sequences of programmable nucleases having trans-cleavage activity. In some instances, programmable nucleases described herein comprise an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117. In some embodiments, the programmable nuclease consists of an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117. In some embodiments, the programmable nuclease comprises at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 consecutive amino acids of any one of SEQ ID NOS: 45-117.
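By way of non-limiting illustration, a percent-identity threshold such as those recited above can be checked computationally once a candidate sequence has been aligned to a reference sequence. The following is a minimal Python sketch under stated assumptions: the two sequences are already aligned to equal length with "-" marking gaps, and identity is computed over every alignment column that is not a gap in both sequences (other conventions exist and can give different values). The function names `percent_identity` and `meets_identity_threshold` are hypothetical and do not appear in the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns that are gaps in both sequences are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # skip columns that carry no residue in either sequence
        columns += 1
        if a == b:  # equal and not both gaps, so both are residues
            matches += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the aligned pair is at least `threshold_percent` identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For example, `meets_identity_threshold(candidate, reference, 65.0)` would test the lowest threshold recited above; producing the alignment itself is outside the scope of this sketch.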
[00100] Table 2: Amino Acid Sequences of Exemplary Programmable Nucleases
[Table 2 appears in the published document as a series of images (imgf000036_0001 through imgf000067_0001); the sequence listings are not reproduced in this text.]
[00101] In some instances, effector proteins disclosed herein are engineered proteins. Engineered proteins are not identical to a naturally-occurring protein. Engineered proteins can provide enhanced nuclease or nickase activity as compared to a naturally occurring nuclease or nickase. An engineered protein can comprise a modified form of a wild-type counterpart protein. In some instances, effector proteins comprise at least one amino acid change (e.g., deletion, insertion, or substitution) that enhances or reduces the nucleic acid-cleaving activity of the effector protein relative to the wild-type counterpart.
[00102] In some embodiments, a programmable nuclease is thermostable. In some instances, known programmable nucleases (e.g., Cas12 nucleases) are relatively thermosensitive and only exhibit activity (e.g., cis and/or trans-cleavage) sufficient to produce a detectable signal in a diagnostic assay at temperatures less than 40 °C, and optimally at about 37 °C. A thermostable protein can have enzymatic activity, stability, or folding comparable to those at 37 °C. In some instances, the trans-cleavage activity (e.g., the maximum trans-cleavage rate as measured by fluorescent signal generation) of a programmable nuclease in a trans-cleavage assay at 40 °C, 45 °C, 50 °C, 55 °C, 60 °C, 65 °C, 70 °C, 75 °C, 80 °C, or more is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 1-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 30-fold, at least 35-fold, at least 40-fold, at least 45-fold, at least 50-fold or more of that at 37 °C.
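By way of non-limiting illustration, the comparison of trans-cleavage activity at an elevated temperature against that at 37 °C can be expressed as a ratio of maximum signal-generation rates. The following is a minimal Python sketch under stated assumptions: fluorescence is sampled at the same time points in both assays, and the maximum rate is approximated by the largest slope between consecutive readings (the disclosure does not specify how the rate is computed). The function names `max_rate` and `relative_activity_percent` are hypothetical.

```python
from typing import Sequence


def max_rate(times: Sequence[float], signal: Sequence[float]) -> float:
    """Largest slope between consecutive readings (signal units per time unit)."""
    if len(times) != len(signal) or len(times) < 2:
        raise ValueError("need at least two paired readings")
    rates = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        rates.append((signal[i] - signal[i - 1]) / dt)
    return max(rates)


def relative_activity_percent(times: Sequence[float],
                              signal_test: Sequence[float],
                              signal_37c: Sequence[float]) -> float:
    """Maximum rate at the test temperature as a percentage of that at 37 °C."""
    reference = max_rate(times, signal_37c)
    if reference <= 0:
        raise ValueError("reference assay shows no signal increase")
    return 100.0 * max_rate(times, signal_test) / reference
```

Under these assumptions, a returned value of 50.0 or more would correspond to the "at least 50%" level recited above, and a value of 200.0 to "2-fold".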
[00103] Engineered Guide Nucleic Acids.
[00104] As used herein, a guide nucleic acid refers to a nucleic acid that comprises a sequence that targets (e.g., is reverse complementary to) the sequence of a target nucleic acid. A guide nucleic acid can be DNA, RNA, or a combination thereof (e.g., RNA with a thymine base), or include a chemically modified nucleobase or phosphate backbone. Guide nucleic acids are often referred to as a “guide RNA” or “gRNA.” However, a guide nucleic acid can comprise deoxyribonucleotides and/or modified nucleobases. In general, a guide nucleic acid is a nucleic acid molecule that binds to an effector protein (e.g., a Cas effector protein), thereby forming a ribonucleoprotein complex (RNP). In some instances, the engineered guide nucleic acid imparts activity or sequence selectivity to the effector protein. In some cases, a guide nucleic acid, when complexed with an effector protein, brings the effector protein into proximity of a target nucleic acid molecule. An engineered guide nucleic acid can comprise a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid. In some instances, the engineered guide nucleic acid comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. A guide nucleic acid can comprise or be coupled to a tracrRNA. The tracrRNA can comprise deoxyribonucleosides in addition to ribonucleosides. The tracrRNA can be separate from but form a complex with a crRNA. In some instances, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide nucleic acid, also referred to as a single guide RNA (sgRNA). In some instances, a crRNA and tracrRNA function as two separate, unlinked molecules.
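By way of non-limiting illustration, the reverse-complement relationship between the targeting sequence of a guide nucleic acid and a target nucleic acid can be sketched in Python as follows. This sketch assumes a DNA target strand given 5' to 3' and an RNA targeting sequence, and it deliberately ignores other design considerations (for example, any PAM requirement, repeat or tracrRNA portions, or chemical modifications). The function name `targeting_sequence_rna` is hypothetical.

```python
# Map each DNA base of the target to the complementary RNA base of the guide.
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def targeting_sequence_rna(target_dna: str) -> str:
    """Return the RNA sequence (5'->3') that is reverse complementary
    to the given DNA target strand (5'->3')."""
    target = target_dna.upper()
    try:
        # Walk the target from its 3' end so the result reads 5'->3'.
        return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(target))
    except KeyError as exc:
        raise ValueError(f"unexpected base in target: {exc.args[0]!r}") from exc
```

For instance, a target of `ATGC` yields the targeting sequence `GCAU`.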
[00105] Reporters.
[00106] In some instances, the systems and methods disclosed herein provide a reporter (interchangeably, “reporter nucleic acid” or “reporter molecule”). By way of non-limiting and illustrative example, a reporter can comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), where the nucleic acid is capable of being cleaved by a programmable nuclease (e.g., a Type V or VI CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal. Moreover, as used interchangeably herein, the term “reporting signal” or “detectable signal” refers to any readout from a reporter, such as a fluorescence intensity and/or a lateral flow readout. The programmable nucleases disclosed herein, activated upon hybridization of a guide RNA to a target nucleic acid, may cleave the reporter. Accordingly, cleaving the “reporter” can be referred to interchangeably herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid of the reporter.” Reporters can comprise RNA. Reporters can comprise DNA. Reporters can be double-stranded. Reporters can be single-stranded.
[00107] In some instances, the systems and methods disclosed herein provide a Type V CRISPR/Cas protein and a reporter nucleic acid configured to undergo transcollateral cleavage by the Type V CRISPR/Cas protein. Transcollateral cleavage of the reporter can generate a signal from the reporter or alter a signal from the reporter. In some cases, the signal is an optical signal, such as a fluorescence signal or absorbance band. Transcollateral cleavage of the reporter can alter the wavelength, intensity, or polarization of the optical signal. For example, the reporter can comprise a fluorophore and a quencher, such that transcollateral cleavage of the reporter separates the fluorophore and the quencher thereby increasing a fluorescence signal from the fluorophore. Herein, detection of reporter cleavage (directly or indirectly) to determine the presence of a target nucleic acid sequence is, in some embodiments, referred to as “DETECTR.” In some embodiments described herein is a method of assaying for a target nucleic acid in a sample comprising contacting the target nucleic acid with a programmable nuclease, a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid, and a reporter nucleic acid, and assaying for a change in a signal, where the change in the signal is produced by or indicative of cleavage of the reporter nucleic acid.
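By way of non-limiting illustration, assaying for a change in signal indicative of reporter cleavage can be sketched as a comparison of a sample reading against no-target control readings. The following minimal Python sketch rests on assumptions that are not part of the disclosure: end-point fluorescence is used as the signal, and a sample is scored as positive when its signal exceeds the mean of the no-target controls by a chosen number of standard deviations (three by default). The function name `cleavage_detected` and the parameter `n_sd` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def cleavage_detected(sample_signal: float,
                      no_target_signals: Sequence[float],
                      n_sd: float = 3.0) -> bool:
    """Score a sample as positive if its signal exceeds
    mean(no-target controls) + n_sd * stdev(no-target controls)."""
    if len(no_target_signals) < 2:
        raise ValueError("need at least two no-target control readings")
    threshold = mean(no_target_signals) + n_sd * stdev(no_target_signals)
    return sample_signal > threshold
```

Other decision rules, such as a fixed fold-change over background or a rate-based criterion, would fit the same description; the threshold rule here is only one possible choice.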
[00108] In some cases, the reporter comprises a detection moiety. In some instances, the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter, where the first site is separated from the remainder of the reporter upon cleavage at the cleavage site. In some cases, the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site. In some embodiments, the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
[00109] In some embodiments, the reporter comprises a nucleic acid and a detection moiety. In some embodiments, a reporter is connected to a surface by a linkage. In some embodiments, a reporter comprises at least one of a nucleic acid, a chemical functionality, a detection moiety, a quenching moiety, or a combination thereof. In some embodiments, a reporter is configured for the detection moiety to remain immobilized to the surface and the quenching moiety to be released into solution upon cleavage of the reporter. In some embodiments, a reporter is configured for the quenching moiety to remain immobilized to the surface and for the detection moiety to be released into solution upon cleavage of the reporter. Often the detection moiety is at least one of a label, a polypeptide, a dendrimer, or a nucleic acid, or a combination thereof. In some embodiments, the reporter contains a label. In some embodiments, the label is FITC, DIG, TAMRA, Cy5, AF594, or Cy3. In some embodiments, the label comprises a dye or a nanoparticle configured to produce a signal. In some embodiments, the dye is a fluorescent dye. In some embodiments, the at least one chemical functionality comprises biotin. In some embodiments, the at least one chemical functionality is configured to be captured by a capture probe. In some embodiments, the at least one chemical functionality comprises biotin and the capture probe comprises anti-biotin, streptavidin, avidin or other molecule configured to bind with biotin. In some embodiments, the dye is the chemical functionality. In some embodiments, a capture probe comprises a molecule that is complementary to the chemical functionality. In some embodiments, the capture antibodies are anti-FITC, anti-DIG, anti-TAMRA, anti-Cy5, anti-AF594, or any other appropriate capture antibody capable of binding the detection moiety or conjugate. In some embodiments, the detection moiety is the chemical functionality.
[00110] In some instances, reporters comprise a detection moiety capable of generating a signal. A signal can be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. Suitable detectable labels and/or moieties that can provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
[00111] In some cases, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, where the first site and the second site are separated by the cleavage site. In some embodiments, the quenching moiety is a fluorescence quenching moiety. In some cases, the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site. In some embodiments, the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. In some embodiments, the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
[00112] Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, Xanthine Oxidase, firefly luciferase, and glucose oxidase (GO).
[00113] In some instances, the detection moiety comprises an invertase. The substrate of the invertase can be sucrose. A DNS reagent can be included in the system to produce a colorimetric change when the invertase converts sucrose to glucose. In some cases, the reporter nucleic acid and invertase are conjugated using a heterobifunctional linker via sulfo-SMCC chemistry.
[00114] In some embodiments, suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). Non-limiting examples of fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). The fluorophore can be an infrared fluorophore. The fluorophore can emit fluorescence in the range of 500 nm to 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range of 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm.
[00115] In some embodiments, systems comprise a quenching moiety. A quenching moiety may be chosen based on its ability to quench the detection moiety. A quenching moiety can be a non-fluorescent fluorescence quencher. A quenching moiety may quench a detection moiety that emits fluorescence in the range of 500 nm to 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher. In other cases, the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm. A quenching moiety may quench fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). A quenching moiety can be Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher. In some embodiments, a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). A quenching moiety can be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor).
Any of the quenching moieties described herein can be from any commercially available source, can be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
[00116] Generally, the generation of the detectable signal from the release of the detection moiety indicates that cleavage by the programmable nucleases has occurred and that the sample contains the target nucleic acid. In some cases, the detection moiety comprises a fluorescent dye. In some embodiments, the detection moiety comprises a fluorescence resonance energy transfer (FRET) pair. In some cases, the detection moiety comprises an infrared (IR) dye. In some cases, the detection moiety comprises an ultraviolet (UV) dye. Alternatively, or in combination, the detection moiety comprises a protein. In some embodiments, the detection moiety comprises a biotin. In some embodiments, the detection moiety comprises at least one of avidin or streptavidin. In some instances, the detection moiety comprises a polysaccharide, a polymer, or a nanoparticle. In some instances, the detection moiety comprises a gold nanoparticle or a latex nanoparticle.
[00117] A detection moiety can be any moiety capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. A nucleic acid of a reporter, in some embodiments, is a protein-nucleic acid that is capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal upon cleavage of the nucleic acid. Often a calorimetric signal is heat produced after cleavage of the nucleic acids of a reporter. In some embodiments, a calorimetric signal is heat absorbed after cleavage of the nucleic acids of a reporter. A potentiometric signal, for example, is electrical potential produced after cleavage of the nucleic acids of a reporter. An amperometric signal can be movement of electrons produced after the cleavage of the nucleic acid of a reporter. Often, the signal is an optical signal, such as a colorimetric signal or a fluorescence signal. An optical signal is, for example, a light output produced after the cleavage of the nucleic acids of a reporter. In some embodiments, an optical signal is a change in light absorbance between before and after the cleavage of the nucleic acids of a reporter. Often, a piezo-electric signal is a change in mass between before and after the cleavage of the nucleic acid of a reporter. Other methods of detection may also be used, such as optical imaging, surface plasmon resonance (SPR), and/or interferometric sensing.
[00118] In some embodiments, the detectable signal is a colorimetric signal or a signal visible by eye. In some instances, the detectable signal is fluorescent, electrical, chemical, electrochemical, or magnetic. In some cases, a detectable signal (e.g., a first detectable signal) is generated by binding of the detection moiety to the capture molecule in the detection region, where the detectable signal indicates that the sample contained the target nucleic acid. In some embodiments, systems are capable of detecting more than one type of target nucleic acid, where the system comprises more than one type of guide nucleic acid and more than one type of reporter nucleic acid. For example, systems may be capable of distinguishing between two different target nucleic acids in a sample (e.g., wild-type or mutant when investigating SNPs as described herein). In some cases, the detectable signal is generated directly by the cleavage event. Alternatively, or in combination, the detectable signal is generated indirectly by the cleavage event. In some instances, the detectable signal is not a fluorescent signal. In some instances, the detectable signal is a colorimetric or color-based signal. In some cases, the detected target nucleic acid is identified based on its spatial location on the detection region of the support medium. In some cases, a second detectable signal is generated in a spatially distinct location than a first detectable signal when two or more detectable signals are generated.
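By way of non-limiting illustration, distinguishing between a wild-type and a mutant target nucleic acid using two detectable signals can be sketched as a simple decision rule. The following minimal Python sketch makes assumptions that are not stated in the disclosure: one signal is obtained with a guide nucleic acid directed to the wild-type sequence and another with a guide directed to the mutant sequence, a single shared threshold separates positive from negative signals, and two positive signals are interpreted as the presence of both alleles. The function name `call_alleles` and the returned labels are hypothetical.

```python
def call_alleles(wild_type_signal: float,
                 mutant_signal: float,
                 threshold: float) -> str:
    """Classify a sample from two reporter signals using one shared threshold."""
    wild_type_positive = wild_type_signal > threshold
    mutant_positive = mutant_signal > threshold
    if wild_type_positive and mutant_positive:
        return "both alleles detected"
    if wild_type_positive:
        return "wild-type detected"
    if mutant_positive:
        return "mutant detected"
    return "no call"  # neither signal rose above the threshold
```

In practice, each guide nucleic acid may warrant its own threshold and controls; a shared threshold is used here only to keep the sketch short.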
[00119] Often, the reporter is an enzyme-nucleic acid. The enzyme can be sterically hindered when present in the enzyme-nucleic acid, but then functional upon cleavage from the nucleic acid by the programmable nuclease. Often, the enzyme is an enzyme that produces a reaction with an enzyme substrate. An enzyme can be invertase. Often, the substrate of invertase is sucrose and DNS reagent.
[00120] In some embodiments, the reporter is a substrate-nucleic acid. Often the substrate is a substrate that produces a reaction with an enzyme. Release of the substrate upon cleavage by the programmable nuclease may free the substrate to react with the enzyme.
[00121] In some embodiments, a reporter is attached to a solid support. The solid support, for example, can be a surface. A surface can be an electrode. In some embodiments, the solid support is a bead. Often the bead is a magnetic bead. Upon cleavage, the detection moiety is liberated from the solid support and interacts with other mixtures. For example, the detection moiety is an enzyme, and upon cleavage of the nucleic acid of the enzyme-nucleic acid, the enzyme flows through a chamber into a mixture comprising the substrate. When the enzyme meets the enzyme substrate, a reaction occurs, such as a colorimetric reaction, which is then detected. As another example, the detection moiety is an enzyme substrate, and upon cleavage of the nucleic acid of the enzyme substrate-nucleic acid, the enzyme substrate flows through a chamber into a mixture comprising the enzyme. When the enzyme substrate meets the enzyme, a reaction occurs, such as a calorimetric reaction, which is then detected.
[00122] In some embodiments, the reporter comprises a nucleic acid conjugated to an affinity molecule which is in turn conjugated to the fluorophore (e.g., nucleic acid - affinity molecule - fluorophore) or the nucleic acid conjugated to the fluorophore which is in turn conjugated to the affinity molecule (e.g., nucleic acid - fluorophore - affinity molecule). In some embodiments, a linker conjugates the nucleic acid to the affinity molecule. In some embodiments, a linker conjugates the affinity molecule to the fluorophore. In some embodiments, a linker conjugates the nucleic acid to the fluorophore. A linker can be any suitable linker known in the art. In some embodiments, the nucleic acid of the reporter can be directly conjugated to the affinity molecule and the affinity molecule can be directly conjugated to the fluorophore or the nucleic acid can be directly conjugated to the fluorophore and the fluorophore can be directly conjugated to the affinity molecule. In this context, “directly conjugated” indicates that no intervening molecules, polypeptides, proteins, or other moieties are present between the two moieties directly conjugated to each other. For example, if a reporter comprises a nucleic acid directly conjugated to an affinity molecule and an affinity molecule directly conjugated to a fluorophore, then no intervening moiety is present between the nucleic acid and the affinity molecule and no intervening moiety is present between the affinity molecule and the fluorophore. In some implementations, the affinity molecule is biotin, avidin, streptavidin, or any similar molecule.
[00123] In some cases, the reporter comprises a substrate-nucleic acid. The substrate can be sequestered from its cognate enzyme when present as in the substrate-nucleic acid, but then is released from the nucleic acid upon cleavage, where the released substrate can contact the cognate enzyme to produce a detectable signal. Often, the substrate is sucrose and the cognate enzyme is invertase, and a DNS reagent can be used to monitor invertase activity.
[00124] A reporter can be a hybrid nucleic acid reporter. A hybrid nucleic acid reporter comprises a nucleic acid with at least one deoxyribonucleotide and at least one ribonucleotide. In some embodiments, the nucleic acid of the hybrid nucleic acid reporter is of any length and/or has any mixture of DNAs and RNAs. For example, in some cases, longer stretches of DNA can be interrupted by a few ribonucleotides. Alternatively, longer stretches of RNA can be interrupted by a few deoxyribonucleotides. Alternatively, every other base in the nucleic acid can alternate between ribonucleotides and deoxyribonucleotides. A major advantage of the hybrid nucleic acid reporter is increased stability as compared to a pure RNA nucleic acid reporter. For example, a hybrid nucleic acid reporter can be more stable in solution, lyophilized, or vitrified as compared to a pure DNA or pure RNA reporter.
[00125] Modified Nucleic Acids.
[00126] In some cases, a reporter and/or guide nucleic acid comprises one or more modifications, e.g., a base modification, a backbone modification, a sugar modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). Examples of suitable modifications include modified nucleic acid backbones and non-natural internucleoside linkages. Other suitable modifications include nucleic acid mimetics. The nucleic acids described herein can include one or more substituted sugar moieties. The nucleic acids described herein can include nucleobase modifications or substitutions.
[00127] The nucleic acids described and referred to herein can comprise a plurality of base pairs. As used herein, a base pair refers to a biological unit comprising two nucleobases bound to each other by hydrogen bonds. Nucleobases can comprise adenine, guanine, cytosine, thymine, and/or uracil. In some cases, the nucleic acids described and referred to herein can comprise different base pairs. In some cases, the nucleic acids described and referred to herein can comprise one or more modified base pairs. The one or more modified base pairs can be produced when one or more base pairs undergo a chemical modification leading to new bases. The one or more modified base pairs can be, for example, Hypoxanthine, Inosine, Xanthine, Xanthosine, 7-Methylguanine, 7-Methylguanosine, 5,6-Dihydrouracil, Dihydrouridine, 5-Methylcytosine, 5-Methylcytidine, 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxylcytosine (5caC).
[00128] Target Loci.
[00129] As used herein, the term “locus” (e.g., target locus) refers to one or more nucleotides that map to a reference sequence, such as a genome. In some embodiments, a locus refers to a position (e.g., a site) within a reference sequence, e.g., on a particular chromosome of a genome. In some embodiments, a locus refers to a single nucleotide position within a reference sequence, e.g., on a particular chromosome within a genome. In some embodiments, a locus refers to a group of nucleotide positions within a reference sequence. In some instances, a locus is defined by a mutation (e.g., substitution, insertion, deletion, inversion, or translocation) of consecutive nucleotides within a reference sequence. In some instances, a locus is defined by a gene, a sub-genic structure (e.g., a regulatory element, exon, intron, or combination thereof), or a predefined span of a chromosome. In some embodiments, a locus (e.g., a target locus) is represented in a biological sample by one or more nucleic acid molecules (e.g., target nucleic acid molecules) in the biological sample, where all or a portion of the one or more nucleic acid molecules map to the respective locus.
[00130] In some cases, a locus is defined by a biomarker and/or antimicrobial resistance gene that maps to the position of the respective locus. For instance, in some embodiments, a respective target locus comprises a plurality of alleles including an allele having a mutation that confers resistance to a microorganism against antimicrobial interventions that target the respective locus (e.g., antibacterial resistance, antiprotozoal resistance, antifungal resistance, antihelminthic resistance, and/or antiviral resistance). In some such embodiments, a respective target locus comprises a genetic marker for the target locus that indicates a resistance to antimicrobial interventions. Examples of antimicrobial resistance markers (e.g., genes and/or amino acid residues) include, but are not limited to, the antimicrobial resistance markers listed in, e.g., Capela et al., 2019, “An Overview of Drug Resistance in Protozoal Diseases,” Int J Mol Sci. 20(22): 5748; doi: 10.3390/ijms20225748; Beech et al., 2011, “Anthelmintic resistance: markers for resistance, or susceptibility?” Parasitology 138(2): 160-174; doi: 10.1017/S0031182010001198; and Toledo-Rueda et al., 2018, “Antiviral resistance markers in influenza virus sequences in Mexico, 2000-2017,” Infect Drug Resist 11: 1751-1756; doi: 10.2147/IDR.S153154, each of which is hereby incorporated herein by reference in its entirety.
[00131] Target Nucleic Acids.
[00132] Disclosed herein are systems and methods for detecting a target nucleic acid and/or distinguishing between different target nucleic acids. In some instances, the target nucleic acid is a single stranded nucleic acid. Alternatively, or in combination, the target nucleic acid is a double stranded nucleic acid and is prepared into single stranded nucleic acids before or upon contacting the programmable nuclease-based detection reagents (e.g, programmable nuclease, guide nucleic acid, and/or reporter). In some embodiments, the target nucleic acid is a double stranded nucleic acid. In some embodiments, the double stranded nucleic acid is DNA. In some implementations, the target nucleic acid is an RNA. Target nucleic acids include but are not limited to mRNA, rRNA, tRNA, non-coding RNA, long non-coding RNA, and microRNA (miRNA). In some instances, the target nucleic acid is complementary DNA (cDNA) synthesized from a single-stranded RNA template in a reaction catalyzed by a reverse transcriptase. In some cases, the target nucleic acid is single-stranded RNA (ssRNA) or mRNA. In some cases, the target nucleic acid is from a virus, a parasite, or a bacterium described herein.
[00133] In some cases, the target nucleic acid comprises 5 to 100, 5 to 90, 5 to 80, 5 to 70, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 25, 5 to 20, 5 to 15, or 5 to 10 nucleotides in length. In some cases, the target nucleic acid comprises 10 to 90, 20 to 80, 30 to 70, or 40 to 60 nucleotides in length. In some instances, the target nucleic acid sequence can be from 10 to 95, from 20 to 95, from 30 to 95, from 40 to 95, from 50 to 95, from 60 to 95, from 10 to 75, from 20 to 75, from 30 to 75, from 40 to 75, from 50 to 75, from 5 to 50, from 15 to 50, from 25 to 50, from 35 to 50, or from 45 to 50 nucleotides in length. In some cases, the target nucleic acid comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length. In some instances, the target nucleic acid comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides in length. The target nucleic acid can be reverse complementary to a guide nucleic acid. In some cases, at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of a guide nucleic acid can be reverse complementary to a target nucleic acid.
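As a non-limiting illustration of the reverse-complementarity relationship described above, the following Python sketch counts the longest contiguous stretch of a guide sequence that is reverse complementary to a target sequence. The function names and the example sequences are hypothetical and are not taken from the disclosure; for simplicity the sketch assumes a DNA alphabet (A, C, G, T) for both sequences, whereas a guide nucleic acid may be RNA.

```python
# Hypothetical helper names; sequences are invented for illustration only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_complementary_stretch(guide: str, target: str) -> int:
    """Length of the longest contiguous run of guide nucleotides whose
    reverse complement appears in the target (longest common substring
    between reverse_complement(guide) and target)."""
    rc = reverse_complement(guide)
    target = target.upper()
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if rc[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    example_target = "CCGATTACAGG"   # made-up target sequence
    example_guide = "TGTAATCAAA"     # made-up guide sequence
    print(longest_complementary_stretch(example_guide, example_target))
```

A count computed this way could be compared against one of the thresholds recited above (e.g., at least 20 nucleotides); the choice of threshold is left to the particular embodiment.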
[00134] A target nucleic acid can be an amplified nucleic acid of interest. The nucleic acid of interest can be any nucleic acid disclosed herein or from any sample as disclosed herein. The nucleic acid of interest can be an RNA that is reverse transcribed before amplification. In some embodiments, the nucleic acid of interest is amplified, and then the amplicons are transcribed into RNA.
[00135] In some embodiments, target nucleic acids activate a programmable nuclease to initiate sequence-independent cleavage of a nucleic acid-based reporter (e.g., a reporter comprising an RNA sequence, or a reporter comprising DNA and RNA). For example, a programmable nuclease of the present disclosure is activated by a target nucleic acid to cleave reporters having an RNA (also referred to herein as an “RNA reporter”). Alternatively, a programmable nuclease of the present disclosure is activated by a target RNA to cleave reporters having an RNA (also referred to herein as a “RNA reporter”). The RNA reporter can comprise a single-stranded RNA labeled with a detection moiety or can be any RNA reporter as disclosed herein.
[00136] In some embodiments, the target nucleic acid as described in the methods herein does not initially comprise a PAM sequence. However, any target nucleic acid of interest can be generated using the methods described herein to comprise a PAM sequence, and thus be a PAM target nucleic acid. A PAM target nucleic acid, as used herein, refers to a target nucleic acid that has been amplified to insert a PAM sequence that is recognized by a CRISPR/Cas system.
[00137] In some embodiments, the target nucleic acid is in a cell. In some embodiments, the cell is a single-cell eukaryotic organism; a plant cell; an algal cell; a fungal cell; an animal cell; a cell of an invertebrate animal; a cell of a vertebrate animal such as a fish, amphibian, reptile, bird, or mammal; or a cell of a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, or a caprine. In preferred embodiments, the cell is a eukaryotic cell. In preferred embodiments, the cell is a mammalian cell, a human cell, or a plant cell.
[00138] In some embodiments, the target nucleic acid sequence comprises a nucleic acid sequence of a virus, a bacterium, or other pathogen responsible for a disease in a plant (e.g., a crop). Methods and compositions of the disclosure can be used to treat or detect a disease in a plant. For example, the methods of the disclosure can be used to target a viral nucleic acid sequence in a plant. In some embodiments, a programmable nuclease of the disclosure (e.g., Cas14) cleaves the viral nucleic acid. In some embodiments, the target nucleic acid sequence comprises a nucleic acid sequence of a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop). In some embodiments, the target nucleic acid comprises RNA. The target nucleic acid, in some cases, is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the plant (e.g., a crop). In some cases, the target nucleic acid is a portion of a nucleic acid from a genomic locus, or any DNA amplicon, such as a reverse transcribed mRNA or a cDNA from a gene locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop). A virus infecting the plant can be an RNA virus. A virus infecting the plant can be a DNA virus. Non-limiting examples of viruses that can be targeted with the disclosure include Tobacco mosaic virus (TMV), Tomato spotted wilt virus (TSWV), Cucumber mosaic virus (CMV), Potato virus Y (PVY), Cauliflower mosaic virus (CaMV) (RT virus), Plum pox virus (PPV), Brome mosaic virus (BMV) and Potato virus X (PVX).
[00139] Mutations
[00140] In some instances, target nucleic acids comprise a mutation. In some instances, a sequence comprising a mutation is modified to a wild-type sequence with a composition, system or method described herein. In some instances, a sequence comprising a mutation is detected with a composition, system or method described herein. The mutation can be a mutation of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. Non-limiting examples of mutations are insertion-deletion (indel), single nucleotide polymorphism (SNP), and frameshift mutations. In some instances, guide nucleic acids described herein hybridize to a region of the target nucleic acid comprising the mutation. For example, in some embodiments, a target locus comprises a plurality of alleles including a first allele (e.g., a wild-type allele) and a second allele (e.g., a mutant allele), where the first allele and the second allele differ by a mutation in the nucleic acid sequence of the respective locus, and the respective guide nucleic acid for each respective allele is a nucleic acid that is reverse complementary to the respective allele. In some embodiments, the mutation is located in a non-coding region or a coding region of a gene.
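By way of a hedged example only, the relationship between a pair of alleles and their respective guide nucleic acids can be sketched in Python as follows. The allele sequences, the dictionary keys, and the function names are assumptions made for illustration and do not appear in the disclosure; the sketch uses a DNA alphabet for the guides, although a guide nucleic acid in practice may be RNA and may include additional non-targeting regions.

```python
# Illustrative sketch; all sequences and names below are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def allele_guides(alleles: dict) -> dict:
    """Map each allele name to a sequence reverse complementary to that allele."""
    return {name: reverse_complement(seq) for name, seq in alleles.items()}


def mismatch_positions(a: str, b: str) -> list:
    """0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Two made-up alleles of one locus that differ by a single nucleotide.
    alleles = {"wild_type": "GATTACAGG", "mutant": "GATTGCAGG"}
    guides = allele_guides(alleles)
    print(guides)
    print(mismatch_positions(alleles["wild_type"], alleles["mutant"]))
```

In this sketch each guide is fully reverse complementary to its own allele and carries a single mismatch against the other allele, which is the property an allele-discriminating assay would rely on.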
[00141] In some instances, target nucleic acids comprise a mutation, where the mutation is a SNP. The single nucleotide mutation or SNP can be associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken. The SNP, in some embodiments, is associated with altered phenotype from wild-type phenotype. The SNP can be a synonymous substitution or a nonsynonymous substitution. The nonsynonymous substitution can be a missense substitution or a nonsense point mutation. The synonymous substitution can be a silent substitution. The mutation can be a deletion of one or more nucleotides. Often, the single nucleotide mutation, SNP, or deletion is associated with a disease such as cancer or a genetic disorder. The mutation, such as a single nucleotide mutation, a SNP, or a deletion, can be encoded in the sequence of a target nucleic acid from the germline of an organism or can be encoded in a target nucleic acid from a diseased cell, such as a cancer cell.
[00142] In some instances, target nucleic acids comprise a mutation, where the mutation is a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. The mutation can be a deletion of about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 nucleotides. The mutation can be a deletion of 1 to 5, 5 to 10, 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 to 55, 55 to 60, 60 to 65, 65 to 70, 70 to 75, 75 to 80, 80 to 85, 85 to 90, 90 to 95, 95 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 900 to 1000, 1 to 50, 1 to 100, 25 to 50, 25 to 100, 50 to 100, 100 to 500, 100 to 1000, or 500 to 1000 nucleotides.
[00143] As used herein, the term “single nucleotide polymorphism” or “SNP,” refers to the variation of a single nucleotide or nucleobase at a specific position in a nucleic acid sequence. The single nucleotide or nucleobase variation is generally between the genomes of two members of the same species, or some other specific population. In some cases, a SNP occurs at a specific nucleic acid site in genomic DNA in which different alternative sequences, e.g., “alleles,” exist more frequently in certain members of a population. In some cases, a less frequent allele comprises the SNP and has an abundance of at least 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1%. In some instances, a SNP is any point mutation that is sufficiently present in a population (e.g., 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1% or more). A SNP can be disease-causing, at least partially, or can be associated with a disease. SNPs are known to skilled artisans and can be located in relevant published papers and genomic databases.
[00144] As used herein, the term “allele” refers to a variant of a given gene and can be the result of a SNP or any sequence variation from a reference sequence. From any number of different alleles of a given gene, an allelic frequency can be determined, thereby providing the fraction of all chromosomes that carry a particular allele of a gene within a specific population.
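As an illustrative, non-limiting example of the allelic frequency described above, the following Python sketch computes the fraction of sampled chromosomes carrying each allele and checks a less frequent allele against an abundance threshold. The function names, the sample counts, and the 1% default threshold are assumptions chosen for illustration.

```python
from collections import Counter


def allele_frequencies(observed_alleles):
    """Fraction of sampled chromosomes carrying each allele.

    `observed_alleles` holds one entry per chromosome sampled from the
    population (hypothetical input format)."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no chromosomes observed")
    return {allele: n / total for allele, n in counts.items()}


def meets_abundance(frequency, threshold=0.01):
    """True if an allele frequency is at or above the chosen threshold."""
    return frequency >= threshold


if __name__ == "__main__":
    # Made-up data: 200 chromosomes, 2 of which carry the 'G' allele.
    sampled = ["A"] * 198 + ["G"] * 2
    freqs = allele_frequencies(sampled)
    print(freqs)
    print(meets_abundance(freqs["G"]))
```

For a diploid population, each individual contributes two chromosomes to the input list, so 100 individuals would yield 200 entries.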
[00145] As used herein, the term “wild-type,” or “wild-type variant” refers to a segment or region of nucleic acid sequence, or fragment thereof, that is the universal form (e.g., present in at least 40%) within a population. In some embodiments, “wild-type,” or “wild-type variant” refers to a segment or region of nucleic acid sequence, or fragment thereof, lacking commonly known sequence variations or allelic variations which can be silent, causal, disease-associated, or disease-risk causing. In some embodiments, “wild-type,” or “wild-type variant” refers to a polypeptide or protein expressed by a naturally occurring organism, or a polypeptide or protein having the characteristics of a polypeptide or protein isolated from a naturally occurring organism, where the polypeptide or protein is relatively constant within a species population (e.g., present in at least 40% of the population).
[00146] In some instances, mutations are associated with a disease, that is, the mutation in a subject indicates that the subject is susceptible to, or suffers from, a disease, disorder, or pathological state. In some examples, a mutation associated with a disease refers to a mutation which causes the disease, contributes to the development of the disease, or indicates the existence of the disease. A mutation associated with a disease can also refer to any mutation which generates transcription or translation products at an abnormal level, or in an abnormal form, in cells affected by a disease relative to a control without the disease. Non-limiting examples of diseases associated with mutations are hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency (SCID, also known as “bubble boy syndrome”), Huntington’s disease, cystic fibrosis, and various cancers.
[00147] Samples
[00148] The systems and methods of the present disclosure can be used to detect one or more target sequences or nucleic acids in one or more samples. The one or more samples can comprise one or more target sequences or nucleic acids for detection of an ailment, such as a disease, cancer, or genetic disorder, or genetic information, such as for phenotyping, genotyping, or determining ancestry and are compatible with the reagents and support mediums as described herein. Generally, a sample can be taken from any place where a nucleic acid can be found. Samples can be taken from an individual/human, a non-human animal, or a crop, or an environmental sample can be obtained to test for presence of a disease, virus, pathogen, cancer, genetic disorder, or any mutation or pathogen of interest. A biological sample can be blood, serum, plasma, lung fluid, exhaled breath condensate, saliva, spit, urine, stool, feces, mucus, lymph fluid, peritoneal fluid, cerebrospinal fluid, amniotic fluid, breast milk, gastric secretions, bodily discharges, secretions from ulcers, pus, nasal secretions, sputum, pharyngeal exudates, urethral secretions/mucus, vaginal secretions/mucus, anal secretion/mucus, semen, tears, an exudate, an effusion, tissue fluid, interstitial fluid (e.g., tumor interstitial fluid), cyst fluid, tissue, or, in some instances, any combination thereof. A sample can be an aspirate of a bodily fluid from an animal (e.g., human, animals, livestock, pet, etc.) or plant. A tissue sample can be from any tissue that can be infected or affected by a pathogen (e.g., a wart, lung tissue, skin tissue, and the like). A tissue sample (e.g., from animals, plants, or humans) can be dissociated or liquified prior to application to the detection system of the present disclosure. A sample can be from a plant (e.g., a crop, a hydroponically grown crop or plant, and/or house plant). Plant samples can include extracellular fluid from tissue (e.g., root, leaves, stem, trunk, etc.). 
A sample can be taken from the environment immediately surrounding a plant, such as hydroponic fluid/water, or soil. A sample from an environment can be from soil, air, or water. In some instances, the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest. In some instances, the raw sample is applied to the detection system. In some instances, the sample is diluted with a buffer or a fluid or concentrated prior to application to the detection system. In some cases, the sample is contained in no more than about 200 nanoliters (nL). In some cases, the sample is contained in about 200 nL. In some cases, the sample is contained in a volume that is greater than about 200 nL and less than about 20 microliters (µL). In some cases, the sample is contained in no more than 20 µL. In some cases, the sample is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, 500 µL, or any value from 1 µL to 500 µL. In some cases, the sample is contained in from 1 µL to 500 µL, from 10 µL to 500 µL, from 50 µL to 500 µL, from 100 µL to 500 µL, from 200 µL to 500 µL, from 300 µL to 500 µL, from 400 µL to 500 µL, from 1 µL to 200 µL, from 10 µL to 200 µL, from 50 µL to 200 µL, from 100 µL to 200 µL, from 1 µL to 100 µL, from 10 µL to 100 µL, from 50 µL to 100 µL, from 1 µL to 50 µL, from 10 µL to 50 µL, from 1 µL to 20 µL, from 10 µL to 20 µL, or from 1 µL to 10 µL. In some embodiments, the sample is contained in more than 500 µL.
[00149] In some instances, the sample is taken from a single-cell eukaryotic organism; a plant or a plant cell; an algal cell; a fungal cell; an animal or an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal; a cell, tissue, fluid, or organ from a vertebrate animal such as fish, amphibian, reptile, bird, and mammal; a cell, tissue, fluid, or organ from a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, and a caprine. In some instances, the sample is taken from nematodes, protozoans, helminths, or malarial parasites. In some cases, the sample can comprise nucleic acids from a cell lysate from a eukaryotic cell, a mammalian cell, a human cell, a prokaryotic cell, or a plant cell. In some cases, the sample can comprise nucleic acids expressed from a cell.
[00150] The sample used for phenotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a phenotypic trait.
[00151] The sample used for genotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a genotype.
[00152] The sample used for ancestral testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a geographic region of origin or ethnic group.
[00153] The sample can be used for identifying a disease status. For example, a sample is any sample described herein, and is obtained from a subject for use in identifying a disease status of a subject. The disease can be a cancer or genetic disorder. In some embodiments, a method may comprise obtaining a serum sample from a subject; and identifying a disease status of the subject. Often, the disease status is prostate disease status. In any of the embodiments described herein, the device can be configured for asymptomatic, pre-symptomatic, and/or symptomatic diagnostic applications, irrespective of immunity. In any of the embodiments described herein, the device can be configured to perform one or more serological assays on a sample (e.g., a sample comprising blood).
[00154] In some cases, the target sequence is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the sample. The target sequence, in some cases, is a portion of a nucleic acid from a sexually transmitted infection or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from an upper respiratory tract infection, a lower respiratory tract infection, or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from a hospital acquired infection or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from sepsis, in the sample. These diseases can include but are not limited to respiratory viruses (e.g., SARS-CoV-2 (i.e., a virus that causes COVID-19), SARS-CoV-1, MERS-CoV, influenza, Adenovirus, Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Human Metapneumovirus (hMPV), Human Rhinovirus (HRVs A, B, C), Human Enterovirus, Influenza A, Influenza A/H1, Influenza A/H2, Influenza A/H3, Influenza A/H4, Influenza A/H5, Influenza A/H6, Influenza A/H7, Influenza A/H8, Influenza A/H9, Influenza A/H10, Influenza A/H11, Influenza A/H12, Influenza A/H13, Influenza A/H14, Influenza A/H15, Influenza A/H16, Influenza A/H1-2009, Influenza A/N1, Influenza A/N2, Influenza A/N3, Influenza A/N4, Influenza A/N5, Influenza A/N6, Influenza A/N7, Influenza A/N8, Influenza A/N9, Influenza A/N10, Influenza A/N11, oseltamivir-resistant Influenza A, Influenza B, Influenza B - Victoria V1, Influenza B - Yamagata Y1, Influenza C, Parainfluenza Virus 1, Parainfluenza Virus 2, Parainfluenza Virus 3, Parainfluenza Virus 4, Respiratory Syncytial Virus A, Respiratory Syncytial Virus B) and respiratory bacteria (e.g., Bordetella parapertussis, Bordetella pertussis, Bordetella bronchiseptica, Bordetella holmesii, 
Chlamydia pneumoniae, Mycoplasma pneumoniae). Other viruses include human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites. Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms. Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii. Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, Chlamydia pneumoniae, Chlamydia psittaci, and Candida albicans. Pathogenic viruses include but are not limited to: respiratory viruses (e.g., adenoviruses, parainfluenza viruses, severe acute respiratory syndrome (SARS), coronavirus, MERS), gastrointestinal viruses (e.g., noroviruses, rotaviruses, some adenoviruses, astroviruses), exanthematous viruses (e.g., the virus that causes measles, the virus that causes rubella, the virus that causes chickenpox/shingles, the virus that causes roseola, the virus that causes smallpox, the virus that causes fifth disease, chikungunya virus infection); hepatic viral diseases (e.g., hepatitis A, B, C, D, E); cutaneous viral diseases (e.g., warts (including genital, anal), herpes (including oral, genital, anal), molluscum contagiosum); hemorrhagic viral diseases (e.g., Ebola, Lassa fever, dengue fever, yellow fever, Marburg hemorrhagic fever, Crimean-Congo hemorrhagic fever); neurologic viruses (e.g., polio, viral meningitis, viral encephalitis, rabies), sexually transmitted viruses (e.g., HIV, HPV, and the like), immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; 
Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like. Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Klebsiella pneumoniae, Acinetobacter baumannii, Bacillus anthracis, Bordetella pertussis, Burkholderia cepacia, Corynebacterium diphtheriae, Coxiella burnetii, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella longbeachae, Legionella pneumophila, Leptospira interrogans, Moraxella catarrhalis, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Neisseria elongate, Neisseria gonorrhoeae, Parechovirus, Pneumococcus, Pneumocystis jirovecii, Cryptococcus neoformans, Histoplasma capsulatum, Haemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus (RSV), M. genitalium, T. Vaginalis, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium, M. 
pneumoniae, Enterobacter cloacae, Klebsiella aerogenes, Proteus vulgaris, Serratia marcescens, Enterococcus faecalis, Enterococcus faecium, Streptococcus intermedius, Streptococcus pneumoniae, and Streptococcus pyogenes. Often the target nucleic acid comprises a sequence from a virus or a bacterium or other agents responsible for a disease that can be found in the sample. In some cases, the target nucleic acid is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in at least one of: human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis. Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites. Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms. Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii. Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans. Pathogenic viruses include but are not limited to immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like. 
Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Staphylococcus epidermidis, Legionella pneumophila, Streptococcus pyogenes, Streptococcus salivarius, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Hemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus (RSV), Alphacoronavirus, Betacoronavirus, Sarbecovirus, SARS-related virus, Gammacoronavirus, Deltacoronavirus, M. genitalium, T. vaginalis, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, human adenovirus (type A, B, C, D, E, F, G), human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Human Bocavirus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium and M. pneumoniae. 
SARS-CoV-2 Variants include Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, SARS-CoV-2 85Δ, SARS-CoV-2 T1001I, SARS-CoV-2 3675-3677Δ, SARS-CoV-2 P4715L, SARS-CoV-2 S5360L, SARS-CoV-2 69-70Δ, SARS-CoV-2 Tyr144fs, SARS-CoV-2 242-244Δ, SARS-CoV-2 Y453F, SARS-CoV-2 S477N, SARS-CoV-2 E848K, SARS-CoV-2 N501Y, SARS-CoV-2 D614G, SARS-CoV-2 P681R, SARS-CoV-2 P681H, SARS-CoV-2 L21F, SARS-CoV-2 Q27Stop, SARS-CoV-2 M1fs, and SARS-CoV-2 R203fs. In some cases, the target sequence is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus of a bacterium or other agent responsible for a disease in the sample, where the gene locus comprises a mutation that confers resistance to a treatment, such as a single nucleotide mutation that confers resistance to antibiotic treatment.
[00155] In some instances, the target sequence is a portion of a nucleic acid from a subject having cancer. The cancer can be a solid cancer (tumor). The cancer can be a blood cell cancer, including leukemias and lymphomas. Non-limiting types of cancer that could be treated with such methods and compositions include colon cancer, rectal cancer, renal-cell carcinoma, liver cancer, bladder cancer, cancer of the kidney or ureter, lung cancer, cancer of the small intestine, esophageal cancer, melanoma, bone cancer, pancreatic cancer, skin cancer, brain cancer (e.g., glioblastoma), cancer of the head or neck, melanoma, uterine cancer, ovarian cancer, breast cancer, testicular cancer, cervical cancer, stomach cancer, Hodgkin’s Disease, non-Hodgkin’s lymphoma, thyroid cancer. The cancer can be a leukemia, such as, by way of non-limiting example, acute myeloid (or myelogenous) leukemia (AML), chronic myeloid (or myelogenous) leukemia (CML), acute lymphocytic (or lymphoblastic) leukemia (ALL), and chronic lymphocytic leukemia (CLL).
[00156] In some instances, the target sequence is a portion of a nucleic acid from a cancer cell. A cancer cell can be a cell harboring one or more mutations that result in unchecked proliferation of the cancer cell. Such mutations are known in the art. Non-limiting examples of antigens are ADRB3, AKAP-4, ALK, Androgen receptor, B7H3, BCMA, BORIS, BST2, CAIX, CD179a, CD123, CD171, CD19, CD20, CD22, CD24, CD30, CD300LF, CD33, CD38, CD44v6, CD72, CD79a, CD79b, CD97, CEA, CLDN6, CLEC12A, CLL-1, CS-1, CXORF61, CYP1B1, Cyclin B1, E7, EGFR, EGFRvIII, ELF2M, EMR2, EPCAM, ERBB2 (Her2/neu), ERG (TMPRSS2 ETS fusion gene), ETV6-AML, EphA2, Ephrin B2, FAP, FCAR, FCRL5, FLT3, Folate receptor alpha, Folate receptor beta, Fos-related antigen 1, Fucosyl GM1, GD2, GD3, GM3, GPC3, GPR20, GPRC5D, GloboH, HAVCR1, HMWMAA, HPV E6, IGF-I receptor, IL-13Ra2, IL-11Ra, KIT, LAGE-1a, LAIR1, LCK, LILRA2, LMP2, LY6K, LY75, LewisY, MAD-CT-1, MAD-CT-2, MAGE A1, MAGE-A1, ML-IAP, MUC1, MYCN, MelanA/MART1, Mesothelin, NA17, NCAM, NY-BR-1, NY-ESO-1, OR51E2, OY-TES1, PANX3, PAP, PAX3, PAX5, PCTA-1/Galectin 8, PDGFR-beta, PLAC1, PRSS21, PSCA, PSMA, Polysialic acid, Prostase, RAGE-1, ROR1, RU1, RU2, Ras mutant, RhoC, SART3, SSEA-4, SSX2, TAG72, TARP, TEM1/CD248, TEM7R, TGS5, TRP-2, TSHR, Tie 2, Tn Ag, UPK2, VEGFR2, WT1, XAGE1, and IGLL1.
[00157] In some cases, the target sequence is a portion of a nucleic acid from a control gene in a sample. In some embodiments, the control gene is an endogenous control. The endogenous control can include human 18S rRNA, human GAPDH, human HPRT1, human GUSB, human RNase P, MS2 bacteriophage, or any other control sequence of interest within the sample.
[00158] The sample used for cancer testing or cancer risk testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene with a mutation associated with cancer, from a gene whose overexpression is associated with cancer, a tumor suppressor gene, an oncogene, a checkpoint inhibitor gene, a gene associated with cellular growth, a gene associated with cellular metabolism, or a gene associated with cell cycle. In some embodiments, the target nucleic acid encodes a cancer biomarker, such as a prostate cancer biomarker or a non-small cell lung cancer biomarker. In some cases, the assay can be used to detect “hotspots” in target nucleic acids that can be predictive of cancer, such as lung cancer or cervical cancer. In some cases, the cancer can be a cancer that is caused by a virus. Some non-limiting examples of viruses that cause cancers in humans include Epstein-Barr virus (e.g., Burkitt’s lymphoma, Hodgkin’s Disease, and nasopharyngeal carcinoma); papillomavirus (e.g., cervical carcinoma, anal carcinoma, oropharyngeal carcinoma, penile carcinoma); hepatitis B and C viruses (e.g., hepatocellular carcinoma); human adult T-cell leukemia virus type 1 (HTLV-1) (e.g., T-cell leukemia); and Merkel cell polyomavirus (e.g., Merkel cell carcinoma). One skilled in the art will recognize that viruses can cause or contribute to other types of cancers. In some cases, the target nucleic acid is a portion of a nucleic acid that is associated with a blood fever. 
In some instances, the mutation is located in a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: ALK, APC, ATM, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTNNA1, DICER1, DIS3L2, EGFR, EPCAM, FH, FLCN, GATA2, GPC3, GREM1, HOXB13, HRAS, system, MAX, MEN1, MET, MITF, MLH1, MSH2, MSH3, MSH6, MUTYH, NBN, NF1, NF2, NTHL1, PALB2, PDGFRA, PHOX2B, PMS2, POLD1, POLE, POT1, PRKAR1A, PTCH1, PTEN, RAD50, RAD51C, RAD51D, RB1, RECQL4, RET, RUNX1, SDHA, SDHAF2, SDHB, SDHC, SDHD, SMAD4, SMARCA4, SMARCB1, SMARCE1, STK11, SUFU, TERC, TERT, TMEM127, TP53, TSC1, TSC2, VHL, WRN, and WT1. In some instances, the mutation is associated with a blood disorder, e.g., a thalassemia or an anemia.
[00159] The sample used for genetic disorder testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. In some embodiments, the genetic disorder is hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency, or cystic fibrosis. The target nucleic acid, in some cases, is from a gene with a mutation associated with a genetic disorder, from a gene whose overexpression is associated with a genetic disorder, from a gene associated with abnormal cellular growth resulting in a genetic disorder, or from a gene associated with abnormal cellular metabolism resulting in a genetic disorder. In some cases, the target nucleic acid is a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed mRNA, a DNA amplicon of or a cDNA from a locus of at least one of: CFTR, FMR1, SMN1, ABCB11, ABCC8, ABCD1, ACAD9, ACADM, ACADVL, ACAT1, ACOX1, ACSF3, ADA, ADAMTS2, ADGRG1, AGA, AGL, AGPS, AGXT, AIRE, ALDH3A2, ALDOB, ALG6, ALMS1, ALPL, AMT, AQP2, ARG1, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCS1L, BLM, BSND, CAPN3, CBS, CDH23, CEP290, CERKL, CHM, CHRNE, CIITA, CLN3, CLN5, CLN6, CLN8, CLRN1, CNGB3, COL27A1, COL4A3, COL4A4, COL4A5, COL7A1, CPS1, CPT1A, CPT2, CRB1, CTNS, CTSK, CYBA, CYBB, CYP11B1, CYP11B2, CYP17A1, CYP19A1, CYP27A1, DBT, DCLRE1C, DHCR7, DHDDS, DLD, DMD, DNAH5, DNAI1, DNAI2, DYSF, EDA, EIF2B5, EMD, ERCC6, ERCC8, ESCO2, ETFA, ETFDH, ETHE1, EVC, EVC2, EYS, F9, FAH, FAM161A, FANCA, FANCC, FANCG, FH, FKRP, FKTN, G6PC, GAA, GALC, GALK1, GALT, GAMT, GBA, GBE1, GCDH, GFM1, GJB1, GJB2, GLA, GLB1, GLDC, GLE1, GNE, GNPTAB, GNPTG, GNS, GRHPR, HADHA, HAX1, HBA1, HBA2, HBB, HEXA, HEXB, HGSNAT, HLCS, HMGCL, HOGA1, HPS1, HPS3, HSD17B4, HSD3B2, HYAL1, HYLS1, IDS, IDUA, IKBKAP, IL2RG, IVD, KCNJ11, LAMA2, LAMA3, LAMB3, LAMC2, LCA5, LDLR, LDLRAP1, LHX3, LIFR, 
LIPA, LOXHD1, LPL, LRPPRC, MAN2B1, MCOLN1, MED17, MESP2, MFSD8, MKS1, MLC1, MMAA, MMAB, MMACHC, MMADHC, MPI, MPL, MPV17, MTHFR, MTM1, MTRR, MTTP, MUT, MYO7A, NAGLU, NAGS, NBN, NDRG1, NDUFAF5, NDUFS6, NEB, NPC1, NPC2, NPHS1, NPHS2, NR2E3, NTRK1, OAT, OPA3, OTC, PAH, PC, PCCA, PCCB, PCDH15, PDHA1, PDHB, PEX1, PEX10, PEX12, PEX2, PEX6, PEX7, PFKM, PHGDH, PKHD1, PMM2, POMGNT1, PPT1, PROP1, PRPS1, PSAP, PTS, PUS1, PYGM, RAB23, RAG2, RAPSN, RARS2, RDH12, RMRP, RPE65, RPGRIP1L, RS1, RTEL1, SACS, SAMHD1, SEPSECS, SGCA, SGCB, SGCG, SGSH, SLC12A3, SLC12A6, SLC17A5, SLC22A5, SLC25A13, SLC25A15, SLC26A2, SLC26A4, SLC35A3, SLC37A4, SLC39A4, SLC4A11, SLC6A8, SLC7A7, SMARCAL1, SMPD1, STAR, SUMF1, TAT, TCIRG1, TECPR2, TFR2, TGM1, TH, TMEM216, TPP1, TRMU, TSFM, TTPA, TYMP, USH1C, USH2A, VPS13A, VPS13B, VPS45, VRK1, VSX2, WNT10A, XPA, XPC, and ZFYVE26.
[00160] Amplification Techniques
[00161] In some embodiments, target nucleic acids are amplified and/or comprise amplicons. The methods described herein can comprise amplifying a target nucleic acid molecule, and/or amplifying a nucleic acid molecule in a sample to produce a target nucleic acid molecule. Amplification can occur prior to or simultaneously with detection of a signal indicative of reporter cleavage. In some embodiments, amplification and detection can occur within the same reaction volume sequentially or simultaneously. In some embodiments, amplification and detection can occur sequentially in different reaction volumes. In some embodiments, amplification and detection can occur at different temperatures. In some embodiments, amplification and detection can occur at the same temperature. In some embodiments, amplifying can improve at least one of sensitivity, specificity, or accuracy of the detection of the target nucleic acid.
[00162] Amplification can be isothermal or can comprise thermocycling. In some embodiments, amplification comprises transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and/or improved multiple displacement amplification (IMDA). In some embodiments, the amplification reaction includes reverse transcription (RT), for example when amplifying an RNA target nucleic acid.
[00163] In some embodiments, amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, amplification can be used to insert a protospacer adjacent motif (PAM) sequence into a target nucleic acid that lacks a PAM sequence. In some cases, amplification is used to increase the homogeneity of a target nucleic acid in a sample. For example, amplification can be used to remove a nucleic acid variation that is not of interest in the target nucleic acid sequence.
[00164] Subjects.
[00165] As used herein, the term “subject” refers to any living or non-living organism including, but not limited to, a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal. Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, fish, dolphin, whale and shark. In some embodiments, a subject is a male or female of any age (e.g., a man, a woman, or a child).
[00166] Reference Sequences.
[00167] As used herein, the term “reference sequence” refers to a sequence of nucleotide bases. In some embodiments, a reference sequence is a reference genome. For example, a “genome” or “reference genome” can refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus that can be used to reference identified sequences from a subject. Exemplary reference genomes used for human subjects as well as many other organisms are provided in the on-line genome browser hosted by the National Center for Biotechnology Information (“NCBI”) or the University of California, Santa Cruz (UCSC). A “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences. As used herein, a reference sequence or reference genome often is an assembled or partially assembled genomic sequence from an individual subject or from multiple subjects. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more human subjects. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more microorganisms of the same species. The reference genome can be viewed as a representative example of a species’ set of genes. In some embodiments, a reference genome comprises sequences assigned to chromosomes. Exemplary human reference genomes include but are not limited to NCBI build 34 (UCSC equivalent: hg16), NCBI build 35 (UCSC equivalent: hg17), NCBI build 36.1 (UCSC equivalent: hg18), GRCh37 (UCSC equivalent: hg19), and GRCh38 (UCSC equivalent: hg38).
[00168] The implementations described herein provide various technical solutions for determining a mutation call for a target locus in a biological sample. Details of implementations are now described in conjunction with the Figures.
[00169] Exemplary System Embodiments
[00170] FIG. 1 is a block diagram illustrating a system 100 for determining a mutation call for a target locus in a biological sample, in accordance with some implementations. The system 100 in some implementations includes one or more central processing units (CPU(s)) 102 (also referred to as processors), one or more network interfaces 104, a user interface 106, a non-persistent memory 111, a persistent memory 112, and one or more communication buses 110 for interconnecting these components. The one or more communication buses 110 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The non-persistent memory 111 typically includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, ROM, EEPROM, flash memory, whereas the persistent memory 112 typically includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. The persistent memory 112 optionally includes one or more storage devices remotely located from the CPU(s) 102. The persistent memory 112, and the non-volatile memory device(s) within the non-persistent memory 111, comprise non-transitory computer readable storage medium. In some implementations, the non-persistent memory 111 or alternatively the non-transitory computer readable storage medium stores the following programs, modules and data structures, or a subset thereof, sometimes in conjunction with the persistent memory 112:
• an optional operating system 116, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
• an optional network communication module (or instructions) 118 for connecting the system 100 with other devices, or a communication network;
• a signal data store 120 comprising, for each respective well 122 in a plurality of wells, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, where:
  o the plurality of wells comprises a first set of wells 122 (e.g., 122-1-1, ... 122-K-1) representing a first allele for the target locus,
  o the plurality of wells further comprises a second set of wells 122 (e.g., 122-1-2, ... 122-K-2) representing a second allele for the target locus,
  o each corresponding plurality of reporting signals 124 comprises, for each respective time point in a plurality of time points (e.g., P time points, where P is a positive integer), a respective reporting signal in the form of a corresponding discrete attribute value (e.g., 124-1-1-1, ... 124-1-1-P; 124-2-1-1, ... 124-2-1-P),
  o each respective well 122 in the first set of wells and each respective well 122 in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample, and
  o optionally, the signal data store 120 further comprises a plurality of control wells 126 that are free of nucleic acid derived from the biological sample, including a first set of control wells 126-1 representing the first allele for the target locus and a second set of control wells 126-2 representing the second allele for the target locus;
• a data analysis construct 130 for determining:
  o for each respective well 122 in the plurality of wells, a corresponding signal yield 132 (e.g., 132-1-1, 132-1-2) for the respective well 122 using the corresponding plurality of reporting signals across the plurality of time points, and
  o for each respective well 122 in the first set of wells (e.g., 122-1-1, ... 122-K-1), a respective candidate call identity 134 (e.g., 134-1) based on a comparison between a corresponding first signal yield (e.g., 132-1-1) for the respective well in the first set of wells and a corresponding second signal yield (e.g., 132-1-2) for a corresponding well in the second set of wells; and
• a voting construct 140 for performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
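By way of illustration only, the three constructs listed above (signal data store 120, data analysis construct 130, and voting construct 140) can be sketched in a few lines of Python. The sketch is not the disclosed implementation: the class and function names are hypothetical, the signal yield is assumed to be the rise of the signal above its first time point, the candidate call is assumed to go to the allele with the larger yield, and the voting procedure is assumed to be a simple majority with ties producing no call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Well:
    """One well 122: the allele its guides represent and its signals over time."""
    allele: str            # "first" or "second" (hypothetical labels)
    signals: List[float]   # reporting signals 124, one per time point


def signal_yield(well: Well) -> float:
    # ASSUMPTION: yield is the maximum signal minus the signal at the first time point.
    return max(well.signals) - well.signals[0]


def candidate_call(first: Well, second: Well) -> str:
    # ASSUMPTION: the allele whose well shows the larger signal yield is the candidate.
    return "first" if signal_yield(first) > signal_yield(second) else "second"


def vote(calls: List[str]) -> Optional[str]:
    # ASSUMPTION: simple majority vote; an empty list or a tie yields no call (None).
    if not calls:
        return None
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def mutation_call(first_set: List[Well], second_set: List[Well]) -> Optional[str]:
    # Pair the k-th well of the first set with the k-th well of the second set.
    calls = [candidate_call(f, s) for f, s in zip(first_set, second_set)]
    return vote(calls)
```

Under these assumptions, three paired replicates in which two pairs favor the first allele produce a call for the first allele, and an even split produces no call.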
[00171] In some implementations, one or more of the above identified elements are stored in one or more of the previously mentioned memory devices and correspond to a set of instructions for performing a function described above. The above identified modules, data, or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, data sets, or modules, and thus various subsets of these modules and data may be combined or otherwise re-arranged in various implementations. In some implementations, the non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above. In some embodiments, one or more of the above identified elements is stored in a computer system, other than that of system 100, that is addressable by system 100 so that system 100 may retrieve all or a portion of such data when needed.
[00172] Although FIG. 1 depicts a “system 100,” the figures are intended more as a functional description of the various features which may be present in computer systems than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. Moreover, although FIG. 1 depicts certain data and modules in non-persistent memory 111, some or all of these data and modules may be in persistent memory 112.
[00173] Embodiments for Determining Mutation Calls
[00174] While a system in accordance with the present disclosure has been disclosed with reference to FIG. 1, a method in accordance with the present disclosure is now detailed with reference to FIGS. 2A-2G.
[00175] Referring to FIGS. 2A-2G, the present disclosure provides a method 200 for determining a mutation call for a target locus in a biological sample. In some embodiments, the method is performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
[00176] Referring to Block 202, the method includes obtaining a signal dataset 120 comprising, for each respective well 122 in a plurality of wells in a common plate, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset 120 represents a plurality of time points. The plurality of wells 122 comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells 122 further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals 124 comprises, for each respective time point in the plurality of time points, a respective reporting signal 124 in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
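As a non-limiting sketch of how the signal dataset of Block 202 might be laid out and checked before analysis, the following Python fragment keys each well by a replicate index and an allele index and verifies that every well carries one reporting signal per time point. The dictionary layout, the function name `validate_signal_dataset`, and the requirement that each first-allele well has a matching second-allele well are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Key: (replicate index k, allele index a), where a = 1 for the first allele
# and a = 2 for the second allele. Value: one discrete reporting signal per time point.
SignalDataset = Dict[Tuple[int, int], List[int]]


def validate_signal_dataset(dataset: SignalDataset) -> Tuple[int, int]:
    """Return (K, P): number of paired replicates and number of time points.

    Raises ValueError if the dataset is empty, uses an unknown allele index,
    has wells with differing numbers of time points, or has unpaired wells.
    """
    if not dataset:
        raise ValueError("signal dataset is empty")
    if any(a not in (1, 2) for (_, a) in dataset):
        raise ValueError("allele index must be 1 or 2")
    lengths = {len(signals) for signals in dataset.values()}
    if len(lengths) != 1:
        raise ValueError("wells have differing numbers of time points")
    first = {k for (k, a) in dataset if a == 1}
    second = {k for (k, a) in dataset if a == 2}
    # ASSUMPTION: every first-allele well has a corresponding second-allele well.
    if first != second:
        raise ValueError("first and second sets of wells are not paired")
    return len(first), lengths.pop()
```

A check of this kind is one way to catch plate-layout or export errors before any signal yield is computed.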
[00177] Samples.
[00178] In some embodiments, the biological sample is a clinical sample, a diagnostic sample, an environmental sample, a consumer quality sample, a food sample, a biological product sample, a microbial testing sample, a tumor sample, a forensic sample and/or a laboratory or hospital sample. In some embodiments, the biological sample is obtained from a human or an animal. In some embodiments, a biological sample is a sample from a patient undergoing a treatment.
[00179] In some embodiments, the biological sample is collected from an environmental source, such as a field (e.g., an agricultural field), lake, river, creek, ocean, watershed, water tank, water reservoir, pool (e.g., swimming pool), pond, air vent, wall, roof, soil, plant, and/or other environmental source. In some embodiments, the biological sample is collected from an industrial source, such as a clean room (e.g., in manufacturing or research facilities), hospital, medical laboratory, pharmacy, pharmaceutical compounding center, food processing area, food production area, water or waste treatment facility, and/or food product. In some embodiments, the biological sample is an air sample, such as ambient air in a facility (e.g., a medical facility or other facility), exhaled or expectorated air from a subject, and/or aerosols, including any biological contaminants present therein (e.g., bacteria, fungi, viruses, and/or pollens). In some embodiments, the biological sample is a water sample, such as water from dialysis systems in a medical facility (e.g., to detect waterborne pathogens of clinical significance and/or to determine the quality of water in a facility). In some embodiments, the biological sample is an environmental surface sample, such as a sample taken before or after a sterilization or disinfecting process (e.g., to confirm the effectiveness of the sterilization or disinfecting procedure).
[00180] In some embodiments, the biological sample is obtained from a subject, such as a human (e.g., a patient). In some embodiments, the biological sample is obtained from any tissue, organ or fluid from the subject. In some embodiments, a plurality of biological samples is obtained from the subject (e.g., a plurality of replicates and/or a plurality of samples including a healthy sample and a diseased sample).
[00181] In some embodiments, the biological sample is a clinical sample. In some embodiments, the biological sample is a bodily fluid. In some embodiments, the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood. In some embodiments, the biological sample is obtained from a nasopharyngeal or oropharyngeal swab. In some embodiments, the biological sample is obtained from a sample repository.
[00182] In some embodiments, the biological sample is obtained from a human with a disease condition (e.g., an infectious disease and/or a disease caused by a pathogenic microorganism). For instance, referring to Block 204, in some embodiments, the biological sample is obtained from a subject that is infected with a pathogen. In some embodiments, the pathogen is a virus, bacteria, fungus, or parasite.
[00183] In some embodiments, the pathogen is the causative agent of influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, Ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, Clostridium difficile (C. diff), tuberculosis, whooping cough, pneumococcal pneumonia, bacterial meningitis, Lyme disease, cholera, botulism, tetanus, anthrax, vaginal yeast infection, ringworm, athlete’s foot, thrush, aspergillosis, histoplasmosis, Cryptococcus infection, fungal meningitis, malaria, toxoplasmosis, trichomoniasis, giardiasis, tapeworm infection, roundworm infection, pubic and head lice, scabies, leishmaniasis, and/or river blindness. In some embodiments, the pathogen is the causative agent of one or more viral respiratory diseases. In some embodiments, the pathogen is the causative agent of a coronavirus infection. Referring to Block 206, in some embodiments, the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
[00184] In some embodiments, the biological sample is any of the embodiments for samples described herein. Other suitable embodiments of samples and/or subjects include those disclosed above (see, for example, the sections entitled “Target Nucleic Acids: Samples” and “Subjects,” above), and any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00185] Referring to Block 208, in some embodiments, the target locus is selected from the group consisting of a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation. In some embodiments, the target locus is selected from the group consisting of a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
[00186] In some embodiments, the target locus is selected from a database. For instance, in some embodiments, the target locus maps to a corresponding reference sequence (e.g., a reference genome), and the corresponding reference sequence is obtained from a nucleotide sequence database. Suitable nucleotide sequence databases include global genome databases and/or microorganism-specific genome databases. For example, in some embodiments, a reference sequence for a microorganism is obtained from NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database. See, for example, Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res., 47(D1), D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19, each of which is hereby incorporated by reference herein in its entirety.
[00187] In some embodiments, the target locus is a gene. In some embodiments, the target locus maps to all or a portion of a gene. In some embodiments, the target locus maps to all or a portion of a plurality of genes.
[00188] In some embodiments, the target locus comprises DNA or RNA. For instance, in some such embodiments, the target locus maps to all or a portion of a reference genome that consists essentially of RNA sequences. In some embodiments, the target locus maps to all or a portion of a reference genome that consists essentially of DNA sequences.
[00189] In some embodiments, the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus is obtained from a transcription reaction of a DNA molecule for the target locus. In some such embodiments, the transcription reaction generates one or more RNA transcripts that correspond to the target locus and, for each respective well in the plurality of wells in the common plate, the corresponding aliquot of nucleic acid derived from the biological sample includes the one or more RNA transcripts used for obtaining the respective plurality of reporting signals.
[00190] In some embodiments, the target locus, nucleic acids, mutations, and/or reference sequences include any of the embodiments described herein. Other suitable embodiments of target loci, nucleic acids, mutations, and/or reference sequences include those disclosed above (see, for example, the sections entitled “Target Loci,” “Target Nucleic Acids,” and “Reference Sequences,” above), as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00191] For instance, in some embodiments, the target locus comprises a plurality of alleles, including a first allele and a second allele. Referring to Block 210, in some embodiments, the first allele is wild-type, the second allele is mutant, and the mutation call is wild-type or mutant. In some embodiments, referring to Block 212, the first allele is a first mutant, and the second allele is a second mutant. In some such embodiments, the mutation call is for the first mutant allele or the second mutant allele. In some embodiments, the first allele and the second allele are selected from a plurality of alleles for the respective target locus.
[00192] In some embodiments, the plurality of alleles includes at least 2, at least 3, at least 4, at least 5, or at least 8 alleles. In some embodiments, the plurality of alleles includes no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 alleles. In some embodiments, the plurality of alleles is from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 alleles. In some embodiments, the plurality of alleles falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
[00193] In some embodiments, the target locus comprises a plurality of alleles, including at least a wild-type allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a mutant allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a first mutant allele and a second mutant allele. In some embodiments, the target locus comprises one or more mutant alleles. For example, in some embodiments, the target locus comprises at least 2, at least 3, at least 4, at least 5, or at least 8 mutant alleles. In some embodiments, the target locus comprises no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 mutant alleles. In some embodiments, the target locus comprises from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 mutant alleles. In some embodiments, the target locus comprises a set of mutant alleles that falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
[00194] Accordingly, in some embodiments, the mutation call is for a wild-type allele. In some embodiments, the mutation call is for a mutant allele. In some embodiments, the mutation call is for a respective mutant allele in a plurality of mutant alleles. In some embodiments, the mutation call is selected from the group consisting of a wild-type allele and a respective mutant allele in a plurality of mutant alleles.
[00195] In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus hybridize to the first allele. In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the first allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus hybridize to the second allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the second allele.
[00196] In some embodiments, a respective guide nucleic acid comprises any of the embodiments for guide nucleic acids disclosed herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
[00197] In some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample is RNA or DNA. In some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample comprises nucleic acids obtained from within cells. Alternatively, or in addition, in some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample comprises cell-free nucleic acid molecules.
[00198] In some embodiments, the one or more nucleic acid molecules comprises synthetic nucleic acid molecules. For instance, Example 1 describes systems and methods for identifying mutation calls using synthetic nucleic acid molecules, in accordance with an embodiment of the present disclosure.
[00199] In some embodiments, the method includes isolating the one or more nucleic acid molecules from the biological sample. In some embodiments, isolation of nucleic acid molecules includes removing one or more cells from the biological sample, subjecting the biological sample to a lysis step, and/or treating the biological sample in order to separate cellular nucleic acid molecules from cell-free nucleic acid molecules. In some embodiments, isolation of nucleic acid molecules includes isolation of cell-free nucleic acid molecules from a liquid biological sample (e.g., plasma and/or blood).
[00200] In some embodiments, the method includes heat-inactivating the biological sample.
[00201] In some embodiments, the method includes performing an amplification of one or more nucleic acid molecules derived from the biological sample. See, for instance, the section entitled, “Nucleic Acid Amplification,” below.
[00202] Other methods for deriving nucleic acid from biological samples and/or preparing biological samples for the same are contemplated for use in the present disclosure, as will be apparent to one skilled in the art.
[00203] In some embodiments, the plurality of wells 122 comprises at least 3, at least 5, at least 10, at least 20, at least 40, at least 50, at least 80, at least 100, at least 300, or at least 500 wells. In some embodiments, the plurality of wells comprises no more than 1000, no more than 500, no more than 200, no more than 80, no more than 50, or no more than 30 wells. In some embodiments, the plurality of wells is from 3 to 20, from 5 to 100, from 8 to 50, or from 10 to 500 wells. In some embodiments, the plurality of wells falls within another range starting no lower than 3 wells and ending no higher than 1000 wells.
[00204] In some embodiments, the common plate is a multi-well plate.
[00205] Nucleic Acid Amplification.
[00206] Referring to Block 214, in some embodiments, the signal dataset 120 is obtained by a procedure comprising amplifying a first plurality of nucleic acids 304 derived from the biological sample, thereby generating a plurality of amplified nucleic acids 404. The procedure further includes, for each respective well 122 in the first set of wells and each respective well in the second set of wells, partitioning, from the plurality of amplified nucleic acids 404, the respective corresponding aliquot of nucleic acid 410; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal 124.
[00207] Accordingly, in an example embodiment illustrated in FIG. 3, a biological sample 302 is used to obtain a plurality of nucleic acids 304. For instance, in some implementations, the nucleic acids 304 are extracted from the biological sample 302. The signal dataset 120 is obtained by a procedure 306 such as a DETECTR reaction, which includes an optional amplification reaction 308 and a programmable nuclease-based assay 310. The amplifying 308 includes any suitable method for amplification of nucleic acids known in the art, such as reverse transcriptase loop-mediated isothermal amplification (RT-LAMP). The programmable nuclease-based assay 310 also includes any suitable detection reaction, such as a Cas nuclease-based reaction, in which recognition of a target nucleic acid in the sample nucleic acids 304 by a guide nucleic acid (e.g., gRNA) induces cleavage of a reporter (e.g., a nucleic acid probe) by a Cas nuclease. In some implementations, the Cas nuclease is complexed with the guide nucleic acid. Reporting signals generated from cleavage of the reporter can be detected, measured, and used for mutation calling and variant identification 312, in accordance with methods of the present disclosure.
[00208] In some embodiments, the procedure 306 is a DETECTR reaction. In some embodiments, the procedure 306 is SHERLOCK. In some embodiments, the signal dataset 120 is obtained by a procedure that does not comprise an amplification step.
[00209] Generation of reporting signals using amplification and/or programmable nuclease-based assays is further described in, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety. Other embodiments for the generation of reporting signals using amplification and/or programmable nuclease-based assays are contemplated herein, as will be apparent to one skilled in the art.
[00210] Referring to Block 216, in some embodiments, the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR). In some embodiments, the amplifying is performed using a multiplexed amplification reaction.
[00211] In some embodiments, the amplifying is performed using transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and/or improved multiple displacement amplification (IMDA). In some embodiments, the amplification reaction includes reverse transcription (RT), for example when amplifying an RNA target nucleic acid. See, e.g., the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
[00212] In particular, isothermal amplification techniques employ a polymerase and a set of specialized primers designed to recognize distinct sequences in one or more target nucleic acids. For example, LAMP techniques typically perform amplification of target nucleic acid molecules at a constant temperature (e.g., 60-65 °C) using multiple inner and outer primers and a polymerase having strand displacement activity. In some instances, LAMP is initiated by an inner primer pair containing a nucleic acid sequence complementary to a portion of the sense and antisense strands of the target nucleic acid. Following strand displacement synthesis by the inner primers, strand displacement synthesis primed by an outer primer pair can cause release of a single-stranded amplicon. The single-stranded amplicon can serve as a template for further synthesis primed by a second inner and second outer primer that hybridize to the other end of the target nucleic acid and produce a stem-loop nucleic acid structure. In subsequent LAMP cycling, one inner primer hybridizes to the loop on the product and initiates displacement and target nucleic acid synthesis, yielding the original stem-loop product and a new stem-loop product with a stem twice as long. Additionally, the 3’ terminus of an amplicon loop structure serves as an initiation site for self-templating strand synthesis, yielding a hairpin-like amplicon that forms an additional loop structure to prime subsequent rounds of self-templated amplification. The amplification continues with accumulation of many copies of the target nucleic acid. The final products of the LAMP process are stem-loop nucleic acids with concatenated repeats of the target nucleic acid in cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of a target nucleic acid sequence in the same strand.
[00213] Typically, LAMP assays produce a detectable signal (e.g., fluorescence) during the amplification reaction. In some embodiments, the method comprises detecting and/or quantifying a detectable signal (e.g., fluorescence) produced during the LAMP assay. Any suitable method for detecting and quantifying fluorescence can be used.
[00214] In some embodiments, the amplifying is performed using any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
[00215] Referring to Block 218, the method further includes, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of amplification replicates 404 (e.g., 404-1, 404-2, 404-3), where the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate. Accordingly, the method comprises generating aliquots 404 for amplification reactions, as illustrated in FIG. 4A. In some embodiments, the first plurality of nucleic acids is not partitioned into amplification replicates prior to the amplifying.
[00216] In some embodiments, the plurality of amplification replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 amplification replicates. In some embodiments, the plurality of amplification replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 amplification replicates. In some embodiments, the plurality of amplification replicates is from 3 to 9, from 4 to 12, or from 10 to 25 amplification replicates. In some embodiments, the plurality of amplification replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates.
[00217] In some embodiments, the method further includes applying an amplification threshold filter to each respective amplification replicate, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
[00218] In some embodiments, the method further includes applying an amplification threshold filter to each respective biological sample in a plurality of biological samples, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective sample is removed from the plurality of samples.
[00219] In some embodiments, the amplification threshold is established by performing a visual inspection of amplification reaction curves. In some embodiments, the amplification reaction curve is a receiver operating characteristic (ROC) curve. In some embodiments, the amplification threshold is determined by selecting an optimal Youden Index point such that the absolute difference between a sensitivity metric and a false positive rate calculated using reference amplification data is maximized.
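The Youden Index selection described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function name `youden_threshold`, the convention that a higher score indicates stronger amplification, and the example reference data are all assumptions. The sketch maximizes the signed difference J = sensitivity − false positive rate, which coincides with the absolute difference whenever the score is informative.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) that maximizes J = TPR - FPR.

    scores: hypothetical amplification metric per reference reaction
            (assumed: higher means more amplification).
    labels: True if the reference reaction is known to have amplified.
    A reaction is called positive when its score >= threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative reference reactions")

    best_threshold, best_j = None, float("-inf")
    # Every observed score is a candidate cut point on the ROC curve.
    for candidate in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= candidate)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= candidate)
        j = tp / positives - fp / negatives  # sensitivity minus false positive rate
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Illustrative (made-up) reference data: three non-amplified, three amplified.
reference_scores = [0.1, 0.2, 0.3, 0.8, 0.9, 1.2]
reference_labels = [False, False, False, True, True, True]
threshold, j_statistic = youden_threshold(reference_scores, reference_labels)
```

Ties are resolved in favor of the lowest candidate threshold, which is a design choice of this sketch rather than something the text specifies.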
[00220] In some embodiments, the amplification threshold is derived using positive and negative (e.g., no template) amplification controls from the amplification reaction (e.g., LAMP) of a given sample. For instance, in some such embodiments, the amplification threshold includes a time threshold and/or a fluorescence rate threshold. Positive amplification controls are deemed to represent an ideal sample, where the ideal sample exhibits a classic sigmoidal rise of fluorescence over time, and negative amplification controls are deemed to represent the background fluorescence. For instance, in some such embodiments, an ideal sample is approximated as having fluorescence kinetics similar to that of a positive amplification control. In some embodiments, the amplification threshold is determined based on a mean value between the amplification controls within a common plate. This determination can be used to reduce the impact of background that can occur in amplification replicates.
[00221] In some implementations, the determination of the amplification threshold further includes collecting fluorescence values at the time threshold. The time threshold can be used to exclude those samples that amplify close to the endpoint, where amplification intermediates (e.g., LAMP intermediates), rather than the sample itself, are the majority contributors to the rise in signal. The determination further includes assigning a score for each sample, where the score is calculated as a ratio of the fluorescence rate threshold to the rate of fluorescence value at the time threshold for each sample. In some embodiments, when the ratio of rate of fluorescence between controls and samples is less than a threshold ratio (e.g., 1), then samples are deemed to have failed to reach the minimum fluorescence required to be called amplified and, when the ratio is greater than or equal to the threshold ratio (e.g., 1), then samples are deemed to have amplified sufficiently.
[00222] In some embodiments, the determination of the amplification threshold comprises performing an ROC analysis using the calculated score and/or a measured amplification metric (e.g., ground truth) of each sample in a plurality of samples, thereby identifying an exact score value for the respective sample.
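A control-derived fluorescence rate threshold and the per-sample score could be sketched as below. Several points are assumptions made for illustration: the function names, the example rate values, taking the threshold as the midpoint between the mean positive-control rate and the mean negative-control rate, and orienting the ratio as sample rate divided by threshold rate so that a score of at least 1 corresponds to sufficient amplification, consistent with the pass/fail rule stated above.

```python
from statistics import mean


def rate_threshold(positive_control_rates, negative_control_rates):
    """Midpoint between mean positive and mean negative control rates
    on a common plate (assumed interpretation of the 'mean value between
    the amplification controls')."""
    return mean([mean(positive_control_rates), mean(negative_control_rates)])


def amplification_score(sample_rate, threshold_rate):
    """Assumed orientation: sample rate at the time threshold divided by
    the fluorescence rate threshold, so score >= 1 means 'amplified'."""
    if threshold_rate <= 0:
        raise ValueError("threshold rate must be positive")
    return sample_rate / threshold_rate


def filter_replicates(sample_rates, threshold_rate, threshold_ratio=1.0):
    """Keep only amplification replicates whose score meets the threshold ratio."""
    return {
        name: rate
        for name, rate in sample_rates.items()
        if amplification_score(rate, threshold_rate) >= threshold_ratio
    }


# Made-up rates (arbitrary fluorescence units per minute) at the time threshold.
plate_threshold = rate_threshold([100.0, 120.0], [10.0, 10.0])
kept = filter_replicates({"rep1": 90.0, "rep2": 30.0, "rep3": 60.0}, plate_threshold)
```

With these illustrative numbers, the replicate whose rate falls below the plate threshold is removed, mirroring the amplification threshold filter applied to each amplification replicate.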
[00223] In some embodiments, the time threshold is from 5 minutes to 40 minutes. In some embodiments, the time threshold is a time point within the total duration of the amplification reaction that is at least 10%, at least 20%, or at least 30% into the total duration and is no more than 90%, no more than 80%, or no more than 70% into the total duration. For example, in some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is selected from a range of from 10 minutes to 30 minutes. In some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is 18 minutes.
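The relationship between the reaction duration and the allowed time threshold is simple arithmetic, sketched here for concreteness. The function names are hypothetical, and the fractions 0.25 and 0.75 are illustrative values chosen only because they reproduce the 10 to 30 minute window of the 40-minute example; they are not among the percentages recited above.

```python
def time_threshold_window(duration_min, min_fraction, max_fraction):
    """Return the (earliest, latest) allowed time threshold in minutes."""
    if not 0.0 <= min_fraction <= max_fraction <= 1.0:
        raise ValueError("fractions must satisfy 0 <= min <= max <= 1")
    return duration_min * min_fraction, duration_min * max_fraction


def is_valid_time_threshold(time_threshold_min, duration_min, min_fraction, max_fraction):
    """Check that a chosen time threshold falls inside the allowed window."""
    earliest, latest = time_threshold_window(duration_min, min_fraction, max_fraction)
    return earliest <= time_threshold_min <= latest


# 40-minute amplification reaction with an 18-minute time threshold.
window = time_threshold_window(40, 0.25, 0.75)
ok = is_valid_time_threshold(18, 40, 0.25, 0.75)
```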
[00224] In some embodiments, one or more amplification replicates in a plurality of amplification replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
[00225] In some embodiments, referring to Block 220, the method further includes, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate 404 into a plurality of detection replicates 408, where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
[00226] Accordingly, the method comprises generating a respective plurality of aliquots 408 from a respective amplification reaction 404, as illustrated in FIGS. 4A-4B. For instance, for each plurality of aliquots partitioned from a respective amplification reaction, each respective aliquot 410 corresponds to a respective well 122 in the plurality of wells. In some embodiments, wells are grouped into a plurality of sets depending on the type of guide nucleic acid used to contact the corresponding aliquot of nucleic acid, as illustrated in FIGS. 4A-4B (e.g., a first set of wells comprises allele 1 (WT) guide sequences and a second set of wells comprises allele 2 (MUT) guide sequences). Thus, as illustrated in FIG. 4B, each well in the first set of wells contains amplified nucleic acids that are interrogated by the allele 1 guide sequences (WT; set “A”) and each well in the second set of wells contains amplified nucleic acids that are interrogated by the allele 2 guide sequences (MUT; set “B”).
[00227] In some embodiments, each respective detection replicate represents a respective well in the plurality of wells. In some embodiments, each respective detection replicate represents a pair of wells in a respective comparative programmable nuclease-based reaction. For example, in a respective comparative programmable nuclease-based reaction, a pair of wells includes (i) a first well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the first allele and (ii) a second well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the second allele. Thus, as FIG. 4B illustrates, in some such embodiments, a detection replicate includes, e.g., well 410-1-1-1-A (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and wild-type nucleic acid guide “A”) and well 410-1-1-1-B (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and mutant nucleic acid guide “B”).
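The well labeling used in the example above (410, then plate, amplification replicate, detection reaction, and guide set) can be enumerated programmatically. The sketch below is only an illustration of that naming scheme; the function name `detection_replicate_wells` and the chosen replicate counts are assumptions.

```python
from itertools import product


def detection_replicate_wells(n_plates, n_amp_replicates, n_detection_replicates,
                              guide_sets=("A", "B")):
    """Map (plate, amplification replicate, detection reaction) to the pair of
    well labels interrogated with guide set "A" (first allele, e.g., wild-type)
    and guide set "B" (second allele, e.g., mutant)."""
    layout = {}
    for plate, amp, det in product(range(1, n_plates + 1),
                                   range(1, n_amp_replicates + 1),
                                   range(1, n_detection_replicates + 1)):
        layout[(plate, amp, det)] = tuple(
            f"410-{plate}-{amp}-{det}-{guide}" for guide in guide_sets
        )
    return layout


# One plate (one target locus), 3 amplification replicates, 3 detection replicates each.
layout = detection_replicate_wells(1, 3, 3)
first_set = [pair[0] for pair in layout.values()]   # wells with first-allele guides
second_set = [pair[1] for pair in layout.values()]  # wells with second-allele guides
```

Keeping each comparative pair together under one key makes it straightforward to compare the first-allele and second-allele signals for the same aliquot source.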
[00228] In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates is from 3 to 9, from 4 to 12, or from 10 to 25 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates. In some embodiments, nucleic acids in the biological sample, and/or a respective plurality of amplified nucleic acids derived therefrom, are not partitioned into a plurality of detection replicates prior to the contacting.
[00229] In some embodiments, the cleaving the one or more reporters using the programmable nuclease is performed using a detection assay, such as a programmable nuclease-based step in a DETECTR assay. See, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00230] In some embodiments, the method further comprises, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). In some embodiments, nucleic acids in the biological sample are not partitioned into a plurality of replicates for each locus in a plurality of loci.
[00231] In some embodiments, the method further comprises, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for a respective amplification replicate 404 into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). For instance, as illustrated in FIG. 4A, nucleic acids 304 are interrogated at each of a plurality of loci of interest 406. Accordingly, the amplification replicates 404 are partitioned across a plurality of multi-well plates, where each respective plate corresponds to a respective target locus in the plurality of loci (e.g., 406-1, 406-2, 406-3). In some such embodiments, for each respective multi-well plate 406 corresponding to a respective target locus, the respective amplification replicate 404 is further partitioned into a plurality of detection replicates 408 (e.g., 408-1-1, 408-1-2, 408-1-3), where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
[00232] In some embodiments, one or more detection replicates in a plurality of detection replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
[00233] Programmable Nuclease-Based Assays.
[00234] In some embodiments, the method comprises contacting the respective corresponding aliquot of nucleic acid with a plurality of programmable nucleases. In some embodiments, each respective programmable nuclease in the plurality of programmable nucleases is the same or different type of nuclease.
[00235] Referring to Block 222, in some embodiments, a respective programmable nuclease is a trans-cleaving programmable nuclease (e.g., capable of nonspecific cleavage of nucleic acids). In some embodiments, the programmable nuclease targets DNA (e.g., single-stranded DNA) or RNA.
[00236] Referring to Block 224, in some embodiments, the programmable nuclease is a Cas nuclease. In some embodiments, the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, CasPhi, and/or any subtypes or orthologs thereof. In some embodiments, the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
[00237] In some embodiments, the programmable nuclease is any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Programmable Nucleases,” above. For instance, in some embodiments, the programmable nuclease is any of the programmable nucleases described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00238] In some embodiments, referring to Block 226, for each respective well 122 in the first set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid. For instance, as described herein (see, e.g., the sections entitled “Engineered Guide Nucleic Acids” and “Programmable Nucleases,” above), a guide nucleic acid directs a programmable nuclease to a respective target locus for cleavage by recognizing the nucleic acid sequence of the target locus. Recognition of the target locus can be engineered by designing guide nucleic acids that hybridize to (e.g., are reverse complementary to) all or a portion of the nucleic acid sequence of the target locus. In some embodiments, the corresponding guide nucleic acid is complexed to the programmable nuclease.
[00239] Referring to Block 228, in some embodiments, the first plurality of guide nucleic acids is RNA. In some embodiments, the first plurality of guide nucleic acids is gRNA or sgRNA.
[00240] Referring to Block 230, in some embodiments, the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus. In some embodiments, the first plurality of guide nucleic acids is reverse complementary to a wild-type sequence for the target locus.
[00241] In some embodiments, for each respective well in the second set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the second plurality of guide nucleic acids that have the second allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid. In some embodiments, the second plurality of guide nucleic acids is RNA. In some embodiments, the second plurality of guide nucleic acids hybridizes to (e.g., is reverse complementary to) a mutant sequence for the target locus.
[00242] In some embodiments, the guide nucleic acids comprise any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
[00243] Signal Dataset.
[00244] In some embodiments, each respective reporter in the one or more reporters is single-stranded DNA. In some embodiments, each respective reporter in the one or more reporters is single-stranded RNA.
[00245] Referring to Block 232, in some embodiments, each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
[00246] As described above, in some implementations, for each respective well in the first set of wells, the one or more reporters are cleaved by the programmable nuclease when the first allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal. In some embodiments, when the first allele is not present in the corresponding aliquot of nucleic acid, the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
[00247] In some implementations, for each respective well in the second set of wells, the one or more reporters are cleaved by the programmable nuclease when the second allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal. In some embodiments, when the second allele is not present in the corresponding aliquot of nucleic acid, the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
[00248] In some embodiments, each respective reporting signal 124 is obtained from a detection moiety. For instance, in some embodiments, each respective reporting signal is a fluorescence emission by a fluorophore, and the corresponding discrete attribute value is a fluorescence intensity. In some embodiments, each respective reporting signal is an intensity of light, and the corresponding discrete attribute value is measured in relative units (e.g., relative fluorescent units).
[00249] In some embodiments, the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
[00250] In some embodiments, the respective reporting signal 124 is a lateral flow readout. In some such embodiments, the lateral flow readout is detected manually by visual inspection. In some embodiments, the lateral flow readout is detected using an image analysis algorithm (e.g., ImageJ, FIJI, etc.). Detection of reporting signals by lateral flow readout is further described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00251] In some embodiments, the one or more reporters and/or the plurality of reporting signals, and methods of detection thereof, comprise any of the embodiments described herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Reporters,” above.
[00252] In some embodiments, the plurality of time points represented by the signal dataset comprises between 20 and 2000 time points. In some embodiments, the plurality of time points represented by the signal dataset is at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 time points. In some embodiments, the plurality of time points is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 time points. In some embodiments, the plurality of time points represented by the signal dataset is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000. In some embodiments, the plurality of time points represented by the signal dataset falls within another range starting no lower than 10 time points and ending no higher than 5000 time points.
[00253] In some embodiments, the plurality of time points is obtained over a duration of between 30 seconds and 1 hour. In some embodiments, the plurality of time points is obtained over a duration of at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, or at least 1 hour. In some embodiments, the plurality of time points is obtained over a duration of no more than 2 hours, no more than 1 hour, no more than 30 minutes, or no more than 5 minutes. In some embodiments, the plurality of time points is obtained over a duration of from 30 seconds to 10 minutes, from 5 minutes to 1 hour, or from 20 minutes to 50 minutes. In some embodiments, the plurality of time points is obtained over a duration that falls within another range starting no lower than 30 seconds and ending no higher than 2 hours.
[00254] In some embodiments, each respective time point corresponds to a respective measurement obtained during a detection assay, where the detection assay is performed over a duration of time. For instance, in some embodiments, each respective time point corresponds to a respective readout obtained from a plate reader, where each respective readout in a plurality of readouts is taken at intervals over the duration of the detection assay (e.g., every 30 seconds, every 1 second, every 0.5 seconds, etc.).
[00255] Accordingly, in some embodiments, the plurality of reporting signals 124 for each respective well 122 in the plurality of wells comprises at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 reporting signals. In some embodiments, the plurality of reporting signals for each respective well is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 reporting signals. In some embodiments, the plurality of reporting signals for each respective well is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000 reporting signals. In some embodiments, the plurality of reporting signals for each respective well falls within another range starting no lower than 10 reporting signals and ending no higher than 5000 reporting signals.
[00256] In some embodiments, the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample. The plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each control well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each control well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
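By way of non-limiting illustration, the layout of the signal dataset described above (sample wells and control wells, each belonging to a first-allele or second-allele set and each carrying one discrete attribute value per time point) can be sketched as a simple data structure. The class names, field names, and well identifiers below are hypothetical and are chosen only for this sketch; they are not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Well:
    well_id: str       # hypothetical plate position label
    allele: str        # "first" or "second": which guide set the well holds
    is_control: bool   # True for wells free of sample-derived nucleic acid
    signals: List[float] = field(default_factory=list)  # one value per time point


@dataclass
class SignalDataset:
    time_points: List[float]  # e.g., seconds since the start of the assay
    wells: List[Well]

    def validate(self) -> None:
        """Check that every well has exactly one signal per time point."""
        for well in self.wells:
            if len(well.signals) != len(self.time_points):
                raise ValueError(
                    f"well {well.well_id} has {len(well.signals)} signals, "
                    f"expected {len(self.time_points)}"
                )

    def select(self, allele: str, is_control: bool) -> List[Well]:
        """Return the wells of one allele set, either sample or control."""
        return [
            w for w in self.wells
            if w.allele == allele and w.is_control == is_control
        ]


# Hypothetical values in relative fluorescence units.
dataset = SignalDataset(
    time_points=[0.0, 30.0, 60.0],
    wells=[
        Well("A1", "first", False, [100.0, 300.0, 800.0]),
        Well("A2", "second", False, [100.0, 105.0, 110.0]),
        Well("H1", "first", True, [100.0, 101.0, 102.0]),
        Well("H2", "second", True, [100.0, 100.0, 101.0]),
    ],
)
dataset.validate()
```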
[00257] In some embodiments, each respective control well is a negative control well and/or a no-template control well.
[00258] In some embodiments, the signal dataset 120 further comprises, for each respective positive control well in a plurality of positive control wells in the common plate, a corresponding plurality of positive control reporting signals. The plurality of positive control wells comprises a first set of positive control wells representing the first allele for the target locus, where each respective positive control well in the first set of positive control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The signal dataset further includes a second set of positive control wells representing the second allele for the target locus, where each respective positive control well in the second set of positive control wells comprises a second plurality of guide nucleic acids that have the second allele of the target locus.
[00259] In some embodiments, each respective well in the first set of wells comprises one or more nucleic acid molecules corresponding to the first allele of the target locus, and each respective well in the second set of wells comprises one or more nucleic acid molecules corresponding to the second allele of the target locus.
[00260] As described above, in some embodiments, a respective guide nucleic acid in a respective plurality of guide nucleic acids in a well (e.g., a negative control well and/or a positive control well) hybridizes to all or a portion of the nucleic acid sequence of the respective allele.
[00261] Moreover, in some embodiments, any of the embodiments disclosed herein for guide nucleic acids, reporting signals and methods of obtaining the same, alleles, target loci, and wells are contemplated for use with control wells and/or positive control wells, as will be apparent to one skilled in the art. See, e.g., the sections entitled, “Samples,” “Nucleic Acid Amplification,” “Programmable Nuclease-Based Assays,” and “Signal Dataset,” above.
[00262] In some embodiments, the signal dataset is preprocessed for analysis. In some such embodiments, the preprocessing comprises data cleaning, normalization, filtering, and/or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00263] Candidate Call Identities.
[00264] Referring to Block 234, the method further includes determining, for each respective well 122 in the plurality of wells, a corresponding signal yield 132 for the respective well using the corresponding plurality of reporting signals 124 for the respective well across the plurality of time points.
[00265] In particular, referring to Block 236, in some embodiments, the determining signal yield further comprises, for each respective well 122 in the plurality of wells, normalizing the respective plurality of reporting signals 124 for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals 124 for the respective well, thereby obtaining the corresponding signal yield 132 for the respective well. Accordingly, in some such embodiments, the maximum discrete attribute value is an endpoint fluorescence intensity, and the minimum discrete attribute value is a minimum fluorescence intensity. Without being limited to any one theory of operation, each well in the common plate is deemed to have a similar minimum fluorescence at the start of a detection assay, and thus the signal yield for a respective target locus is approximated to be consistent across wells when the concentrations of sample and/or target nucleic acids in the corresponding aliquots of nucleic acid are similar.
[00266] In some embodiments, the signal yield has the form Fy = max(F) / min(F), where Fy is the signal yield, max(F) is the maximum discrete attribute value in the corresponding plurality of reporting signals, and min(F) is the minimum discrete attribute value in the corresponding plurality of reporting signals.
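By way of non-limiting illustration, the signal yield Fy = max(F) / min(F) can be computed from one well's time series as sketched below. The function name and the example readings are hypothetical. The rejection of non-positive minimum values is an assumption of this sketch, since the handling of such readings is not specified above.

```python
from typing import Sequence


def signal_yield(reporting_signals: Sequence[float]) -> float:
    """Return Fy = max(F) / min(F) for one well's reporting signals."""
    if not reporting_signals:
        raise ValueError("a well needs at least one reporting signal")
    lowest = min(reporting_signals)
    if lowest <= 0:
        # Assumption: raw fluorescence is strictly positive; how zero or
        # negative readings are treated is not specified in the text.
        raise ValueError("minimum reporting signal must be positive")
    return max(reporting_signals) / lowest


# Hypothetical fluorescence readings (relative fluorescence units) for one well.
example_well = [100.0, 150.0, 400.0, 800.0]
example_yield = signal_yield(example_well)  # 800 / 100
```

A well whose signal stays flat throughout the assay gives a signal yield of 1 under this definition.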
[00267] In some embodiments, when the signal yield for a respective well fails to satisfy a signal yield threshold, the respective well is deemed a no-call. In some embodiments, the signal yield threshold is obtained by determining, for each respective control well in a plurality of control wells in the common plate, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield. Accordingly, in some embodiments, the signal yield threshold is a respective control signal yield (e.g., the maximum and/or the minimum control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the respective control signal yield. In some embodiments, the signal yield threshold is a central tendency metric for the plurality of control signal yields (e.g., an average control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the central tendency metric.
[00268] In some embodiments, each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. Without being limited to any one theory of operation, the maximum discrete attribute value and the minimum discrete attribute value will generally be similar, due to the lack of target nucleic acid derived from the biological sample. For instance, in the absence of target nucleic acids to which a corresponding guide nucleic acid can hybridize, a complexed programmable nuclease is unlikely to initiate reporter cleavage. As a result, reporting signals are expected to maintain background levels throughout the course of the detection assay. Accordingly, in some embodiments, each respective control signal yield in the plurality of control signal yields is 1 or about 1, and the respective well is deemed a no-call when the signal yield for the respective well is equal to 1 or about 1.
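By way of non-limiting illustration, the signal yield threshold and the resulting no-call test can be sketched as follows. The function names, the `mode` parameter, and the example control yields are hypothetical. The sketch offers the maximum, the minimum, or the arithmetic mean of the control signal yields as the threshold and deems a well a no-call when its signal yield is equal to or less than that threshold.

```python
from statistics import mean
from typing import Sequence


def yield_threshold(control_yields: Sequence[float], mode: str = "max") -> float:
    """Derive a signal yield threshold from the control signal yields."""
    if mode == "max":
        return max(control_yields)
    if mode == "min":
        return min(control_yields)
    if mode == "mean":
        # One possible central tendency metric (an average control signal yield).
        return mean(control_yields)
    raise ValueError(f"unknown mode: {mode}")


def is_no_call(well_yield: float, threshold: float) -> bool:
    """A well is a no-call when its yield is equal to or less than the threshold."""
    return well_yield <= threshold


# Hypothetical control signal yields, all close to 1 as expected for wells
# that contain no sample-derived nucleic acid.
control_yields = [1.0, 1.25, 1.5]
threshold = yield_threshold(control_yields, mode="max")
```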
[00269] Referring to Block 238, the method further includes determining, for each respective well 122 in the first set of wells, a respective candidate call identity 134 based on a comparison between (i) a corresponding first signal yield 132 for the respective well in the first set of wells and (ii) a corresponding second signal yield 132 for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities 134. Referring to Block 240, in some embodiments, the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call (e.g., wild-type, mutant, and/or no-call).
[00270] In some embodiments, for each respective well in the first set of wells, the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample. For instance, in some embodiments, the first well and the corresponding well are matched for comparison across detection replicates in order to determine candidate call identities.
[00271] In some embodiments, the first well and the corresponding well are matched for comparison within a comparative programmable nuclease-based reaction. In some such embodiments, for a respective comparative reaction, a respective portion of nucleic acid is divided into two wells, where a first well is interrogated by a guide nucleic acid having the first allele sequence and a second well is interrogated by a guide nucleic acid having the second allele sequence. Relative reporting signals obtained from the first well and the second well are compared. Comparative programmable nuclease-based reactions are further described in detail herein (see, e.g., the section entitled, “Programmable Nuclease-Based Assays,” above). In some embodiments, the comparative programmable nuclease-based reaction is a DETECTR reaction. In some embodiments, the respective portion of nucleic acid is an aliquot of nucleic acid. In some embodiments, the respective portion of nucleic acid is a plurality of nucleic acids derived from a biological sample. FIG. 4B illustrates a comparative detection reaction including, for each respective pair of wells, a first well 410 interrogated by the wild-type guide nucleic acid “A” and a second well 410 interrogated by the mutant guide nucleic acid “B.”
[00272] In some embodiments, the first well and the corresponding well are matched for comparison across amplification replicates in order to determine candidate call identities.
[00273] Referring to Block 242, in some embodiments, the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield. In some embodiments, the relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a percent change.
[00274] In some embodiments, the relative intensity metric has the form log2(Fy(FAl)) / log2(Fy(SAl)),
[00275] where Fy(FAl) is the first allele corresponding signal yield, and Fy(SAl) is the second allele corresponding signal yield.
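By way of non-limiting illustration, the relative intensity metric log2(Fy(FAl)) / log2(Fy(SAl)) can be computed as sketched below. The function name and example yields are hypothetical. Because log2(1) is zero, a second-allele signal yield of exactly 1 would make the ratio undefined. The sketch raises an error in that case, which is an assumption of this example and not a behavior stated above.

```python
import math


def relative_intensity(first_yield: float, second_yield: float) -> float:
    """Return log2(Fy(FAl)) / log2(Fy(SAl)) for a matched pair of wells."""
    denominator = math.log2(second_yield)
    if denominator == 0:
        # Assumption: a flat second-allele well (yield of 1) is reported as an
        # error here; the text does not say how this case is handled.
        raise ValueError("second-allele signal yield of 1 gives log2 = 0")
    return math.log2(first_yield) / denominator


# Hypothetical pair: strong first-allele signal, weak second-allele signal.
metric_first_dominant = relative_intensity(8.0, 2.0)   # 3 / 1
# Hypothetical pair with the roles reversed.
metric_second_dominant = relative_intensity(2.0, 8.0)  # 1 / 3
```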
[00276] In some embodiments, when the relative intensity metric satisfies a first threshold criterion, the candidate call identity is first allele, and when the relative intensity metric satisfies a second threshold criterion, the candidate call identity is second allele. In some embodiments, the first threshold criterion and/or the second threshold criterion is based on a control relative intensity metric for a first control signal yield and a second control signal yield. In some embodiments, the control relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a fold change.
[00277] For instance, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than log2(Fc(FAl)) / log2(Fc(SAl)),
[00278] where Fc(FAl) is the first control signal yield, and Fc(SAl) is the second control signal yield.
[00279] In some embodiments, the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1. Accordingly, in some embodiments, when the relative intensity metric is greater than 1, the candidate call identity is first allele.
[00280] In some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than log2(Fc(FAl)) / log2(Fc(SAl)),
[00281] where Fc(FAl) is the first control signal yield, and Fc(SAl) is the second control signal yield.
[00282] In some embodiments, the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1. Accordingly, in some embodiments, when the relative intensity metric is less than 1, the candidate call identity is second allele.
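By way of non-limiting illustration, the two threshold criteria can be sketched as a single comparison against the control ratio, which defaults to 1 in accordance with the embodiments above. The function name and call labels are hypothetical. A metric exactly equal to the control ratio is returned as no-call, which is an assumption of this sketch because that boundary case is not addressed above.

```python
def candidate_call(metric: float, control_ratio: float = 1.0) -> str:
    """Map a relative intensity metric to a candidate call identity."""
    if metric > control_ratio:
        return "first allele"   # first threshold criterion satisfied
    if metric < control_ratio:
        return "second allele"  # second threshold criterion satisfied
    # Assumption: equality satisfies neither criterion and is treated as no-call.
    return "no-call"


# Hypothetical relative intensity metrics for three pairs of wells.
calls = [candidate_call(m) for m in (3.0, 0.25, 1.0)]
```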
[00283] In some embodiments, the method further comprises obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield. In some embodiments, the central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
[00284] In some embodiments, the central tendency metric is calculated as [log2(Fy(FAl) + Fy(SAl))] / 2,
[00285] where Fy(FAl) is the corresponding first signal yield, and Fy(SAl) is the corresponding second signal yield.
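By way of non-limiting illustration, the central tendency expression can be evaluated as sketched below. The sketch follows the expression exactly as written above, that is, the binary logarithm of the sum of the two signal yields, divided by 2. The function name and example yields are hypothetical.

```python
import math


def central_tendency(first_yield: float, second_yield: float) -> float:
    """Return [log2(Fy(FAl) + Fy(SAl))] / 2, following the expression as written."""
    return math.log2(first_yield + second_yield) / 2


# Hypothetical signal yields for a matched pair of wells.
example_central = central_tendency(6.0, 2.0)  # log2(8) / 2
```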
[00286] Referring to Block 244, the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample. The plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
[00287] In some such embodiments, the method further comprises determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate. The method further includes evaluating whether a respective candidate call identity 134 in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
[00288] For instance, referring to Block 246, in some embodiments, the evaluating comprises performing a comparison of (i) the relative intensity metric between the corresponding first and second signal yields and (ii) the plurality of control signal yields, where, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity 134 is deemed a no-call.
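By way of non-limiting illustration, the range-based no-call evaluation can be sketched as follows. The function name and example values are hypothetical. Treating the bounds as inclusive is an assumption of this sketch.

```python
from typing import Sequence


def is_no_call_by_control_range(metric: float,
                                control_yields: Sequence[float]) -> bool:
    """Deem a candidate call a no-call when the relative intensity metric lies
    within the range bounded by the minimum and maximum control signal yields."""
    # Assumption: both bounds are inclusive.
    return min(control_yields) <= metric <= max(control_yields)


# Hypothetical control signal yields for the common plate.
plate_controls = [0.9, 1.0, 1.1]
inside = is_no_call_by_control_range(1.0, plate_controls)
outside = is_no_call_by_control_range(3.0, plate_controls)
```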
[00289] In some embodiments, referring to Block 248, each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. As described above, in some embodiments, each respective control signal yield is 1 or about 1.
[00290] Referring to Block 250, in some embodiments, the method further includes obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield. In some such embodiments, the method further comprises determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric. In some such embodiments, the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity 134 is deemed a no-call.
[00291] In some embodiments, the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields is less than a respective control central tendency threshold, the respective candidate call identity 134 is deemed a no-call. In some embodiments, the control central tendency threshold is a respective control central tendency metric in the plurality of control central tendency metrics (e.g., the maximum control central tendency metric).
[00292] In some embodiments, the control central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
[00293] In some embodiments, the control central tendency metric is a log average calculated as [log2(Fc(FAl) + Fc(SAl))] / 2,
[00294] where Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells, and Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
[00295] Voting Methods.
[00296] Referring to Block 252, the method further includes performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00297] In some embodiments, each respective candidate call identity for each respective well in the first set of wells corresponds to a respective biological sample in one or more biological samples, and the voting procedure comprises assigning the respective candidate call identity as the mutation call for the target locus.
[00298] In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by at least a majority of the first set of wells. In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by all of the wells in the first set of wells. In some embodiments, a respective mutation call is no-call when no candidate call identity is shared by at least a majority of the first set of wells. For instance, in some embodiments, when the candidate call identities for the first set of wells are equally distributed between first allele and second allele, then the respective mutation call is no-call. In some embodiments, when a respective candidate call identity in the plurality of candidate call identities is no-call, the no-call well is removed from the first set of wells and the concordance vote is performed using the candidate call identities of the remaining wells. In some embodiments, each well in the first set of wells corresponds to a different respective replicate of an amplification reaction. In some embodiments, each well in the first set of wells corresponds to a different respective replicate of a detection assay (e.g., a programmable nuclease-based assay). Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above). In some embodiments, each well in the first set of wells corresponds to a different respective replicate of an amplification reaction, and a respective mutation call is no-call when one or more wells in the amplification reaction fail to satisfy an amplification threshold. See, e.g., the section entitled “Nucleic Acid Amplification,” above.
[00299] Other embodiments for voting procedures and/or concordance votes are contemplated, as will be apparent to one skilled in the art and as disclosed further herein.
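By way of non-limiting illustration, a majority concordance vote in which no-call wells are removed before voting can be sketched as follows. The function name and the call labels are hypothetical. The sketch reads "majority" as strictly more than half of the remaining wells, so that an even split yields no-call.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def concordance_vote(calls: Sequence[str]) -> str:
    """Return the call shared by a majority of wells, or no-call otherwise.

    Wells whose candidate call identity is no-call are removed before voting.
    """
    votes = [c for c in calls if c != NO_CALL]
    if not votes:
        return NO_CALL
    top_call, top_count = Counter(votes).most_common(1)[0]
    # Strict majority: more than half of the remaining wells must agree.
    return top_call if 2 * top_count > len(votes) else NO_CALL


# Hypothetical candidate call identities for three sets of wells.
majority_result = concordance_vote(["wild-type", "wild-type", "mutant"])
tie_result = concordance_vote(["wild-type", "mutant"])
dropped_result = concordance_vote(["mutant", NO_CALL, "mutant"])
```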
[00300] For instance, referring to Block 254, in some embodiments, the method further includes, prior to the performing the voting procedure, binning the first set of wells into a plurality of bins, where each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample. The voting procedure further comprises (i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities 134 for the subset of wells in the respective bin, thereby generating a plurality of bin votes 414, and (ii) applying a second concordance vote across the plurality of bin votes 414, thereby obtaining the mutation call 416 for the target locus. Referring to Block 256, in some embodiments, each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
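By way of non-limiting illustration, the binning step and the two concordance votes can be sketched as follows. The function names, replicate identifiers, and call labels are hypothetical. For brevity, the sketch applies a simple majority rule at both levels and treats no-call as an ordinary vote; other no-call policies described herein are omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NO_CALL = "no-call"


def majority(votes: Sequence[str]) -> str:
    """Return the vote shared by more than half of the entries, else no-call."""
    if not votes:
        return NO_CALL
    call, count = Counter(votes).most_common(1)[0]
    return call if 2 * count > len(votes) else NO_CALL


def two_level_vote(wells: Sequence[Tuple[str, str]]) -> Tuple[Dict[str, str], str]:
    """Bin (replicate_id, candidate_call) pairs by replicate, vote within each
    bin, then vote across the bin votes to obtain the mutation call."""
    bins: Dict[str, List[str]] = {}
    for replicate_id, call in wells:
        bins.setdefault(replicate_id, []).append(call)
    bin_votes = {rid: majority(calls) for rid, calls in bins.items()}
    mutation_call = majority(list(bin_votes.values()))
    return bin_votes, mutation_call


# Hypothetical plate: three biological replicates with three wells each.
example_wells = [
    ("rep1", "wild-type"), ("rep1", "wild-type"), ("rep1", "mutant"),
    ("rep2", "wild-type"), ("rep2", "wild-type"), ("rep2", "wild-type"),
    ("rep3", "mutant"), ("rep3", "mutant"), ("rep3", "wild-type"),
]
example_bin_votes, example_mutation_call = two_level_vote(example_wells)
```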
[00301] In some embodiments, as illustrated in FIGS. 4B-4C, the first allele is wild-type and the second allele is mutant. In some embodiments, the first allele is a first mutant, and the second allele is a second mutant.
[00302] Referring to Block 258, in some embodiments, a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin. For example, as illustrated in FIG. 4B, consider the case of three wells, in which at least two wells have a candidate call identity of wild-type. In some such embodiments, the first concordance vote generates a bin vote 414 that is wild-type. Similarly, as illustrated in FIG. 4C, where the subset of wells consists of three wells and at least two of the wells have a candidate call identity of mutant, then the first concordance vote generates a bin vote 414 that is mutant.
[00303] In some embodiments, the respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the wells in the subset of wells for the respective bin.
[00304] In some embodiments, a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
[00305] In some embodiments, a respective bin vote is no-call when no candidate call identity is shared by at least a majority of the subset of wells for the respective bin. For instance, in some embodiments, when the candidate call identities for the subset of wells are equally distributed between first allele and second allele, then the respective bin vote is no-call.
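By way of non-limiting illustration, a concordance vote with a configurable agreement level (for example, at least 80% of wells, or all wells) can be sketched as follows. The function name and the integer `min_percent` parameter are hypothetical. Integer arithmetic is used to avoid floating-point rounding at the boundary. Returning no-call when two identities both reach the required level (possible at exactly 50%) is an assumption of this sketch.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def threshold_vote(calls: Sequence[str], min_percent: int) -> str:
    """Return the call shared by at least min_percent of wells, else no-call."""
    if not calls:
        return NO_CALL
    counts = Counter(calls)
    # Compare 100 * count against min_percent * total using integers only.
    winners = [
        call for call, n in counts.items()
        if 100 * n >= min_percent * len(calls)
    ]
    # Assumption: when no identity, or more than one, reaches the level,
    # the vote is no-call.
    return winners[0] if len(winners) == 1 else NO_CALL


# Hypothetical bin of five wells, four of which agree.
five_wells = ["mutant"] * 4 + ["wild-type"]
at_80 = threshold_vote(five_wells, 80)
unanimous = threshold_vote(five_wells, 100)
```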
[00306] Referring to Block 260, in some embodiments, a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call. For instance, as illustrated in FIG. 4C, where the subset of wells consists of three wells and one of the wells has a candidate call identity of no-call, then the first concordance vote generates a bin vote 414 for the respective bin that is no-call. In some embodiments, when a respective candidate call identity is no-call, the no-call well is removed from the plurality of wells and the bin vote is performed using the candidate call identities of the remaining wells.
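By way of non-limiting illustration, the two no-call policies for a bin vote described above can be contrasted as follows. The function name and the `strict` flag are hypothetical. The majority rule used after removal (strictly more than half of the remaining wells) is an assumption of this sketch.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def bin_vote(calls: Sequence[str], strict: bool = True) -> str:
    """Vote within one bin under one of two no-call policies.

    strict=True:  any no-call well makes the whole bin vote no-call.
    strict=False: no-call wells are removed before a majority vote.
    """
    if strict and NO_CALL in calls:
        return NO_CALL
    remaining = [c for c in calls if c != NO_CALL]
    if not remaining:
        return NO_CALL
    call, count = Counter(remaining).most_common(1)[0]
    return call if 2 * count > len(remaining) else NO_CALL


# Hypothetical bin of three wells, one of which is a no-call.
three_wells = ["mutant", "mutant", NO_CALL]
strict_vote = bin_vote(three_wells, strict=True)
lenient_vote = bin_vote(three_wells, strict=False)
```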
[00307] Referring to Block 262, in some embodiments, the second concordance vote generates a mutation call 416 based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins. FIG. 4B illustrates a plurality of three bins, where at least two of the bins have a bin vote of wild-type, and where the second concordance vote generates a mutation call 416 for the target locus that is wild-type. Similarly, FIG. 4C illustrates a plurality of three bins, where at least two of the bins have a bin vote of mutant, and where the second concordance vote generates a mutation call 416 for the target locus that is mutant.
[00308] In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the bins in the plurality of bins. In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
[00309] In some embodiments, the mutation call is no-call when no bin vote is shared by at least a majority of the plurality of bins. For instance, in some embodiments, when the bin votes for the plurality of bins are equally distributed between first allele and second allele, then the mutation call is no-call.
[00310] In some embodiments, as illustrated in FIG. 4C, when a respective bin vote is no-call, the no-call bin is removed from the plurality of bin votes and the second concordance vote is performed with the remaining bins.
[00311] Referring to Block 264, in some embodiments, each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate. Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above).
[00312] In some embodiments, the plurality of bins comprises between 2 and 10 bins. In some embodiments, the plurality of bins comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 bins. In some embodiments, the plurality of bins comprises no more than 30, no more than 20, no more than 10, or no more than 6 bins. In some embodiments, the plurality of bins is from 3 to 9, from 4 to 12, or from 10 to 25 bins. In some embodiments, the plurality of bins falls within another range starting no lower than 3 bins and ending no higher than 30 bins.
[00313] In some embodiments, each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells. In some embodiments, each respective subset of wells comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 wells. In some embodiments, each respective subset of wells comprises no more than 30, no more than 20, no more than 10, or no more than 6 wells. In some embodiments, each respective subset of wells is from 3 to 9, from 4 to 12, or from 10 to 25 wells. In some embodiments, each respective subset of wells falls within another range starting no lower than 3 wells and ending no higher than 30 wells.
[00314] In some embodiments, the systems and methods disclosed herein are used for comparison with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing). In some embodiments, the systems and methods disclosed herein are performed concurrently with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing).
[00315] In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each respective mutant allele in a plurality of mutant alleles (see, e.g., the section entitled “Samples,” above). For example, in some embodiments, the method further includes performing an instance of the obtaining a mutation call for the target locus, for each respective mutant allele in the plurality of mutant alleles for the target locus.
[00316] In one example of some such embodiments, the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, thereby obtaining a plurality of candidate mutation calls for the target locus. In some such embodiments, the method further includes performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
[00317] In some embodiments, the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus. For example, in some such embodiments, the target locus comprises at least a first allele that is a wild-type allele and a plurality of candidate second alleles, where each respective candidate second allele is a different mutant allele for the target locus.
[00318] In some embodiments, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus. For example, in some implementations, the first allele is a wild-type allele, each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, and the final mutation call for the target locus is a candidate mutation call that corresponds to a respective mutant allele. In some such embodiments, the final mutation call for the target locus is the only candidate mutation call that corresponds to a respective mutant allele, where every other candidate mutation call in the plurality of candidate mutation calls corresponds to the wild-type allele.
[00319] Thus, in an example embodiment, a target allele comprises a wild-type allele and n mutant alleles, where n is a positive integer of 1 or greater (e.g., n is between 1 and 100). A total of n comparisons are performed between the wild-type allele and each of the n mutant alleles, in accordance with some embodiments of the present disclosure, thus obtaining a corresponding n candidate mutation calls. In some implementations, one of the n comparisons yields a candidate mutation call that identifies the corresponding mutant allele, while every other comparison yields a candidate mutation call that identifies the wild-type allele. In some such implementations, the candidate mutation call that identifies the corresponding mutant allele is selected as the final mutation call, and the identity of the target locus is determined to be the corresponding mutant allele (e.g., SNP).
[00320] Accordingly, for an example target allele comprising a wild-type allele and four possible mutant alleles (n = 4 in this example), each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
log2(Fy(WT)) > log2(Fy(M1))    Wild Type
log2(Fy(WT)) > log2(Fy(M2))    Wild Type
log2(Fy(WT)) > log2(Fy(M3))    Wild Type
log2(Fy(WT)) < log2(Fy(M4))    Mutant (4)
[00321] As only one of the mutant alleles (e.g., mutant 4) is represented as a candidate mutation call, in some such embodiments, mutant 4 is selected as the final mutation call.
[00322] In some embodiments, when a sub-plurality of candidate mutation calls in the plurality of candidate mutation calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles. For example, in some implementations, the first allele is a wild-type allele, each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, and two or more candidate mutation calls in the plurality of candidate mutation calls identify a corresponding two or more mutant alleles. In some such embodiments, a comparison is performed between each respective pair of mutant alleles corresponding to candidate mutation calls to determine which mutant allele has the highest signal yield and thus is selected as the final mutation call. In some such embodiments, the comparison is performed by repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, in accordance with the methods disclosed above, where the first allele is a first mutant allele in a respective pair of mutant alleles and the second allele is a second mutant allele in the respective pair of mutant alleles.
[00323] Accordingly, for an example target allele comprising a wild-type allele and four possible mutant alleles, each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
log2(Fy(WT)) > log2(Fy(M1))    Wild Type
log2(Fy(WT)) < log2(Fy(M2))    Mutant (2)
log2(Fy(WT)) > log2(Fy(M3))    Wild Type
log2(Fy(WT)) < log2(Fy(M4))    Mutant (4)
[00324] Two of the mutant alleles (e.g., mutant 2 and mutant 4) are represented as candidate mutation calls. Comparison of the signal yields of the pair of mutant alleles represented as candidate mutation calls reveals that one of the mutant alleles has a higher signal yield than the other:
log2(Fy(M2)) < log2(Fy(M4))    Mutant (4)
[00325] Thus, the mutant allele (e.g., mutant 4) having the highest signal yield in the comparison is selected as the final mutation call.
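By way of non-limiting illustration, the selection of a final mutation call across several candidate mutant alleles can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the numeric signal yields are hypothetical, and each comparison is reduced to a single pair of scalar signal yields, whereas the embodiments above repeat the full per-well comparison and voting procedure for each pair of alleles.

```python
import math


def final_mutation_call(yield_wt, mutant_yields):
    """Pick a final call from one wild-type yield and per-mutant yields.

    yield_wt: signal yield Fy(WT), assumed positive.
    mutant_yields: dict mapping a mutant label to its signal yield Fy(M),
    each assumed positive so that log2 is defined.
    """
    log_wt = math.log2(yield_wt)
    # A mutant becomes a candidate call when log2(Fy(WT)) < log2(Fy(M)).
    candidates = [m for m, y in mutant_yields.items() if math.log2(y) > log_wt]
    if not candidates:
        return "wild-type"
    if len(candidates) == 1:
        return candidates[0]
    # Several candidates: keep the one with the highest signal yield.
    # Assumption: exact ties resolve to the first candidate encountered.
    return max(candidates, key=lambda m: math.log2(mutant_yields[m]))


# Hypothetical yields in which mutant 2 and mutant 4 both exceed wild type.
example = final_mutation_call(4.0, {"M1": 1.0, "M2": 8.0, "M3": 2.0, "M4": 16.0})
```

With these hypothetical values, mutants 2 and 4 are both candidate calls and mutant 4 is selected because its signal yield is higher.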
[00326] In another example of some such embodiments, all the candidate second alleles in the plurality of candidate second alleles are in the same plate. In such embodiments, the signal dataset comprises, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. In this example, the signal dataset represents a plurality of time points. Moreover, the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. In this example, there is determined, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points. 
In this example, there is further determined, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities. A voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele. In some embodiments in accordance with this example, the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some embodiments in accordance with this example, the first allele is other than a wild-type allele. In some embodiments in accordance with this example, the set of candidate second alleles consists of between 2 and 10, between 2 and 20, or between 2 and 100 candidate second alleles. In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each target locus in a plurality of target loci. For instance, FIGS. 4A-4B illustrate a method in accordance with the present disclosure, which is performed for each of three target loci. Additionally, Example 1 describes an exemplary method in accordance with the present disclosure performed for each of amino acid positions 452, 484, and 501. 
In some embodiments, each respective target locus in the plurality of target loci comprises any of the embodiments disclosed herein for a first respective target locus, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof (see, e.g., the section entitled “Samples,” above), as will be apparent to one skilled in the art.
[00327] In some embodiments, the plurality of target loci comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, or at least 200 target loci. In some embodiments, the plurality of target loci comprises no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 target loci. In some embodiments, the plurality of target loci is from 2 to 10, from 5 to 50, or from 10 to 100 target loci. In some embodiments, the plurality of target loci falls within another range starting no lower than 2 loci and ending no higher than 500 loci.
[00328] In some embodiments, each respective target locus in the plurality of target loci maps to a reference sequence for a single organism. For instance, in some embodiments, a respective organism (e.g., a virus) comprises a plurality of target loci at which mutations can occur. Determining the number and type of mutations at each of the target loci of interest can inform identification and classification of organisms (e.g., by strain, variant, and/or subtype).
[00329] In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each sample in a plurality of samples. For instance, FIG. 4A illustrates a first plurality of amplification replicates partitioned from a first sample and a second plurality of amplification replicates partitioned from a second sample. FIG. 4B illustrates the determination of mutation calls, in accordance with an embodiment of the present disclosure, using candidate call identities obtained for the first sample, while FIG. 4C illustrates the determination of mutation calls using candidate call identities obtained for the second sample.
[00330] In some embodiments, the plurality of samples comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, at least 200, or at least 500 samples. In some embodiments, the plurality of samples comprises no more than 1000, no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 samples. In some embodiments, the plurality of samples is from 2 to 20, from 5 to 100, or from 10 to 200 samples. In some embodiments, the plurality of samples falls within another range starting no lower than 2 samples and ending no higher than 1000 samples.
[00331] Suitable embodiments for performing the present systems and methods for a plurality of loci and/or for a plurality of samples, including samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, and any characteristics or elements thereof, include any of the embodiments for samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, disclosed herein for a single target locus and/or for a single sample, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00332] Additional embodiments.
[00333] As described above, one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors. The method includes amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids. For each respective well in a plurality of wells in a common plate, a procedure is performed including (i) partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample, and (ii) contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters.
[00334] A signal dataset is obtained including, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells includes a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further includes a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease.
[00335] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
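By way of non-limiting illustration, the per-well determination of candidate call identities can be sketched as follows. This is a minimal sketch under stated assumptions: the signal yield is computed here as the ratio of the peak reading to the first reading, which is a hypothetical placeholder and not the signal yield calculation of the disclosure; wells are paired by position; and the `min_log2_gap` parameter used to produce a no-call is likewise hypothetical.

```python
import math


def signal_yield(readings):
    """Hypothetical placeholder yield: peak reading divided by first reading.

    readings: reporting signals for one well across the time points,
    with a positive first value.
    """
    return max(readings) / readings[0]


def candidate_calls(first_set, second_set, min_log2_gap=0.0):
    """Compare each first-allele well with its paired second-allele well."""
    calls = []
    for first_well, second_well in zip(first_set, second_set):
        diff = math.log2(signal_yield(first_well)) - math.log2(signal_yield(second_well))
        if abs(diff) <= min_log2_gap:
            calls.append("no-call")        # yields too close to distinguish
        elif diff > 0:
            calls.append("first allele")   # first-allele guide gave the higher yield
        else:
            calls.append("second allele")  # second-allele guide gave the higher yield
    return calls


# Hypothetical readings: two wells per allele, three time points each.
first_set = [[100, 150, 400], [100, 120, 130]]
second_set = [[100, 110, 120], [100, 300, 800]]
calls = candidate_calls(first_set, second_set)
```

The resulting list of candidate call identities is the input to the voting procedure described above.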
[00336] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[00337] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00338] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. 
Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[00339] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00340] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed herein. In some embodiments, any of the presently disclosed methods and/or embodiments are performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
[00341] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out any of the methods disclosed herein.
[00342] Embodiments for Detecting SARS-CoV-2 Variants
[00343] I. Samples.
[00344] Described herein are compositions, systems and methods for assaying for a SARS-CoV-2 variant. A number of samples are consistent with the methods and reagents disclosed herein.
[00345] In some cases, the sample comprises a target nucleic acid which is a gene fragment of a SARS-CoV-2 variant. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Alpha strain/B.1.1.7 and Q lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Beta strain/B.1.351 and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Gamma strain/P.1 and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Epsilon strain/B.1.427 and B.1.429 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Kappa strain/B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain/P.2 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Delta strain/B.1.617.2 and AY lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.3 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.1 lineage and descendent lineages. 
In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Lambda strain/C.37 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Eta strain/B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Iota strain/B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Mu strain/B.1.621 and B.1.621.1 lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Omicron strain/B.1.1.529 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is a lineage obtained from a reference database (see, for example, Rambaut et al. (2020) Nature Microbiology, doi:10.1038/s41564-020-0770-5; and Cov-Lineages, available on the Internet at cov-lineages.org/lineage_list.html).
[00346] In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Spike (S) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Nucleocapsid (N) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Envelope (E) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Membrane glycoprotein (M) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1a. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1b. In some instances, the target nucleic acid comprises a segment of a nucleic acid from the SARS-CoV-2 variant comprising the one or more mutations.
[00347] In some instances, the sample is a biological sample from an individual. In some instances, a biological sample is a blood, serum, plasma, saliva, urine, mucosal sample, peritoneal sample, cerebrospinal fluid, gastric secretions, nasal secretions, sputum, pharyngeal exudates, urethral or vaginal secretions, an exudate, an effusion, or tissue sample. In some instances, the sample is a nasal swab. In specific instances, the nasal swab is a nasopharyngeal swab. In some instances, the sample is a throat swab. In specific instances, the throat swab is an oropharyngeal swab. A tissue sample may be dissociated or liquified prior to application to the detection system of the disclosure. A sample from an environment may be from soil, air, or water. In some instances, the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest. In some instances, the raw sample is applied to the detection system. In some instances, the sample is diluted with a buffer or a fluid, or concentrated, prior to application to the detection system, or is applied neat to the detection system.
[00348] Sometimes, the sample is contained in no more than 20 µL. The sample, in some cases, is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, or 500 µL, or any value from 1 µL to 500 µL. Sometimes, the sample is contained in more than 500 µL.
[00349] Some methods described herein can detect a target nucleic acid present in the sample in various concentrations or amounts. In some cases, the sample has at least 2 target nucleic acids. In some cases, the sample has at least 3, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 target nucleic acids. In some cases, the method detects target nucleic acid present at least at one copy per 10^1 non-target nucleic acids, 10^2 non-target nucleic acids, 10^3 non-target nucleic acids, 10^4 non-target nucleic acids, 10^5 non-target nucleic acids, 10^6 non-target nucleic acids, 10^7 non-target nucleic acids, 10^8 non-target nucleic acids, 10^9 non-target nucleic acids, or 10^10 non-target nucleic acids.
[00350] Any of the above disclosed samples are consistent with the systems, assays, and effector proteins disclosed herein.
[00351] II. Methods of Nucleic Acid Detection.
[00352] Provided herein are methods of detecting target nucleic acids indicative of, at least in part, SARS-CoV-2 variants. The SARS-CoV-2 variant, in some embodiments, is any one of the WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants. In some embodiments, the target nucleic acid comprises a segment of an S (Spike) gene of a SARS-CoV-2 variant. The segment of the S gene, in some embodiments, comprises a mutation (e.g., an SNP) relative to a wild-type SARS-CoV-2 protein. The mutation, in some embodiments, is used, at least in part, to identify the SARS-CoV-2 variant (e.g., to distinguish it from other variants or from a wild-type SARS-CoV-2). Therefore, detection of the target nucleic acid comprising the mutation in a sample, in some embodiments, is used to identify the sample as comprising the SARS-CoV-2 variant.
[00353] Methods described herein, in some embodiments, comprise contacting the sample to a complex comprising a guide nucleic acid comprising a segment that is reverse complementary to a segment of the target nucleic acid and an effector protein that exhibits sequence-independent cleavage upon forming a complex comprising the segment of the guide nucleic acid binding to the segment of the target nucleic acid (e.g., comprising a segment of a SARS-CoV-2 variant); and assaying for a signal indicating cleavage of a reporter, wherein the signal indicates a presence of the target nucleic acid in the sample and wherein absence of the signal indicates an absence of the target nucleic acid in the sample. When a guide nucleic acid binds to a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant), the effector protein's trans cleavage activity can be initiated, and reporter nucleic acids can be cleaved, resulting in the detection of fluorescence indicative of, at least in part, the presence of the target nucleic acid. The effector protein, in some embodiments, cleaves the reporter nucleic acid with an efficiency of 50% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples. In some cases, the cleavage efficiency is at least 40%, 50%, 60%, 70%, 80%, 90%, or 95% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples. In some cases, the methods described herein detect a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant) with an effector protein and a detector nucleic acid in a sample where the sample is contacted with the reagents for a predetermined length of time sufficient for trans cleavage of the single-stranded detector nucleic acid.
[00354] Some methods described herein comprise a) contacting the sample to i) a detector nucleic acid; and ii) a composition comprising an effector protein and a non-naturally occurring guide nucleic acid having a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 (or any one of SEQ ID NOS: 22-27 or 40-42) that hybridizes to a segment of the target nucleic acid, wherein the effector protein cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the coronavirus target nucleic acid; and b) assaying for a change in a signal, wherein the change in the signal is produced by cleavage of the detector nucleic acid. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 62.5% identical to SEQ ID NO: 7, and is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, and is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 87.5% identical to SEQ ID NO: 7, and is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the non-naturally occurring guide nucleic acid has a nucleotide sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6, or any one of SEQ ID NOS: 22-27 or 40-42.
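The percent-identity thresholds recited above can be illustrated with a short Python sketch. It assumes a simple ungapped, position-by-position comparison of two equal-length sequences, which is only one of several ways identity may be computed; the 8-nucleotide reference below is invented for illustration and is not SEQ ID NO: 7. Note that 62.5%, 75%, and 87.5% equal 5/8, 6/8, and 7/8, respectively.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: positions are compared one-to-one without alignment gaps.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(c == r for c, r in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    reference = "TTATTATT"  # hypothetical 8-nt reference, not from the sequence listing
    for candidate in ("TTATTATT", "TTATTATA", "TTATTAAA", "TTATTCAA"):
        print(candidate, percent_identity(candidate, reference))
```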
[00355] In some instances, the method further comprises assaying for a control sequence by contacting the control nucleic acid to a second detector nucleic acid and a composition comprising the programmable nuclease and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the control nucleic acid, wherein the programmable nuclease cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the control nucleic acid.
[00356] Contacting the sample with the effector protein and guide nucleic acid, in some embodiments, occurs at a temperature of at least about 25°C, at least about 30°C, at least about 35°C, at least about 40°C, at least about 50°C, or at least about 65°C. In some instances, the temperature is not greater than 80°C. In some instances, the temperature is about 25°C, about 30°C, about 35°C, about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C. In some instances, the temperature is about 25°C to about 45°C, about 35°C to about 55°C, or about 55°C to about 65°C.
[00357] In some cases, methods of the disclosure detect a target nucleic acid (e.g., the target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant) in less than 60 minutes. In some cases, methods of the disclosure detect a target nucleic acid in less than about 120 minutes, less than about 110 minutes, less than about 100 minutes, less than about 90 minutes, less than about 80 minutes, less than about 70 minutes, less than about 60 minutes, less than about 55 minutes, less than about 50 minutes, less than about 45 minutes, less than about 40 minutes, less than about 35 minutes, less than about 30 minutes, less than about 25 minutes, less than about 20 minutes, less than about 15 minutes, less than about 10 minutes, less than about 5 minutes, less than about 4 minutes, less than about 3 minutes, less than about 2 minutes, or less than about 1 minute.
[00358] In some cases, the methods of detecting are performed in less than 10 hours, less than 9 hours, less than 8 hours, less than 7 hours, less than 6 hours, less than 5 hours, less than 4 hours, less than 3 hours, less than 2 hours, less than 1 hour, less than 50 minutes, less than 45 minutes, less than 40 minutes, less than 35 minutes, less than 30 minutes, less than 25 minutes, less than 20 minutes, less than 15 minutes, less than 10 minutes, less than 9 minutes, less than 8 minutes, less than 7 minutes, less than 6 minutes, or less than 5 minutes. In some cases, methods of detecting are performed in about 5 minutes to about 10 hours, about 10 minutes to about 8 hours, about 15 minutes to about 6 hours, about 20 minutes to about 5 hours, about 30 minutes to about 2 hours, or about 45 minutes to about 1 hour.
[00359] The results from the completed assay can be detected and analyzed in various ways. In some cases, the signal produced by the reaction is visible by eye, and the results can be read by the user. In some cases, the signal is visualized by an imaging device or other device depending on the type of signal. Often, the imaging device is a digital camera, such as a digital camera on a mobile device. The mobile device, in some embodiments, has a software program or a mobile application that can capture an image of the support medium, identify the assay being performed, detect the detection region and the detection spot, provide image properties of the detection spot, analyze the image properties of the detection spot, and provide a result. Alternatively, or in combination, the imaging device can capture fluorescence, ultraviolet (UV), infrared (IR), or visible wavelength signals. The imaging device, in some embodiments, has an excitation source to provide the excitation energy and captures the emitted signals. In some cases, the excitation source is a camera flash and optionally a filter. In some cases, the imaging device is used together with an imaging box that is placed over the support medium to create a dark room to improve imaging. The imaging box can be a cardboard box that the imaging device can fit into before imaging. In some instances, the imaging box has optical lenses, mirrors, filters, or other optical elements to aid in generating a more focused excitation signal or to capture a more focused emission signal. Often, the imaging box and the imaging device are small, handheld, and portable to facilitate the transport and use of the assay in remote or low resource settings.
[00360] The assay described herein can be visualized and analyzed by a mobile application (app) or a software program. Using the graphical user interface (GUI) of the app or program, in some embodiments, an individual takes an image of the support medium, including the detection region, barcode, reference color scale, and fiducial markers on the housing, using a camera on a mobile device. The program or app reads the barcode or identifiable label for the test type, locates the fiducial marker to orient the sample, reads the detectable signals, compares against the reference color grid, and determines the presence or absence of the target nucleic acid, which indicates the presence of the gene, virus, or the agent responsible for the disease. The mobile application, in some embodiments, presents the results of the test to the individual. The mobile application, in some embodiments, stores the test results in the mobile application. The mobile application, in some embodiments, communicates with a remote device and transfers the data of the test results. The test results, in some embodiments, are viewable remotely from the remote device by another individual, including a healthcare professional. A remote user, in some embodiments, accesses the results and uses the information to recommend action for treatment, intervention, or cleanup of an environment.
[00361] In some cases, the method of the disclosure is used as an initial screen for the presence of an SNP before whole genome sequencing. In other cases, the method of the disclosure is used to replace whole genome sequencing. In other cases, the method of the disclosure is used to detect specific mutations associated with neutralizing antibody evasion before physicians make a clinical decision to use certain antibody drugs to treat the infection. In other cases, the method of the disclosure is used for contact tracing within communities.
[00362] a. Sample Preparation
[00363] Methods described herein, in some embodiments, comprise contacting a nasal swab or a throat swab from an individual. In some instances, the nasal swab is a nasopharyngeal swab. In some instances, the throat swab is an oropharyngeal swab. The procedures to collect a nasal swab are well known in the relevant field. Briefly, a tapered swab, in some embodiments, is used. When the individual's head is tilted to a certain angle, the swab is inserted into the individual’s nasal cavity parallel to the palate until resistance is met at the turbinates. The swab, in some embodiments, is preserved in a sterile tube. The procedures to collect a throat swab are also well known in the relevant field. Briefly, in some embodiments, a swab is inserted into the mouth of the individual, and touches both tonsillar pillars and the posterior oropharynx. In some embodiments, the swab is preserved in a sterile tube.
[00364] In some embodiments, methods described herein comprise preparing samples, including lysing the sample. In some instances, lysing the sample comprises contacting the sample to a lysis buffer.
[00365] A lysis reaction, in some embodiments, is performed at a range of temperatures. In some embodiments, a lysis reaction is performed at about room temperature. In some embodiments, a lysis reaction is performed at about 95°C. In some embodiments, a lysis reaction is performed at from 1 °C to 10 °C, from 4 °C to 8 °C, from 10 °C to 20 °C, from 15 °C to 25 °C, from 15 °C to 20 °C, from 18 °C to 25 °C, from 18 °C to 95 °C, from 20 °C to 37 °C, from 25 °C to 40 °C, from 35 °C to 45 °C, from 40 °C to 60 °C, from 50 °C to 70 °C, from 60 °C to 80 °C, from 70 °C to 90 °C, from 80 °C to 95 °C, or from 90 °C to 99 °C. In some embodiments, a lysis reaction is performed for about 5 minutes, about 15 minutes, or about 30 minutes. In some embodiments, a lysis reaction is performed for from 2 minutes to 5 minutes, from 3 minutes to 8 minutes, from 5 minutes to 15 minutes, from 10 minutes to 20 minutes, from 15 minutes to 25 minutes, from 20 minutes to 30 minutes, from 25 minutes to 35 minutes, from 30 minutes to 40 minutes, from 35 minutes to 45 minutes, from 40 minutes to 50 minutes, from 45 minutes to 55 minutes, from 50 minutes to 60 minutes, from 55 minutes to 65 minutes, from 60 minutes to 70 minutes, from 65 minutes to 75 minutes, from 70 minutes to 80 minutes, from 75 minutes to 85 minutes, or from 80 minutes to 90 minutes.
[00366] The lysis buffer disclosed herein, in some embodiments, comprises a viral lysis buffer. A viral lysis buffer, in some embodiments, lyses a coronavirus capsid in a viral sample (e.g., a sample collected from an individual suspected of having a coronavirus infection), releasing a viral genome. The viral lysis buffer, in some embodiments, is compatible with amplification (e.g., RT-LAMP amplification) of a target region of the viral genome. The viral lysis buffer, in some embodiments, is compatible with detection (e.g., a DETECTR reaction disclosed herein). A sample, in some embodiments, is prepared in a one-step sample preparation method comprising suspending the sample in a viral lysis buffer compatible with amplification, detection (e.g., a DETECTR reaction), or both. A viral lysis buffer compatible with amplification (e.g., RT-LAMP amplification), detection (e.g., DETECTR), or both, in some embodiments, comprises a buffer (e.g., Tris-HCl, phosphate, or HEPES), a reducing agent (e.g., N-Acetyl Cysteine (NAC), Dithiothreitol (DTT), β-mercaptoethanol (BME), or tris(2-carboxyethyl)phosphine (TCEP)), a chelating agent (e.g., EDTA or EGTA), a detergent (e.g., deoxycholate, NP-40 (Igepal), Triton X-100, or Tween 20), a salt (e.g., ammonium acetate, magnesium acetate, manganese acetate, potassium acetate, sodium acetate, ammonium chloride, potassium chloride, magnesium chloride, manganese chloride, sodium chloride, ammonium sulfate, magnesium sulfate, manganese sulfate, potassium sulfate, or sodium sulfate), or a combination thereof. For example, a viral lysis buffer, in some embodiments, comprises a buffer and a reducing agent, or a viral lysis buffer comprises a buffer and a chelating agent. The viral lysis buffer, in some embodiments, is formulated at a low pH. For example, the viral lysis buffer, in some embodiments, is formulated at a pH of from about pH 4 to about pH 5. In some embodiments, the viral lysis buffer is formulated at a pH of from about pH 4 to about pH 9. The viral lysis buffer, in some embodiments, further comprises a preservative (e.g., ProClin 150). In some cases, the viral lysis buffer comprises an activator of the amplification reaction. For example, the buffer, in some embodiments, comprises primers, dNTPs, or magnesium (e.g., MgSO4, MgCl2, or MgOAc), or a combination thereof, to activate the amplification reaction. In some cases, an activator (e.g., primers, dNTPs, or magnesium) is added to the buffer following lysis of the coronavirus to initiate the amplification reaction.
[00367] b. RNA extraction
[00368] Methods described herein, in some embodiments, comprise RNA extraction. In some embodiments, the method described herein comprises obtaining a sample from a subject infected with or suspected to be infected with coronavirus. In some instances, the methods comprise extracting RNA from the sample. RNA can be extracted from the sample with conventional RNA extraction methods. In some instances, RNA is extracted with the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek). In some instances, RNA is extracted with the HighPrep™ Viral DNA/RNA Kit (MagBio). In some instances, RNA is extracted with the AB MagPure Virus RNA Isolation Kit. In some instances, RNA is extracted with the GeneJET RNA Purification Kit (Thermo Fisher Scientific).
[00369] RNA, in some embodiments, is automatically extracted on a liquid handling platform. In some instances, RNA is automatically extracted on the KingFisher Flex (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on Microlab® STAR (Hamilton). In some instances, RNA is automatically extracted on Microlab® NIMBUS (Hamilton). In some instances, RNA is automatically extracted on KingFisher® Duo Prime (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on MagMAX® Express-96 (Applied Biosystems). In some instances, RNA is automatically extracted on BioSprint® 96 (QIAGEN).
[00370] c. Amplification
[00371] Methods described herein, in some embodiments, comprise amplifying a target nucleic acid for detection. In some embodiments, amplifying comprises changing the temperature of the amplification reaction, also known as thermal amplification (e.g., PCR). In some implementations, amplifying is performed at essentially one temperature, also known as isothermal amplification.
[00372] In some embodiments, amplifying comprises subjecting a target nucleic acid to an amplification reaction selected from transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
[00373] In some instances, the target nucleic acid is amplified by LAMP. In some instances, the target nucleic acid is amplified by RT-LAMP. LAMP primer sets can be designed to target distinct but nearby sequences around one or more mutations (e.g., SNP) that are indicative of, at least in part, a SARS-CoV-2 variant. In some instances, four primers, comprising forward inner primer (FIP) with an overhang designed to form dumbbell structures, backward inner primer (BIP) with an overhang designed to form dumbbell structures, forward outer primer (F3) for displacing the FIP linked complementary strands from templates, and backward outer primer (B3) for displacing the BIP linked complementary strands from templates, are designed to target one region. In other instances, six primers, comprising FIP, BIP, F3, B3, loop forward (LF), and loop backward (LB), are designed to target one region, wherein LF and LB are added for faster amplification. In some instances, one or more of the primers are degenerate primers.
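A degenerate primer stands for a pool of concrete sequences. The following minimal Python sketch enumerates that pool using the standard IUPAC nucleotide ambiguity codes; the primer shown is invented for illustration and is not one of the primers of Tables 3A and 3B.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]  # KeyError on unknown codes
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical primer with two degenerate positions (R and Y).
    for seq in expand_degenerate("ACRTGYA"):
        print(seq)
```

The number of concrete sequences grows multiplicatively with each degenerate position, which is one reason degenerate positions are typically kept few.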
[00374] In some instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify around an SNP that is indicative of, at least in part, a SARS-CoV-2 variant. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 452 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 484 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 501 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the S protein. In one instance, a set of FIP, BIP, F3, B3, LF, and LB has primer sequences shown in Tables 3A and 3B herein.
[00375] Table 3A: LAMP Primer Sequences
Figure imgf000142_0001
[00376] Table 3B: LAMP Primer Sequences
Figure imgf000142_0002
Figure imgf000143_0001
indicates LAMP primers that contain degenerate nucleotides
[00377] Provided herein are compositions that, in some embodiments, further comprise reagents for amplification, the uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant. Non-limiting examples of reagents for amplifying a nucleic acid include polymerases, primers, and nucleotides. Nucleic acid amplification of a target nucleic acid, in some embodiments, improves at least one of sensitivity, specificity, or accuracy of the assay in detecting the target nucleic acid. In some cases, amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
[00378] In some embodiments, amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, in some embodiments, amplification is used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence.
[00379] In some embodiments, the nucleic acid amplification reaction is carried out with reagents comprising amplification primers described herein, a DNA polymerase, dNTPs, and appropriate buffer. In some specific embodiments, the buffer comprises Tris-HCl, (NH4)2SO4, KCl, MgSO4, and Tween® 20. In one embodiment, the concentration of MgSO4 is 6 mM.
[00380] In some embodiments, amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, in some embodiments, amplification is used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence.
[00381] Amplifying, in some embodiments, takes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes. Sometimes, the nucleic acid amplification is performed for 1 to 60, 5 to 55, 10 to 50, 15 to 45, 20 to 40, or 25 to 35 minutes. Amplifying, in some embodiments, is performed at a temperature of around 20-45°C. Amplifying, in some embodiments, is performed at a temperature of less than about 20°C, less than about 25°C, less than about 30°C, less than about 35°C, less than about 37°C, less than about 40°C, or less than about 45°C. The nucleic acid amplification reaction, in some embodiments, is performed at a temperature of at least about 20°C, at least about 25°C, at least about 30°C, at least about 35°C, at least about 37°C, at least about 40°C, or at least about 45°C.
[00382] In some cases, the amplification reaction is performed on at least 10, 100, 1,000, 5,000, 10,000, 15,000 or 10,000 copies of the target nucleic acid. In some cases, at least 10,000 copies of the target nucleic acid are input into the amplification reaction.
[00383] In some embodiments, the nucleic acid amplification reaction is carried out with reagents comprising amplification primers described herein, a DNA polymerase, dNTPs, and appropriate buffer. In some specific embodiments, the buffer comprises Tris-HCl, (NH4)2SO4, KCl, MgSO4, and Tween® 20. In one embodiment, the concentration of MgSO4 is 6 mM.
[00384] d. Reverse transcription
[00385] Methods described herein, in some embodiments, comprise reverse transcribing the coronavirus target nucleic acid, the amplification product, or a combination thereof. In some instances, the reverse transcribing comprises contacting the sample to reagents for reverse transcription. In some specific instances, the reagents for reverse transcription comprise a reverse transcriptase, an oligonucleotide primer, and dNTPs. In one instance, the reverse transcriptase described herein is WarmStart RTx reverse transcriptase. In one instance, the reverse transcriptase described herein is MultiScribe™ reverse transcriptase. In one instance, the reverse transcriptase described herein is QuantiTect reverse transcriptase. In one instance, the reverse transcriptase described herein is GoScript™ reverse transcriptase. In one instance, the reverse transcriptase described herein is UltraScript 2.0 reverse transcriptase.
[00386] In some instances, the contacting the sample to reagents for reverse transcription occurs prior to the contacting the sample to the detector nucleic acid and the composition, prior to the contacting the sample to the reagents for amplification, or prior to both.
[00387] In some instances, the contacting the sample to reagents for reverse transcription occurs concurrent to the contacting the sample to the detector nucleic acid and the composition, concurrent to the contacting the sample to the reagents for amplification, or concurrent to both.
[00388] e. Generating and assaying signals
[00389] Methods described herein, in some embodiments, comprise contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof.
[00390] Methods described herein, in some embodiments, further comprise assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof.
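One simple way to turn a measured change in signal into a presence or absence call is to compare the sample signal against a no-target control. The Python sketch below illustrates this under stated assumptions: the function name, the use of a no-target control as baseline, and the two-fold default threshold are all hypothetical choices for illustration and are not specified by the disclosure.

```python
def target_detected(sample_signal: float, no_target_signal: float,
                    min_fold_change: float = 2.0) -> bool:
    """Call the target present when the sample signal exceeds the
    no-target control by at least `min_fold_change`.

    Assumption: both values are reporter signals (e.g., fluorescence, in
    arbitrary units) read at the same time point; the threshold is illustrative.
    """
    if no_target_signal <= 0:
        raise ValueError("no-target control signal must be positive")
    return sample_signal / no_target_signal >= min_fold_change


if __name__ == "__main__":
    print(target_detected(5200.0, 1000.0))  # strong reporter cleavage
    print(target_detected(1150.0, 1000.0))  # near background
```

In practice a threshold would be set empirically for each assay and instrument.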
[00391] i. Effector Proteins
[00392] Provided herein are effector proteins and uses thereof, e.g., in compositions, systems, and methods for assaying for a SARS-CoV-2 variant.
[00393] Several effector proteins are consistent with the methods of the disclosure. For example, CRISPR/Cas enzymes are effector proteins used in the methods and systems disclosed herein. CRISPR/Cas enzymes can include any of the known Classes and Types of CRISPR/Cas enzymes. Effector proteins disclosed herein include Class 1 CRISPR/Cas enzymes, such as the Type I, Type IV, or Type III CRISPR/Cas enzymes. Effector proteins disclosed herein also include the Class 2 CRISPR/Cas enzymes, such as the Type II, Type V, and Type VI CRISPR/Cas enzymes.
[00394] In some embodiments, the Type V CRISPR/Cas enzyme is a Cas12 effector protein. Type V CRISPR/Cas enzymes (e.g., Cas12 or Cas14) lack an HNH domain. A Cas12 nuclease of the disclosure cleaves nucleic acids via a single catalytic RuvC domain. The RuvC domain is within a nuclease, or “NUC” lobe of the protein, and the Cas12 nucleases further comprise a recognition, or “REC” lobe. The REC and NUC lobes are connected by a bridge helix and the Cas12 proteins additionally include two domains for PAM recognition termed the PAM interacting (PI) domain and the wedge (WED) domain. (Murugan et al., Mol Cell. 2017 Oct 5; 68(1): 15-25). In some embodiments, the programmable Cas12 effector protein described herein is Cas12a. In some embodiments, the programmable Cas12 effector protein described herein is Cas12c. In some embodiments, the programmable Cas12 effector protein described herein is Cas12d. In some embodiments, the programmable Cas12 effector protein described herein is Cas12e. In some embodiments, the programmable Cas12 effector protein described herein is Cas12b. In some embodiments, the programmable Cas12 effector protein described herein is Cas12h. In some embodiments, the programmable Cas12 effector protein described herein is Cas12i. In some embodiments, the programmable Cas12 effector protein described herein is a small effector such as Cas12g.
[00395] ii. Guide Nucleic Acids
[00396] Provided herein are compositions comprising an effector protein and a guide nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
[00397] In some embodiments, the guide nucleic acid described herein targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid. In some instances, the target nucleic acid is near a neighboring PAM sequence that is recognizable by the effector proteins described herein. In some instances, the engineered guide RNA comprises a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid. In some instances, the engineered guide RNA comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. The tracrRNA, in some embodiments, hybridizes to a portion of the guide RNA that does not hybridize to the target nucleic acid. In some instances, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide RNA (sgRNA). In some instances, compositions comprise a crRNA and tracrRNA that function together as two separate, unlinked molecules.
[00398] In some instances, the guide nucleic acid targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant. In some instances, the target nucleic acid is from a SARS-CoV-2 variant and comprises a mutation relative to a wild-type SARS-CoV-2 or to a strain of SARS-CoV-2 that comprises no mutations at the same locus. In some instances, the mutation is a single nucleotide polymorphism (SNP). In some instances, the mutation (e.g., an SNP) can be used to distinguish the SARS-CoV-2 variant from a wild-type or from another SARS-CoV-2 variant. In some instances, the target nucleic acid is an RNA, DNA, or a synthetic nucleic acid. In some instances, the guide nucleic acid targets a target nucleic acid which comprises a segment of an S (Spike) gene of SARS-CoV-2. In some cases, the S protein plays a key role in the receptor recognition and cell membrane fusion process, which is essential for the infection and transmission capabilities of SARS-CoV-2. In some instances, the guide nucleic acid targets a target nucleic acid comprising certain SNP(s) in the S gene that encodes S protein. In some specific instances, the guide nucleic acid targets a portion of the S gene encoding around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the Spike protein of SARS-CoV-2 (see NCBI Reference Sequence: YP_009724390.1).
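Because each amino acid is encoded by three nucleotides, an amino acid position in the Spike protein maps to a codon in the S gene by simple arithmetic. The Python sketch below shows this mapping. The function name is hypothetical, and the genomic start coordinate used in the usage example (21563, the annotated start of the S coding sequence in GenBank reference genome NC_045512.2) is an assumption brought in for illustration; it is not stated in this disclosure.

```python
def codon_coordinates(aa_position: int, cds_start: int = 1) -> tuple[int, int]:
    """Return the 1-based (first, last) nucleotide coordinates of the codon
    encoding `aa_position`, for a coding sequence whose first nucleotide is
    at `cds_start` (default 1 gives coordinates relative to the gene itself).
    """
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = cds_start + 3 * (aa_position - 1)
    return first, first + 2


if __name__ == "__main__":
    S_CDS_START = 21563  # assumption: S gene start in NC_045512.2, not from the source
    for position in (452, 484, 501):
        print(position, codon_coordinates(position), codon_coordinates(position, S_CDS_START))
```

Coordinates of this kind indicate where an amplicon or guide would need to sit to cover a codon of interest.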
[00399] In some embodiments, the guide nucleic acid is a guide nucleic acid represented in Table 4A or Table 4B.
[00400] Table 4A: Guide Nucleic Acid Sequences
Figure imgf000147_0001
Figure imgf000148_0001
[00401] Table 4B: Additional E484 Mutant Guide Nucleic Acid Sequences
Figure imgf000148_0002
[00402] In some embodiments, the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 22.
[00403] In some embodiments, the guide nucleic acid described herein targets around position 484 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 23.
[00404] In some embodiments, the guide nucleic acid described herein targets around position 501 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 24.
[00405] In some embodiments, the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 25.
[00406] In some embodiments, the guide nucleic acid described herein targets around position 484, and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 26. 
In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 42. 
In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 42.
[00407] In some embodiments, the guide nucleic acid described herein targets around position 501, and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 27.
[00408] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by CasDx1, as described in the Example Section.
[00409] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by LbCas12a, as described in the Example Section.
[00410] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end to be recognized by AsCas12a, as described in the Example Section.
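The 5’ repeat sequences recited above can be combined with a target-specific spacer as in the following minimal sketch. The repeat sequences are taken from the preceding paragraphs; the function name `build_guide`, the dictionary name, and the example spacer are hypothetical and are not sequences of this disclosure.

```python
# Repeat sequences quoted in the text, keyed by the effector protein that
# recognizes them.
REPEAT_BY_EFFECTOR = {
    "CasDx1": "UAAUUUCUACUAAGUGUAGAU",    # SEQ ID NO: 126
    "LbCas12a": "UAAUUUCUACUAAGUGUAGAU",  # SEQ ID NO: 126
    "AsCas12a": "UAAUUUCUACUCUUGUAGAU",   # SEQ ID NO: 127
}


def build_guide(effector: str, spacer: str) -> str:
    """Return repeat + spacer as an RNA string, with the repeat on the 5' end."""
    if effector not in REPEAT_BY_EFFECTOR:
        raise ValueError(f"no repeat sequence listed for {effector!r}")
    # Accept DNA-style input and convert it to RNA letters.
    rna_spacer = spacer.upper().replace("T", "U")
    if not rna_spacer or set(rna_spacer) - set("ACGU"):
        raise ValueError("spacer must be a non-empty A/C/G/U (or T) string")
    return REPEAT_BY_EFFECTOR[effector] + rna_spacer


# Hypothetical placeholder spacer, used only to show the call.
EXAMPLE_SPACER = "ACGTACGTACGTACGTACGT"
example_guide = build_guide("AsCas12a", EXAMPLE_SPACER)
```

Keeping the repeat in a lookup table keyed by effector protein means the same spacer can be reused across effector proteins without editing the sequence by hand.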
[00411] iii. Reporter Nucleic Acids
[00412] Provided herein are systems comprising an effector protein, a guide nucleic acid and a reporter nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
[00413] In some instances, a reporter comprises a single-stranded nucleic acid and a detection moiety, wherein the single-stranded nucleic acid is capable of being cleaved by the activated effector protein, thereby generating a detectable signal. In some cases, the reporter comprises deoxyribonucleotides. In other cases, the reporter comprises ribonucleotides. In some cases, the reporter comprises at least one deoxyribonucleotide and at least one ribonucleotide. In some cases, the reporter comprises a single-stranded nucleic acid comprising at least one ribonucleotide residue at an internal position that functions as a cleavage site.
[00414] In some instances, the reporter comprises a protein capable of generating a signal. A signal, in some embodiments, is a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or a piezo-electric signal. In some cases, the reporter comprises a detection moiety. Suitable detectable labels and/or moieties contemplated for providing a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.
[00415] Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Suitable enzymes include, but are not limited to, horse radish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, Xanthine Oxidase, firefly luciferase, and glucose oxidase (GO).
[00416] In some cases, the reporter comprises a detection moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter, wherein the first site is separated from the remainder of reporter upon cleavage at the cleavage site. In some cases, the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site. Sometimes the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
[00417] In some cases, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site. Sometimes the quenching moiety is a fluorescence quenching moiety. In some cases, the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site. Sometimes the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
[00418] In some embodiments, the reporter is a reporter represented in Table 5.
[00419] Table 5: Exemplary Single Stranded Detector Nucleic Acid
[Table 5 is provided as an image (imgf000156_0001) in the original publication.]
/5Alex594N/: 5' Alexa Fluor 594 (NHS Ester) (Integrated DNA Technologies)
/3IAbRQSp/: 3' Iowa Black RQ (Integrated DNA Technologies)
[00420] In one embodiment, the detector nucleic acid has a sequence of SEQ ID NO: 7. In some cases, the detector nucleic acid comprises a sequence that is at least 87.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 75% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 62.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 50% identical to SEQ ID NO: 7.
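The identity thresholds recited above (87.5%, 75%, 62.5%, and 50%) correspond to 7, 6, 5, and 4 matching positions out of 8, which would be the case if the reference sequence is 8 nucleotides long. The sketch below illustrates that arithmetic under two assumptions not stated in the text: identity is computed position by position without gaps, and the reference is 8 nucleotides long. The reference string is a hypothetical placeholder and is not SEQ ID NO: 7.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Ungapped, position-by-position identity for equal-length sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == c for r, c in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)


# Hypothetical 8-nucleotide placeholder, used only to show the thresholds.
PLACEHOLDER_REFERENCE = "ACGTACGT"

one_mismatch = percent_identity(PLACEHOLDER_REFERENCE, "ACGTACGA")     # 7 of 8
two_mismatches = percent_identity(PLACEHOLDER_REFERENCE, "ACGTACTA")   # 6 of 8
four_mismatches = percent_identity(PLACEHOLDER_REFERENCE, "ACGTTGCA")  # 4 of 8
```

Alignment-based identity, which allows gaps, could give different values for sequences of unequal length; this sketch deliberately does not cover that case.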
[00421] Suitable fluorophores, in some embodiments, provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). Non-limiting examples of fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). The fluorophore may be an infrared fluorophore. The fluorophore may emit fluorescence in the range of 500 nm to 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
[00422] A quenching moiety, in some embodiments, is chosen based on its ability to quench the detection moiety. A quenching moiety, in some embodiments, is a non-fluorescent fluorescence quencher. A quenching moiety, in some embodiments, quenches a detection moiety that emits fluorescence in the range of 500 nm to 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher. In other cases, the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm. A quenching moiety, in some embodiments, quenches fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). A quenching moiety, in some embodiments, is Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher. A quenching moiety, in some embodiments, quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). A quenching moiety may be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). 
Any of the quenching moieties described herein, in some embodiments, is from any commercially available source, may be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
[00423] In some cases, the reporter is present in at least 1.5 fold, at least 2 fold, at least 3 fold, at least 4 fold, at least 5 fold, at least 6 fold, at least 7 fold, at least 8 fold, at least 9 fold, at least 10 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 30 fold, at least 40 fold, at least 50 fold, at least 60 fold, at least 70 fold, at least 80 fold, at least 90 fold, at least 100 fold, from 1.5 fold to 100 fold, from 2 fold to 10 fold, from 10 fold to 20 fold, from 20 fold to 30 fold, from 30 fold to 40 fold, from 40 fold to 50 fold, from 50 fold to 60 fold, from 60 fold to 70 fold, from 70 fold to 80 fold, from 80 fold to 90 fold, from 90 fold to 100 fold, from 1.5 fold to 10 fold, from 1.5 fold to 20 fold, from 10 fold to 40 fold, from 20 fold to 60 fold, or from 10 fold to 80 fold excess of total nucleic acids in a sample.
[00424] f. Determination of variant calls
[00425] Methods described herein, in some embodiments, comprise determining a variant call of a sample. In some cases, determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant. In specific cases, determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) relative to a wild-type SARS-CoV-2. In specific cases, the SARS-CoV-2 variant is any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants.
[00426] In some instances, determining the variant call of Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 501 from N to Y (N501Y). In specific cases, determining the variant call of Alpha SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 501 of S protein. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 3 or 6. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 24 or 27.
[00427] In some instances, determining the variant call of Beta SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Beta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
[00428] In some instances, determining the variant call of Gamma SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Gamma SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
[00429] In some instances, determining the variant call of Delta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Delta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00430] In some instances, determining the variant call of Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00431] In some instances, determining the variant call of Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Kappa SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00432] In some instances, determining the variant call of Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to A (E484A). In specific cases, determining the variant call of Omicron SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein. In certain cases, determining the variant call of Omicron SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 42.
[00433] In some instances, determining the variant call of Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to K (E484K). In specific cases, determining the variant call of Zeta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 5. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23 or 26.
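The mutation-to-variant associations recited in the preceding paragraphs can be summarized as a lookup table, as in the following minimal sketch. The mutation sets are taken from the text; the exact-match rule used by the hypothetical `candidate_variants` function is an assumption made for illustration and is not a calling rule stated in this disclosure.

```python
# S-gene mutations recited in the text for each variant.
VARIANT_SIGNATURES = {
    "Alpha": frozenset({"N501Y"}),
    "Beta": frozenset({"E484K", "N501Y"}),
    "Gamma": frozenset({"E484K", "N501Y"}),
    "Delta": frozenset({"L452R"}),
    "Epsilon": frozenset({"L452R"}),
    "Kappa": frozenset({"L452R"}),
    "Omicron": frozenset({"E484A"}),
    "Zeta": frozenset({"E484K"}),
}


def candidate_variants(detected_mutations):
    """Return the variants whose recited mutation set equals the detected set.

    Assumption: a variant is a candidate only on an exact match.
    """
    detected = frozenset(detected_mutations)
    return sorted(
        name for name, signature in VARIANT_SIGNATURES.items()
        if signature == detected
    )


delta_like = candidate_variants({"L452R"})
beta_like = candidate_variants({"E484K", "N501Y"})
```

Under these three positions alone, several variants share the same mutation set (for example Beta and Gamma, or Delta, Epsilon, and Kappa), so the function returns every candidate and does not pick one.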
[00434] In some instances, the data analysis pipeline to determine a variant call of a sample comprises 1) obtaining the maximum fluorescence value and the minimum fluorescence value from a well containing the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from wild-type SARS-CoV-2; 2) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 1); 3) obtaining the maximum fluorescence value and the minimum fluorescence value from a well containing the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from a SARS-CoV-2 variant; and 4) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 3). In some instances, the non-naturally occurring guide nucleic acids are designed for detecting mutations L452R, E484K, E484Q, E484A, and N501Y. In specific instances, the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 22, 23, 24, 25, 26, 27, 40, 41, or 42. In specific instances, the ratio of the maximum fluorescence value to the minimum fluorescence value is defined as the fluorescence yield, which can be calculated by the formula: Fy = max(F)/min(F). Therefore, step 2) above yields Fy(WT), and step 4) above yields Fy(Mutant) (or Fy(M)).
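The fluorescence yield Fy = max(F)/min(F) described above can be computed per well as in the following minimal sketch. The function name and the example readings are hypothetical; the requirement that readings be strictly positive is an assumption added so that the division is defined.

```python
def fluorescence_yield(readings):
    """Return max(F) / min(F) for the fluorescence readings of one well."""
    values = list(readings)
    if not values:
        raise ValueError("at least one fluorescence reading is required")
    lowest = min(values)
    if lowest <= 0:
        # Assumption: readings are strictly positive raw fluorescence units.
        raise ValueError("fluorescence readings must be positive")
    return max(values) / lowest


# Hypothetical time courses for a wild-type guide well and a mutant guide well.
fy_wt = fluorescence_yield([100.0, 120.0, 150.0, 160.0])
fy_mut = fluorescence_yield([100.0, 300.0, 900.0, 1600.0])
```

Dividing by the minimum reading of the same well normalizes each well to its own baseline, so wells with different starting fluorescence can be compared.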
[00435] Next, in some instances, the data analysis pipeline to determine a variant call of a sample comprises transforming the normalized signals described herein (e.g., fluorescence yield). In specific instances, the transformation is applying a logarithm to the normalized signal (e.g., fluorescence yield). In one instance, the transformation is applying a logarithm with a base of 2 to the normalized signal (e.g., fluorescence yield). Therefore, log2 Fy(WT) and log2 Fy(M) are obtained. In some specific instances, an allele discrimination plot can be generated with the transformed signal (see FIGS. 4A-4C or FIG. 11D as non-limiting examples). In other specific instances, a binary logarithm can be applied to the scaled signals, and the ratio of the WT and Mutant transformed values can be plotted against the average of the WT and Mutant transformed values to generate a mean average (MA) plot (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
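The coordinates of one point on such a mean average (MA) plot can be sketched as follows. The sketch assumes the common MA-plot convention in which M is the base-2 logarithm of the WT-to-Mutant ratio (equivalently, the difference of the two log2 values) and A is the mean of the two log2 values; the text does not spell out this convention, and the function name is hypothetical.

```python
import math


def ma_coordinates(fy_wt: float, fy_mut: float):
    """Return (M, A) for one sample from its WT and Mutant fluorescence yields."""
    log_wt = math.log2(fy_wt)
    log_mut = math.log2(fy_mut)
    m_value = log_wt - log_mut        # log2 of the WT / Mutant ratio
    a_value = (log_wt + log_mut) / 2  # mean of the two log2 signals
    return m_value, a_value


# Hypothetical yields: a WT-dominant sample and a Mutant-dominant sample.
wt_like_point = ma_coordinates(8.0, 2.0)
mut_like_point = ma_coordinates(2.0, 8.0)
```

Under this convention, a positive M indicates the WT signal is higher, a negative M indicates the Mutant signal is higher, and A reflects the overall signal strength of the sample.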
[00436] Next, in some instances, the data analysis pipeline to determine a variant calling of a sample comprises comparing the transformed signals. In some specific instances where the transformed signals from WT are higher than the transformed signals from Mutant, the sample is determined as WT. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M)) → Wild Type. In some specific instances where the transformed signals from WT are lower than the transformed signals from Mutant, the sample is determined as Mutant, which is indicative of a variant. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) < log2(Fy(M)) → Mutant.
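As a non-limiting illustration, the scaling, transformation, and comparison steps described above can be sketched in Python as follows. The function names and the example readings are hypothetical and do not appear in the disclosure, and the handling of exactly equal transformed signals is an assumption, since that case is addressed separately in the text.

```python
import math


def fluorescence_yield(trace):
    """Return Fy = max(F) / min(F) for the fluorescence readings of one well."""
    lowest = min(trace)
    if lowest <= 0:
        raise ValueError("minimum fluorescence must be positive")
    return max(trace) / lowest


def call_variant(wt_trace, mut_trace):
    """Compare log2(Fy) of the WT-guide well with that of the mutant-guide well."""
    log_wt = math.log2(fluorescence_yield(wt_trace))
    log_mut = math.log2(fluorescence_yield(mut_trace))
    if log_wt > log_mut:
        return "Wild Type"
    if log_wt < log_mut:
        return "Mutant"
    # Assumption: equal transformed signals are reported as undetermined.
    return "Undetermined"


# Hypothetical readings: the WT-guide well rises 8-fold, the mutant-guide well barely rises.
print(call_variant([100, 150, 800], [100, 110, 120]))
```

Because the logarithm is monotonic, comparing log2(Fy) values gives the same call as comparing the Fy values directly; the transformation mainly serves the plots described herein.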
[00437] In cases where more than one mutant at a particular position is analyzed, the signal of each mutant can be compared to the wild-type signal. If a mutant is present, then among the n comparisons for n mutants and one wild-type, one of the comparisons may yield a mutant call. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type, log2(Fy(WT)) > log2(Fy(M2)) → Wild Type, log2(Fy(WT)) > log2(Fy(M3)) → Wild Type, log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4).
[00438] In cases where there is a tie in the above logic between mutant and wild-type, then a tie breaker comparison can be used to yield a final result. In one instance, the tie breaker analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type, log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2), log2(Fy(WT)) > log2(Fy(M3)) → Wild Type, log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4), log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4).
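As a non-limiting illustration, the multi-mutant comparison and tie breaker described above can be sketched in Python as follows. The function name, the integer mutant labels, and the example yields are hypothetical. The sketch assumes the tie breaker selects the mutant with the largest transformed signal among those that exceed wild-type, which matches the example in which Mutant (4) is chosen over Mutant (2).

```python
import math


def call_position(fy_wt, fy_mutants):
    """Call one position from the WT yield and a dict of {mutant label: yield}."""
    log_wt = math.log2(fy_wt)
    # Keep only the mutants whose transformed signal exceeds the wild-type signal.
    candidates = {
        label: math.log2(fy)
        for label, fy in fy_mutants.items()
        if math.log2(fy) > log_wt
    }
    if not candidates:
        return "Wild Type"
    # Tie breaker (assumption): the candidate with the highest log2(Fy) wins.
    # Exactly equal candidates resolve to the first one encountered.
    best = max(candidates, key=candidates.get)
    return f"Mutant ({best})"


# Hypothetical yields: mutants 2 and 4 both exceed wild-type, and mutant 4 is higher.
print(call_position(4.0, {1: 2.0, 2: 6.0, 3: 3.0, 4: 9.0}))
```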
[00439] Separately, in some instances, a no-target control (NTC) can be processed concurrently with the sample, and the maximum and the minimum fluorescence values can be scaled and transformed as described herein for a sample. In specific instances, the logarithmic ratio between the scaled signals and the logarithmic mean of the scaled signals can be defined as Contrast and Size, respectively. In some specific instances where the Contrast and the Size of the sample are within the lower and upper bounds of the NTC that is concurrently processed, the sample is determined as No Call. In one instance, the data analysis pipeline can be expressed as Cmin(NTC-snp)<=C(sample-snp)<=Cmax(NTC-snp) → NoCall. In one instance, the data analysis pipeline can be expressed as Smin(NTC-snp)<=S(sample-snp)<=Smax(NTC-snp) → NoCall (see “NoCalls” and “NTC” in FIG. 11D as a non-limiting example).
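As a non-limiting illustration, the No Call check against the NTC bounds can be sketched in Python as follows. The function names and example yields are hypothetical. The sketch assumes the conventional MA-plot definitions, with Contrast taken as the difference of the log2 yields (the log of their ratio) and Size as the mean of the log2 yields. Whether both conditions or either condition triggers a No Call is exposed as a parameter, because the text presents the two bounds as separate instances.

```python
import math


def contrast_size(fy_wt, fy_mut):
    """Return (Contrast, Size) for one WT/mutant well pair (assumed definitions)."""
    log_wt, log_mut = math.log2(fy_wt), math.log2(fy_mut)
    return log_wt - log_mut, (log_wt + log_mut) / 2


def is_no_call(sample_pair, ntc_pairs, require_both=True):
    """Check whether a sample falls inside the Contrast/Size bounds of the NTC wells."""
    contrast, size = contrast_size(*sample_pair)
    ntc = [contrast_size(wt, mut) for wt, mut in ntc_pairs]
    contrasts = [c for c, _ in ntc]
    sizes = [s for _, s in ntc]
    contrast_inside = min(contrasts) <= contrast <= max(contrasts)
    size_inside = min(sizes) <= size <= max(sizes)
    if require_both:
        return contrast_inside and size_inside
    return contrast_inside or size_inside


# Hypothetical NTC yields close to 1, as (Fy(WT), Fy(M)) pairs from the same plate.
ntc_wells = [(1.0, 1.0), (1.2, 1.1), (1.1, 1.2)]
print(is_no_call((1.05, 1.08), ntc_wells))  # flat sample inside the NTC bounds
print(is_no_call((8.0, 1.1), ntc_wells))    # clear wild-type signal outside the bounds
```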
[00440] In some instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the allele discrimination plot described herein. In other instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the MA plot described herein (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
[00441] III. Multiplexing.
[00442] The compositions, systems, and methods described herein, in some implementations, are multiplexed in a number of ways. These methods of multiplexing are, for example, consistent with methods and reagents disclosed herein for detection of a target nucleic acid within the sample.
[00443] Multiplexing, in some embodiments, is spatial multiplexing, wherein multiple different target nucleic acids are detected at the same time but the reactions are spatially separated. Often, the multiple target nucleic acids are detected using the same programmable nuclease but different guide nucleic acids. The multiple target nucleic acids sometimes are detected using different programmable nucleases. Sometimes, multiplexing can be single reaction multiplexing, wherein multiple different target nucleic acids are detected in a single reaction volume. Often, a single population of programmable nucleases is used in single reaction multiplexing. Sometimes, at least two different programmable nucleases are used in single reaction multiplexing. For example, multiplexing can be enabled by immobilization of multiple categories of detector nucleic acids within a fluidic system, to enable detection of multiple target nucleic acids within a single sample.
[00444] IV. Kits.
[00445] Disclosed herein are kits, reagents, methods, and systems for use in detecting a SARS-CoV-2 variant.
[00446] In some instances, such kits include a package, carrier, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein. Suitable containers include, for example, test wells, bottles, vials, and test tubes. In one embodiment, the containers are formed from a variety of materials such as glass, plastic, or polymers.
[00447] The kit or systems described herein contain packaging materials. Examples of packaging materials include, but are not limited to, pouches, blister packs, bottles, tubes, bags, containers, and any packaging material suitable for the intended mode of use.
[00448] A kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use. A set of instructions will also typically be included. In one embodiment, a label is on or associated with the container. In some instances, a label is on a container when letters, numbers or other characters forming the label are attached, molded or etched into the container itself; a label is associated with a container when it is present within a receptacle or carrier that also holds the container, e.g., as a package insert. In one embodiment, a label is used to indicate that the contents are to be used for a specific therapeutic application. The label also indicates directions for use of the contents, such as in the methods described herein.
[00449] After packaging the formed product and wrapping or boxing to maintain a sterile barrier, the product may be terminally sterilized by heat sterilization, gas sterilization, gamma irradiation, or by electron beam sterilization. Alternatively, the product may be prepared and packaged by aseptic processing.
[00450] EXAMPLES
[00451] The following examples are illustrative and non-limiting to the scope of the methods, reagents, systems, and kits described herein.
[00452] Example 1 - Identification of Genetic Phenotypes in Synthetic SARS-CoV-2 S-Gene Fragments
[00453] Systems and methods for identifying genetic phenotypes in accordance with an embodiment of the present disclosure were developed and tested using samples of synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT) or mutant (MUT) sequences at amino acid positions 452, 484, and 501. Wild-type and mutant synthetic gene fragments were first PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7x concentration, eluted in nuclease-free water, and normalized to 10 nM.
[00454] A signal dataset was generated using a wild-type sample containing the wild-type synthetic gene fragments and a mutant sample containing the mutant synthetic gene fragments, in which nucleic acids were extracted from each sample and partitioned into replicates (e.g., three wild-type amplification replicates and three mutant amplification replicates). Amplification replicates were amplified by reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), and each amplification replicate was further partitioned into replicate wells in a microtiter plate, for each of the three amino acid positions 452, 484, and 501 (see, for example, the schematic in FIGS. 4A-4B). Replicate wells were processed using comparative programmable nuclease-based (DETECTR) reactions, in which amplified synthetic gene fragments were interrogated using a CasDx1 programmable nuclease, one or more reporters, and a guide sequence corresponding to either the wild-type sequence or the mutant sequence of the SARS-CoV-2 S-gene.
[00455] Briefly, two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein. Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl nucleic acid template, 5 µl of L452R primer set, 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye, and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1x final concentration, 1.5 mM of dNTP mix, 8 units of Bst 2.0 WarmStart DNA Polymerase, and 0.5 µl of WarmStart RTx Reverse Transcriptase. Plates were incubated at 65°C for 40 minutes in a real-time Quantstudio™ 5 PCR instrument. Fluorescent signals were collected every 30 seconds. 40 nM Cas protein was incubated with 40 nM gRNA in 1x buffer for 30 min at 37°C. 100 nM ssDNA reporter (/5Alex594N/TTATTATT/3IAbRQSp/, IDT) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
[00456] Each sample was thus interrogated using a wild-type/mutant comparison in replicates of three, for each of the three amino acid positions, and for each of the three amplification replicates. Reporting signals for each well were obtained based on fluorescence intensities generated by cleavage of the reporter by the CasDx1 programmable nuclease upon recognition of the target sequence by the guide sequence.
[00457] To evaluate the signal dataset, the fluorescence intensities of each well were normalized by scaling the maximum fluorescence values to the minimum fluorescence values for the respective well to generate fluorescence yields. Scaled signals for wells interrogated with the wild-type guide sequence were then compared with scaled signals for wells interrogated with the mutant guide sequence, for each variant amino acid position per sample in the plate. Specifically, after scaling, a logarithmic transformation was applied to the generated fluorescence yields for a respective well, thus obtaining a logarithmic ratio calculated in the form of log2(Fy(FAl)) / log2(Fy(SAl)),
[00458] where Fy(FAl) is the corresponding fluorescence yield for a first well interrogated with the wild-type guide sequence in the comparative assay, and Fy(SAl) is the corresponding fluorescence yield for a second well of the same sample type interrogated with the mutant guide sequence in the comparative assay. A well was deemed to be wild-type (e.g., contain an aliquot of the wild-type synthetic gene fragments) if the resulting ratio indicated a greater value for log2(Fy(FAl)) than for log2(Fy(SAl)), whereas a well was deemed to be mutant (e.g., contain an aliquot of the mutant synthetic gene fragments) if the resulting ratio indicated a lower value for log2(Fy(FAl)) than for log2(Fy(SAl)).
[00459] Additionally, comparative assay replicates were designated as no-call if the logarithmic ratio between the WT and MUT fluorescence yields fell within the maximum fluorescence yield and minimum fluorescence yield generated for a plurality of control wells containing no target samples (NTCs). Comparative assay replicates were further designated as no-call if the logarithmic mean of the WT and MUT fluorescence yields fell within the range of the maximum and minimum logarithmic means for NTCs, where logarithmic means were calculated in the form of
[log2(Fy(FAl) + Fy(SAl))]/2.
[00460] The assignment of no-call with this method was based on the assumption that an NTC will have neither WT nor MUT signals, thereby making no-calls indistinguishable from NTC samples.
[00461] FIGS. 5A, 5B, and 5C illustrate the identification of differentiated phenotypes WT and MUT (“M”) using the systems and methods disclosed herein. FIG. 5A illustrates allele discrimination plots that show separation of the WT and MUT (“M”) signals, which are concordant with the genotype of the originating samples (WT, M, and NTC). FIG. 5B illustrates further separation of the populations after application of a binary logarithm to each scaled signal in each well. These transformed values provide clear and improved differentiation for making mutation calls when the ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on a Mean Average (MA) plot. This analysis resulted in 100% concordance for SNP identity at positions 452, 484, and 501, which was consistent when observed as individual clusters of wells as illustrated in FIG. 5C. These results, obtained using synthetic gene fragment samples, provide robust validation of the presently disclosed systems and methods for use in biological and/or clinical contexts.
[00462] Example 2 - Rapid SARS-CoV-2 Variant Detection with CRISPR
[00463] Three different DNA-targeting CRISPR-Cas effectors with trans-cutting activity - CasDx1 (Mammoth Biosciences), LbCas12a (NEB) and AsCas12a (IDT) - were evaluated for activity on SARS-CoV-2 variants.
[00464] SNP differentiation capabilities on synthetic guide nucleic acids
[00465] Guide RNAs (gRNAs) with CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501. CasDx1, LbCas12a, and AsCas12a were further evaluated with their cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
[00466] WT synthetic S-gene fragment (SEQ ID NO: 20):
[00467] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa agattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt ataattacctgtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttgaaggttttaattgttactttcctttacaatcatatggtttccaacccactaatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00468] Mutant synthetic S-gene fragment including mutations on amino acid positions 452, 484, and 501 (SEQ ID NO: 21):
[00469] Tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactgga aatattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttaaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00470] A schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations is illustrated in FIG. 6A, including a region of the SARS-CoV-2 S-gene and wild-type or mutant sequences at amino acid positions 452, 484, and 501.
[00471] S-gene fragment for SARS-CoV-2, forward strand (SEQ ID NO: 118):
[00472] SEQ ID: 118: ggugguaauuauaauuaccuguauagauuguuuaggaagucuaaucucaaaccuuuugagagagauauuucaacugaaa ucuaucaggccgguagcacaccuuguaaugguguugaagguuuuaauuguuacuuuccuuuacaaucauaugguuucc aacccacuaaugguguugguuaccaacca
[00473] S-gene fragment for SARS-CoV-2, reverse strand (SEQ ID NO: 119):
[00474] SEQ ID: 119: ccaccauuaauauuaauggacauaucuaacaaauccuucagauuagaguuuggaaaacucucucuauaaaguugacuuua gauaguccggccaucguguggaacauuaccacaacuuccaaaauuaacaaugaaaggaaauguuaguauaccaaagguug ggugauuaccacaaccaaugguuggu
[00475] Mutant S-gene fragment subsequence including mutation at amino acid position 452 (SEQ ID NO: 120):
[00476] SEQ ID: 120: ccGguauagauuguuuagga
[00477] Wild-type S-gene fragment subsequence spanning amino acid position 452 (SEQ ID NO: 121):
[00478] SEQ ID: 121: aauggacauaucuaacaaau
[00479] Mutant S-gene fragment subsequence including mutation at amino acid position 484 (SEQ ID NO: 122):
[00480] SEQ ID: 122: acauuaccacaaUuuccaaa
[00481] Wild-type S-gene fragment subsequence spanning amino acid position 484 (SEQ ID NO: 123):
[00482] SEQ ID: 123: acauuaccacaacuuccaaa
[00483] Mutant S-gene fragment subsequence including mutation at amino acid position 501 (SEQ ID NO: 124):
[00484] SEQ ID: 124: aacccacuUaugguguuggu
[00485] Wild-type S-gene fragment subsequence spanning amino acid position 501 (SEQ ID NO: 125):
[00486] SEQ ID: 125: caacccacuaaugguguugg
[00487] SNP differentiation capabilities on heat inactivated viral cultures
[00488] Additionally, SNP differentiation capabilities on heat-inactivated viral cultures were tested using the full SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow, which involved a multiplexed RT-LAMP amplification followed by the CRISPR-based DETECTR® readout that enabled SNP differentiation. FIG. 7 illustrates the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow. As shown in FIG. 7, the sample (e.g., an RNA extraction) was used as an input to DETECTR, which was visualized by a fluorescent reader. RNA encoding SARS-CoV-2 S-gene was amplified using an isothermal amplification method such as RT-LAMP. Amplified samples were detected using a Cas12 programmable nuclease complexed with gRNAs directed to SARS-CoV-2 S-gene sequence. The Cas12 programmable nuclease cleaved an ssDNA reporter nucleic acid upon complex formation with the target nucleic acid. The results of the DETECTR assay were then compared with Whole Genome Sequencing (WGS) results.
[00489] Development and testing of data analysis pipelines
[00490] To create a development environment for data analysis pipelines analyzing the COVID-19 CRISPR-CasDx1 based DETECTR® assay, a blinded dataset was generated with the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® assay on 93 SARS-CoV-2 positive clinical samples (previously characterized by sequencing) and corresponding SNP DETECTR® controls.
[00491] Materials and Methods
[00492] Synthetic gene fragments
[00493] Wild-type and mutant synthetic gene fragments (Twist) were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM. The LAMP primers used herein are shown in Table 3A and were synthesized by Eurofins Genomics. The guide RNAs used herein are shown in Table 4A and were synthesized by Dharmacon or Synthego. The reporter used herein is shown in Table 5 and was synthesized by IDT. The synthetic gene fragments (SEQ ID NOs: 20 and 21) were synthesized by Twist Biosciences.
[00494] Clinical sample acquisition and extraction
[00495] De-identified residual SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal (NP/OP) swab samples in universal transport media (UTM) or viral transport media (VTM) were obtained from the UCSF Clinical Microbiology Laboratory. All samples were stored in a biorepository according to protocols approved by the UCSF Institutional Review Board (protocol number 10-01116, 11-05519) until processed.
[00496] All NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio. The Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl. The Taqpath™ COVID-19 RT-PCR kit (Thermo Fisher Scientific) was used to determine the N gene cycle threshold values.
[00497] Heat-inactivated culture acquisition and extraction
[00498] Heat-inactivated cultures of SARS-CoV-2 variants being monitored (VBM), variants of concern (VOC), or variants of interest (VOI) were provided by the California Department of Public Health (CDPH).
[00499] RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
[00500] COVID-19 Variant DETECTR® assay
[00501] The COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
[00502] Two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A). Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/). Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
[00503] Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time Quantstudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
[00504] 40 nM CasDx1 (Mammoth Biosciences), LbCas12a (EnGen® Lba Cas12a, NEB) or AsCas12a (Alt-R® A.s. Cas12a, IDT) protein targeting the WT or MUT SNP at L452R, E484K, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a) for 30 min at 37°C. gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end were used with both CasDx1 and LbCas12a, whereas gRNAs with an extra sequence of UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end were used with AsCas12a (see Table 4A). 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
[00505] Digital PCR
[00506] For digital PCR, samples were evaluated at 3 dilutions (1:100; 1:1,000; and 1:10,000) using the ApexBio Covid-19 Multiplex Digital PCR Detection Kit (Stilla Technologies) according to the manufacturer’s protocol. The controls (positive and negative, the Kit Controls, and an internal control) were run with the samples in duplicate. The dilutions were used to determine the most accurate concentration which was determined from the N gene concentration.
[00507] Sequencing methods
[00508] Experiments with sequencing steps were carried out. Complementary DNA (cDNA) synthesis from RNA via reverse transcription and tiling multiplexed amplicon PCR were performed using SARS-CoV-2 primers version 3 according to the ARTIC protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021) and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)). Libraries were constructed by ligating adapters to the amplicon products using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, # E7645L), barcoding using NEBNext Multiplex Oligos for Illumina (New England Biolabs, # E6440L), and purification with AMPure XP (Beckman-Coulter, # 63880). Final pooled libraries were sequenced on either Illumina NextSeq 550 or NovaSeq 6000 as 1x300 single-end reads (300 cycles).
[00509] SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (NextSeq 550 or NovaSeq 6000) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512). Reads containing adapters, the ARTIC primer sequences, and low-quality reads were filtered using BBDuk (version 38.87) and then mapped to the NC_045512 reference genome using BBMap (version 38.87). Variants were called with CallVariants and a depth cutoff of 5 was used to generate the final assembly. Pangolin software (version 3.0.6, see A. O'Toole et al., Virus Evol 7, veab064 (2021); and A. Rambaut et al., Nat Microbiol 5, 1403-1407 (2020)) was used to identify the lineage. Using a custom in-house script, consensus FASTA files generated by the genome assembly pipeline were scanned to confirm L452R, E484K, and N501Y mutations.
[00510] Eleven samples were re-extracted as described above for the NP/OP swab samples and evaluated by viral WGS as described above. The samples were then thawed and amplified using the LAMP protocol described above and evaluated using the COVID-19 Variant DETECTR® assay as described above.
[00511] DETECTR® data analysis pipeline
[00512] Quality control metric for the LAMP reaction
[00513] Prior to processing DETECTR® data from the clinical samples, data indicating the success or failure of the samples to amplify in the LAMP reaction were collected. The absolute truth was based on visual inspection of LAMP curves. This absolute truth was used to develop thresholds for the LAMP reactions. The positive and negative controls from the LAMP reactions were used to derive the thresholds to qualify the samples. Two sets of thresholds were used: a time threshold and a fluorescence rate threshold. The positive LAMP controls were assumed to represent an ideal sample and displayed a classic sigmoidal rise of fluorescence over time, and the NTC represented the background fluorescence. It was hypothesized that a sample would ideally have positive-control-like fluorescence kinetics. However, due to the presence of high background in some samples, a mean value between controls for each plate was chosen as the threshold. After this, the fluorescence values at a time threshold of 18 minutes were collected. The time point is of importance here to rule out those samples that would amplify closer to the endpoint, signifying the LAMP intermediates to be the majority contributors of the rise in the signal and not the actual sample itself. A score was assigned for each sample which was calculated as a ratio of rate of fluorescence rate threshold to the rate of fluorescence value at 18 minutes for each sample. The hypothesis was that if this ratio of rate of fluorescence between controls and samples is less than 1, then samples had failed to reach the minimum fluorescence required to be called out as amplified and if the ratio is greater than or equal to 1, then samples had amplified sufficiently. To identify the exact score value for a qualitative QC metric, an ROC analysis was done on scores and the absolute truth (see FIG. 11A).
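As a non-limiting illustration, the LAMP quality control score can be sketched in Python as follows. The function names and example values are hypothetical. The sketch assumes that the plate threshold is the arithmetic mean of the positive control and NTC values, and that the score is the sample value at 18 minutes divided by that threshold, which is the direction consistent with a score of at least 1 indicating sufficient amplification. The default cutoff of 1.0 is a placeholder for the value that would be selected from the ROC analysis.

```python
def plate_threshold(positive_control_value, ntc_value):
    """Mean of the positive control and NTC values for one plate (assumed form)."""
    return (positive_control_value + ntc_value) / 2


def lamp_score(sample_value_at_18_min, threshold):
    """Ratio of the sample value at the 18-minute time threshold to the plate threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return sample_value_at_18_min / threshold


def lamp_passed(score, cutoff=1.0):
    """A score at or above the cutoff is treated as sufficient amplification."""
    return score >= cutoff


# Hypothetical plate: positive control at 10000, NTC at 2000, so the threshold is 6000.
threshold = plate_threshold(10000.0, 2000.0)
for sample_value in (9000.0, 3000.0):
    score = lamp_score(sample_value, threshold)
    print(sample_value, round(score, 2), lamp_passed(score))
```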
[00514] Data analysis for CRISPR-based SNP calling
[00515] Each well had a guide specific to the mutant or the wild-type SNP. The comparison was important to assign a genotypic call to the sample. The DETECTR® reactions across the plate were not comparable to each other. For this purpose, the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which is defined as fluorescence yield. The fluorescence yield can be compared across wells in a plate under the assumption that each well will have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aids in normalizing the signal and comparing replicates across the wells in the same plate.
Fy = max(F)/min(F)
[00516] Without being limited to any one theory of operation, the wild-type and mutant target guides on NTC generally do not show any change in intensity over time. Thus, the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
Fy(NTC) = 1
[00517] On the contrary, if a sample has a fluorescence yield ~ 1, then it qualifies for a No Call.
[00518] Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
Cmin(NTC-snp)<=C(sample-snp)<=Cmax(NTC-snp) → NoCall
Smin(NTC-snp)<=S(sample-snp)<=Smax(NTC-snp) → NoCall
log2(Fy(WT)) > log2(Fy(M)) → Wild Type
log2(Fy(WT)) < log2(Fy(M)) → Mutant
[00519] SNP calls
[00520] The following procedure was used to evaluate the concordance between sequencing and DETECTR® technologies for genotypic classification of the clinical cohort dataset.
[00521] First, all samples and SNPs for which both sequencing and DETECTR® data were present in the distributed files were considered by matching the SNP IDs and sample names. This included cleaning and curating the dataset, which had failed LAMP reactions, and identifying WT and MUT based on the spacer fluorescence. This yielded a preliminary data set containing 279 calls across 3 SNPs against 93 samples. After eliminating samples that had failed to amplify in the LAMP reaction but were assigned a genotype, the resulting final analysis data consisted of 272 calls (Wild-type, Mutant and NoCall) spread across 3 SNPs and 91 samples. For each of the 3 SNPs in the analysis data set, both sequencing and DETECTR® genotypes (including NoCalls and LAMP Fails) were identified and recorded for each of the 93 patients. The 91 patients included the individuals for whom actual sequencing data was available.
[00522] Statistical Analysis
[00523] SNP Calls
[00524] Statistical analysis for the experiments described herein was performed as follows. For each SNP in the analysis of genotypic calls, a variety of statistics evaluating the concordance between genotype calls were computed on the two different technologies. The concordant and discordant genotypes were visualized through contingency tables. For each SNP, there are three possible genotypes (Wild-type, Mutant and No Call). The concordance rates were calculated without the samples that failed the LAMP reaction (see Table 7 and FIGS. 14A-14D). The 2X2 cross tables classify all three SNPs across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D). The data transformation and statistical analysis were done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
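As a non-limiting illustration, the contingency table and concordance rate between the two technologies can be sketched as follows. The analysis described herein was done in R; this sketch uses Python instead, and the function names, genotype labels, and toy calls are hypothetical and are not data from the study. The sketch assumes the concordance rate is the fraction of matching calls after excluding pairs in which either technology reports a LAMP failure.

```python
from collections import Counter


def contingency_table(sequencing_calls, detectr_calls):
    """Count each (sequencing genotype, DETECTR genotype) pair."""
    return Counter(zip(sequencing_calls, detectr_calls))


def concordance_rate(sequencing_calls, detectr_calls, excluded=("LAMP Fail",)):
    """Fraction of matching calls after dropping pairs with an excluded label."""
    pairs = [
        (seq, det)
        for seq, det in zip(sequencing_calls, detectr_calls)
        if seq not in excluded and det not in excluded
    ]
    if not pairs:
        raise ValueError("no comparable calls remain after exclusion")
    return sum(seq == det for seq, det in pairs) / len(pairs)


# Toy calls for one SNP, invented for illustration only.
sequencing = ["Mutant", "Wild Type", "Mutant", "Wild Type", "Mutant"]
detectr = ["Mutant", "Wild Type", "NoCall", "Wild Type", "LAMP Fail"]
print(contingency_table(sequencing, detectr))
print(concordance_rate(sequencing, detectr))
```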
[00525] Results
[00526] Identifying the optimal CRISPR-Cas12 enzyme for SNP detection
[00527] To determine the optimal CRISPR-Cas12 enzyme for SNP detection, three different CRISPR-Cas effectors with trans-cutting activity were evaluated: CasDx1, LbCas12a, and AsCas12a. Guide RNAs (gRNAs) with CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501 (see FIG. 6A and FIG. 6F). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either L452R, E484K or N501Y (see FIG. 6B). Further evaluation of CasDx1, LbCas12a and AsCas12a with their cognate gRNAs on synthetic gene fragments is illustrated in FIGS. 6B-E. For example, in FIGS. 6C-E, each figure corresponds to one of the three effector proteins. The first column of each figure shows the fluorescent read out from the CRISPR assay performed on a wild-type fragment, with a wild-type guide versus a mutant guide. The second column of each figure shows the fluorescent read out from the CRISPR assay performed on a mutant fragment, with a wild-type guide versus a mutant guide. The third column of each figure shows the fluorescent read out from the CRISPR assay performed on a control, with a wild-type guide versus a mutant guide. As shown in FIG. 6B, CasDx1 showed clear SNP differentiation between wild-type (WT) and mutant (MUT) sequences on all three S-gene variants (see FIG. 6C). LbCas12a was capable of differentiating SNPs at positions 452 and 484, and AsCas12a was able to differentiate the SNP at position 452 (see FIGS. 6D-6E).
[00528] Next, SNP differentiation capabilities were tested on heat-inactivated viral cultures using the full COVID-19 Variant DETECTR® assay, consisting of RNA extraction, multiplexed RT-LAMP amplification (see FIG. 6F), and CRISPR-Cas12 detection with guide RNAs targeting part of the spike receptor-binding domain (RBD) (see FIG. 6A). The LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K, and N501Y mutations. A redundant LAMP design was adopted for two reasons: first, this approach was shown to improve detection sensitivity in initial experiments; second, the goal was to increase assay robustness given the continual emergence of escape mutations in the spike RBD throughout the course of the pandemic (see Harvey et al., Nat Rev Microbiol 19, 409-424 (2021)). The tested viral cultures included an ancestral SARS-CoV-2 lineage (WA-1) containing the wild-type spike protein (D614) targeted by the approved mRNA (BNT162b2 from Pfizer or mRNA-1273 from Moderna) (see K. S. Corbett et al., Nature 586, 567-571 (2020); F. P. Polack et al., N Engl J Med 383, 2603-2615 (2020)) and DNA adenovirus vector (Ad26.COV2.S from Johnson and Johnson) (see R. Bos et al., NPJ Vaccines 5, 91 (2020)) vaccines, variants being monitored (VBMs) that were previously classified as variants of concern (VOCs) or variants of interest (VOIs), including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), and Zeta (P.2) lineages, and the current VOC Delta (B.1.617.2) lineage (see Table 6 for combination of mutations at each position for each strain). Heat-inactivated viral culture samples representing the seven SARS-CoV-2 lineages were quantified by digital droplet PCR across a 4-log dynamic range and used to evaluate the analytical sensitivity of the pre-amplification step. RT-LAMP amplification was evaluated using six replicates from each viral culture.
Consistent amplification was observed for all seven SARS-CoV-2 lineages with 10,000 copies of target input per reaction (200,000 copies/mL) (see FIG. 6G), which is comparable to the target input of more than 200,000 copies/mL viruses (less than 30 Ct value) required for sequencing workflows used in SARS-CoV-2 variant surveillance (see e.g., F. de Mello Malta et al., Sci Rep 11, 7122 (2021); Igloi et al., Emerg Infect Dis 27, 1323-1329 (2021)).
[00529] Table 6: Interpretation table summarizing the SARS-CoV-2 mutations in this study associated with the corresponding lineage classification
Figure imgf000177_0001
[00530] To evaluate the specificity of the different Cas12 enzymes, amplified material from each viral culture was pooled and the SNPs resulting in the L452(R), E484(K) and N501(Y) mutations were detected using CasDx1, LbCas12a and AsCas12a. FIG. 6B shows a representative heatmap of the expected pattern of wild-type (WT) and mutational (MUT) calls for each of the SNPs resulting in L452R, E484K and N501Y. Specifically, FIG. 6H shows the heatmap results for each reaction carried out in this assay, covering each combination of effector protein, variant, and corresponding mutant or wild-type guide. The results of the fluorescent assay are also shown in the fluorescent read-out curves of FIGS. 6I-6K plotting raw fluorescence over time for the wild-type SARS-CoV-2 (Column 1), each variant (Columns 2-7) and controls (Columns 8-12). Similar to the results found using gene fragments, CasDx1 correctly identified the wild-type (WT) and mutational (MUT) targets at positions 452, 484 and 501 in each LAMP-amplified, heat-inactivated viral culture (FIG. 6H and FIGS. 6I-6K). In comparison, LbCas12a was capable of differentiating WT from MUT at position 501 on LAMP-amplified viral cultures but showed much higher background for the WT target at position 452 and higher background for both WT and MUT targets at position 484 (FIG. 6H and FIGS. 6I-6K). Additionally, AsCas12a was able to differentiate WT from MUT targets at position 452 albeit with substantial background but was unable to differentiate WT from MUT targets at positions 484 and 501 (FIG. 6H and FIGS. 6I-6K). From these data, it was concluded that CasDx1 provided more consistent and accurate calls for the L452R, E484K and N501Y mutations.
[00531] Data analysis pipeline for calling COVID-19 Variant SNPs
[00532] The SARS-CoV-2 CRISPR-CasDx1 based DETECTR® assay proceeded using only the high-fidelity CasDx1 enzyme. To develop a data analysis pipeline for calling SARS-CoV-2 SNP mutations and assigning lineage classifications with the COVID-19 Variant DETECTR® assay (see Table 6 and FIG. 10F), data collected from SNP synthetic gene fragment controls (n=279) that included all mutational combinations of 452, 484 and 501 (see Materials and Methods) were used. Based on the control sample data, allele discrimination plots were generated (see C. Broccanello et al., Plant Methods 14, 28 (2018); F. E. McGuigan, S. H. Ralston, Psychiatr Genet 12, 133-136 (2002)) to define boundaries that separate the WT and MUT signals (see FIG. 10A). Clear differentiation between WT and MUT signals was observed when plotting the ratio against the average of the WT and MUT transformed values on a mean average (MA) plot (see C. Broccanello et al., Plant Methods 14, 28 (2018); F. E. McGuigan, S. H. Ralston, Psychiatr Genet 12, 133-136 (2002)) (see FIG. 10B), with 100% concordance for SNP identity at positions 452, 484, and 501 for the control samples.
[00533] Performance evaluation of the COVID-19 Variant DETECTR® assay using clinical samples
[00534] Next, a blinded dataset consisting of 93 COVID-19 positive clinical samples (previously analyzed by viral WGS) was assembled and the SNP controls were run in parallel. These samples were extracted, amplified in triplicate RT-LAMP reactions (see FIGS. 8A-8H), and processed further as triplicate CasDx1 reactions for each LAMP replicate (see FIGS. 9A-9M). A total of nine replicates were thus generated for each sample to detect WT or MUT SNPs at positions 452, 484, and 501. The DETECTR® data analysis pipeline was then applied to each sample to provide a final lineage categorization (see Table 6, FIGS. 10C-10D and FIG. 10F). For a biological RT-LAMP replicate to be designated as either WT or MUT, the same call needed to be made from all three technical CasDx1 replicates (see FIGS. 4A-4C). A final SNP mutation call was made based on more than or equal to 1 of the same calls from the three biological replicates, with replicates that were designated as a No Call ignored (see FIGS. 4A-4C and FIG. 11B). After excluding two samples that were considered invalid because the fluorescence intensity from RT-LAMP amplification did not reach a pre-established threshold determined using receiver-operator characteristic (ROC) curve analysis (see FIGS. 8A-8H and FIG. 11A), a total of 807 CasDx1 signals from the 91 remaining clinical samples were evaluated, generating up to 9 replicates for each clinical sample (see FIGS. 4A-4C). Differentiation of WT and MUT signals according to the allele discrimination plots was more pronounced at positions 484 and 501 than position 452 (see FIG. 11C), whereas the MA plots, generated by transforming the data onto M (log ratio) and A (mean average) scales, showed clear separation of WT and MUT calls for all three positions (see FIG. 11D). The variant calls made on each sample were consistent with the difference in median values of the log-transformed signals as determined using the data analysis pipeline (see FIGS. 12A-12M).
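By way of a non-limiting illustration, the transformation onto M and A scales referred to above can be sketched in Python as follows. The function name `ma_transform`, the use of a base-2 logarithm, and the example signal values are assumptions made for illustration only and are not taken from the data analysis pipeline itself.

```python
import math

def ma_transform(wt_signal, mut_signal):
    """Return (M, A) for one WT/MUT signal pair; both signals must be > 0."""
    log_wt = math.log2(wt_signal)
    log_mut = math.log2(mut_signal)
    m = log_wt - log_mut          # M: log ratio of the WT signal to the MUT signal
    a = 0.5 * (log_wt + log_mut)  # A: mean of the two log-transformed signals
    return m, a

# Hypothetical signals: WT reaction at 8.0, MUT reaction at 2.0 (arbitrary units).
m, a = ma_transform(8.0, 2.0)
```

Under this convention, a positive M indicates a stronger WT signal and a negative M a stronger MUT signal, while A reflects overall signal strength.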
[00535] The viral WGS results were then unblinded to evaluate the accuracy of the DETECTR® assay for SNP calls and lineage classification. There were 14 discordant SNP calls out of 272 (94.9% SNP concordance) distributed among 11 clinical samples out of 91 (see FIGS. 13A-13D, FIG. 13E, and FIGS. 13H-13I). Among the 11 discordant samples, one sample (COVID-31) was designated a ‘no call’ at position 452 by viral WGS and thus lacked a comparator, two samples were designated a ‘no call’ due to flat WT and MUT curves (COVID-41 and COVID-73), four samples had similar WT and MUT curve amplitudes, suggesting a mixed population (COVID-03, COVID-56, COVID-61 and COVID-81) (see FIGS. 13A-13D), and four samples had SNP assignments discordant with those from viral WGS (COVID-12, COVID-13, COVID-20 and COVID-63) (see FIGS. 13A-13D).
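As a worked check of the concordance figure reported above, the following minimal Python sketch computes percent concordance from the total and discordant SNP call counts. The function name `concordance_percent` is hypothetical; only the counts (272 total, 14 discordant) come from the text.

```python
def concordance_percent(total_calls, discordant_calls):
    """Percentage of calls on which the two methods agree."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return 100.0 * (total_calls - discordant_calls) / total_calls

# 14 discordant SNP calls out of 272, as stated above.
initial_concordance = round(concordance_percent(272, 14), 1)
```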
[00536] Given that the comparison data had been collected over an extended time period, it was suspected that sample stability issues arising from aliquoting and multiple freeze-thaw cycles may have accounted for the observed discrepancies. To further investigate this possibility, the 11 discordant clinical samples were re-extracted from original respiratory swab matrix and re-analyzed by running both viral WGS and the DETECTR® assay in parallel. Re-testing of the samples resulted in nearly complete agreement between the two methods, with the exception of two SNPs that were identified as E484Q in two samples by WGS but were incorrectly called E484 (WT) by the DETECTR® assay (see Table 7, FIG. 13F, and FIGS. 13H-13I). Thus, based on discrepancy testing, the positive predictive agreement (PPA) between the DETECTR® assay and viral WGS at all three WT and MUT SNP positions was 100% (272 of 272, p<2.2e-16 by Fisher’s Exact Test) (see Table 7). The corresponding negative predictive agreement (NPA) was 91.4% as the E484Q mutation for two SNPs was incorrectly classified as WT. Nevertheless, the final viral lineage classification for the 91 samples after discrepancy testing showed 100% agreement with viral WGS (see FIGS. 13G, 13J, and 14A-14D).
[00537] Table 7: Final Positive Predictive Agreement (PPA), Negative Predictive Agreement (NPA) and concordance values for each WT and MUT SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay after discordant samples were resolved
Figure imgf000180_0001
Figure imgf000180_0002
Figure imgf000180_0003
Figure imgf000180_0004
Figure imgf000181_0001
[00538] Discussion
[00539] In this example, a CRISPR based DETECTR® assay was developed for the detection of SARS-CoV-2 variants. Three CRISPR-Cas12 enzymes were evaluated. Based on a head-to-head comparison of these enzymes, clear differences in performance were observed, with CasDx1 demonstrating the highest fidelity and reliably detecting all three of the targeted SNPs. A data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 100% (272/272 total SNP calls) and 100% agreement with lineage classification compared to viral WGS. Taken together, these findings show robust agreement between the COVID-19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization. Thus, the COVID-19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID-19 variant surveillance.
[00540] Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations in coverage of circulating lineages and in the extent of clinical sample evaluation. For example, the miSHERLOCK variant assay uses LbCas12a (NEB) to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)). Additionally, the SHINEv2 assay uses LwaCas13a to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)). In comparison, the COVID-19 Variant DETECTR® assay disclosed herein uses CasDx1 to detect N501Y, E484K and L452R covering eleven lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Mu and Zeta), and 91 clinical samples representing seven out of the eleven lineages were tested with successful detection of all seven.
[00541] In the near term, the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for the presence of a rare or novel variant (e.g., carrying both L452R and E484K or carrying all three SNPs) that could be reflexed to viral WGS. As the sequencing capacity for most clinical and public health laboratories is limited, the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts. In addition, identification of specific mutations associated with neutralizing antibody evasion, such as E484K, could inform patient care with regard to the use of monoclonal antibodies that remain effective in treating the infection. As the virus continues to mutate and evolve, the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance. For example, the newly emerging Omicron variant, containing at least 30 mutations in the spike protein and 11 mutations in the spike RBD region targeted by the assay, could be detected by increasing degeneracy in the LAMP primers and adding at least one gRNA to be able to distinguish this variant from the others. Over the longer term, a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
[00542] Example 3 - SARS-CoV-2 SNP Calling: L452R, E484K, and N501Y
[00543] The disclosure provides methods for determining L452R, E484K, and N501Y versus wild-type calls within the spike gene of SARS-CoV-2 using an interpretive algorithm for SNP calling in conjunction with a CRISPR-Cas12 based assay. The spike variants L452R, E484K, and N501Y are described in Zhang et al., 2021, “Ten emerging SARS-CoV-2 spike variants exhibit variable infectivity, animal tropism, and antibody neutralization,” Commun Biol 4, 1196, which is hereby incorporated by reference.
[00544] In accordance with the interpretive algorithm, a signal dataset was obtained that comprised, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting fluorescent signals for a respective one or more nucleic acid molecules in a biological sample that map to position 452, 484 or 501 of the spike protein arising from RT-LAMP amplification of the CRISPR-Cas12 based DETECTR® assay.
[00545] The plurality of wells comprised a first set of nine wells representing the wild type allele for position 452 of the spike protein. Each well in the first set of nine wells included a first plurality of guide nucleic acids that have the wild type allele for position 452 of the SARS-CoV-2 spike protein.
[00546] The plurality of wells further comprised a second set of nine wells representing the L452R allele for the SARS-CoV-2 spike protein. Each well in the second set of nine wells included a second plurality of guide nucleic acids that have the L452R allele for the SARS-CoV-2 spike protein.
[00547] Each respective well in the first set of wells and each respective well in the second set of wells contained a corresponding aliquot of nucleic acid derived from a biological sample for which the determination of a L452R versus a wild-type call within the spike gene of SARS-CoV-2 was sought, as well as a Cas12 protein with trans-cutting activity.
[00548] Each corresponding plurality of reporting fluorescent signals in the signal dataset comprised, for each respective time point in a set of 30 time points, a respective fluorescent reporting signal in the form of a corresponding discrete attribute value arising from RT-LAMP amplification. Each respective time point in the corresponding plurality of reporting signals represented a different 60 second time interval within a 30 minute monitoring time period of the RT-LAMP amplification.
[00549] A determination was made, for each respective well in the plurality of wells, of a corresponding fluorescent signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the 30 time points. For each well, the signal yield was the maximum signal observed across the 30 time points divided by the minimum signal observed across the 30 time points. This allowed signal yields from individual wells to be compared to each other.
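The signal yield computation described in this paragraph can be sketched in Python as shown below. This is a minimal illustration only: the function name `signal_yield`, the input validation, and the example readings are assumptions and do not appear in the disclosure.

```python
def signal_yield(readings):
    """Maximum reading divided by minimum reading across a well's time points."""
    if not readings:
        raise ValueError("at least one reading is required")
    lowest = min(readings)
    if lowest <= 0:
        # Assumption: fluorescence readings are strictly positive.
        raise ValueError("readings must be positive")
    return max(readings) / lowest

# Hypothetical well: 30 readings, one per 60-second interval.
well = [100.0 + 30.0 * t for t in range(30)]
yield_value = signal_yield(well)
```

Because the result is a unitless fold change relative to each well's own minimum, wells with different baseline fluorescence can be compared directly.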
[00550] Then, a determination was made, for each respective well in the first set of nine wells, of a respective candidate call identity based on a relative intensity metric between the corresponding first signal yield for the respective well in the first set of nine wells and a corresponding second signal yield for a corresponding well in the second set of nine wells, thereby obtaining a plurality of candidate call identities. The relative intensity metric has the form: log2(Fy(WT)) / log2(Fy(M)),
[00551] where Fy(WT) was the first corresponding signal yield representing the subject wild type allele, and Fy(M) was the second corresponding signal yield representing the subject mutant allele of the SARS-CoV-2 spike protein. In the case of the first and second set of nine wells, WT was the wild type allele for position 452 of the SARS-CoV-2 spike protein whereas M was the L452R allele for the SARS-CoV-2 spike protein. There were three possible candidate call identities for each well in the first set of nine wells: wild type for position 452 of the SARS-CoV-2 spike protein, L452R for the SARS-CoV-2 spike protein, or No Call. No Call arose when the relative intensity metric for a given well was between the maximum control signal yield and the minimum control signal yield arising from RT-LAMP assays of control wells on the common plate that are free of nucleic acid derived from the biological sample.
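A minimal Python sketch of the relative intensity metric and the resulting candidate call is given below. The disclosure states only that No Call arises when the metric lies between the minimum and maximum control signal yields; the assignment of values above that band to wild type and values below it to mutant is an assumption made here for illustration, as are the function names and the example numbers.

```python
import math

def relative_intensity(fy_wt, fy_mut):
    """log2(Fy(WT)) / log2(Fy(M)); yields must be > 0 and fy_mut must not equal 1."""
    return math.log2(fy_wt) / math.log2(fy_mut)

def candidate_call(fy_wt, fy_mut, control_min, control_max):
    """Return 'WT', 'MUT', or 'No Call' for one pair of WT-guide and MUT-guide wells."""
    metric = relative_intensity(fy_wt, fy_mut)
    if control_min <= metric <= control_max:
        return "No Call"  # metric falls inside the band set by the control wells
    # Assumption: above the control band favours WT, below it favours MUT.
    return "WT" if metric > control_max else "MUT"

# Hypothetical signal yields and control band.
call = candidate_call(fy_wt=16.0, fy_mut=2.0, control_min=0.8, control_max=1.25)
```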
[00552] The first set of nine wells was binned into three bins, each bin consisting of three of the wells from the first set of nine wells. For each respective bin in the set of three bins, a respective first concordance vote across the candidate call identities for wells in the respective bin was made, thereby generating three bin votes, one for each bin. The respective first concordance vote generated a respective bin vote based on a common candidate call identity that was shared by all of the wells in the subset of wells for the respective bin. A respective bin vote was no-call when at least one well in the three wells in the respective bin had a candidate call identity of no-call.
[00553] A second concordance vote was made across the three bin votes for the first set of nine wells, thereby obtaining the mutation call that was one of L452R or wild type for position 452 of the SARS-CoV-2 spike protein for the biological sample. The second concordance vote generated this mutation call based on a common bin vote that was shared by at least two of the three bins.
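The two-stage concordance voting described above can be sketched in Python as follows. The names `bin_vote` and `final_call` are hypothetical. Two behaviours are assumptions not stated in the text: a bin whose wells disagree is treated as No Call, and the final result is No Call when fewer than two bins share a wild-type or mutant vote.

```python
NO_CALL = "No Call"

def bin_vote(calls):
    """First vote: the call shared by every well in the bin, else No Call."""
    if NO_CALL in calls or len(set(calls)) != 1:
        return NO_CALL  # any No Call well, or (assumed) any disagreement
    return calls[0]

def final_call(well_calls, bin_size=3):
    """Second vote: a WT or MUT bin vote shared by at least two bins."""
    bins = [well_calls[i:i + bin_size]
            for i in range(0, len(well_calls), bin_size)]
    votes = [bin_vote(b) for b in bins]
    for call in ("WT", "MUT"):
        if votes.count(call) >= 2:
            return call
    return NO_CALL  # assumption: no two bins agree on WT or MUT

# Nine candidate calls, three per bin; the middle bin contains one No Call well.
result = final_call(["WT"] * 3 + ["WT", "WT", NO_CALL] + ["WT"] * 3)
```

In this example the middle bin votes No Call, but the other two bins agree on wild type, so the final call is wild type.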
[00554] As with position 452, additional sets of nine wells in the common plate respectively represented the wild type allele for position 484 within the spike gene of SARS-CoV-2, the E484K mutation in the spike gene of SARS-CoV-2, the wild type allele for position 501 within the spike gene of SARS-CoV-2, and the N501Y mutation in the spike gene of SARS-CoV-2. Such sets of nine wells were processed in the same manner as the first and second sets of nine wells for the respective wild type allele for position 452 and the L452R allele for the SARS-CoV-2 spike protein described above.
[00555] Example 4 - SARS-CoV-2 Omicron Variant Detection with High-Fidelity CRISPR-Cas12 Enzyme
[00556] Laboratory tests for the accurate and rapid identification of SARS-CoV-2 variants can potentially guide the treatment of COVID-19 patients and inform infection control and public health surveillance efforts. This example presents the development and validation of a rapid COVID-19 variant DETECTR® assay incorporating loop-mediated isothermal amplification (LAMP) followed by CRISPR-Cas12 based identification of single nucleotide polymorphism (SNP) mutations in the SARS-CoV-2 spike (S) gene. The assay targeted the L452R, E484K/Q/A, and N501Y mutations that are associated with nearly all circulating viral lineages and identified the two currently-circulating variants of concern, Delta and Omicron. In a comparison of three different Cas12 enzymes, the newly identified enzyme CasDx1 was able to accurately identify all targeted SNP mutations under the conditions tested. An analysis pipeline for CRISPR-based SNP identification from 139 clinical samples yielded an overall SNP concordance of 98% and agreement with SARS-CoV-2 lineage classification of 138/139 compared to viral whole-genome sequencing. It was also shown that detection of the single E484A mutation was necessary and sufficient to accurately identify Omicron from other major circulating variants in patient samples. These findings demonstrate the utility of CRISPR-based DETECTR® as a faster and simpler diagnostic than sequencing for SARS-CoV-2 variant identification in clinical and public health laboratories.
[00557] CasDx1 (Mammoth Biosciences) was evaluated for activity on SARS-CoV-2 Omicron variants using methods substantially similar to those described in Examples 2 and 3.
[00558] SNP differentiation capabilities on synthetic guide nucleic acids
[00559] Guide RNAs (gRNAs) with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501. CasDx1 was further evaluated with its cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
[00560] WT synthetic S-gene fragment (SEQ ID NO: 20):
[00561] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa agattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt ataattacctgtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttgaaggttttaattgttactttcctttacaatcatatggtttccaacccactaatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00562] Mutant synthetic S-gene fragment including mutations on amino acid positions 452, 484 (K484), and 501 (SEQ ID NO: 21):
[00563] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa atattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt ataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttaaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagta gtactttctttgaacttctacatgca
[00564] Mutant synthetic S-gene fragment including mutations on amino acid position 484 (A484) (SEQ ID NO: 43):
[00565] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa aGattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttgCaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagt agtactttcttttgaacttctacatgca
[00566] Mutant synthetic S-gene fragment including mutations on amino acid position 484 (Q484) (SEQ ID NO: 44):
[00567] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa aGattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttCaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagt agtactttcttttgaacttctacatgca
[00568] Materials and Methods
[00569] Synthetic gene fragments
[00570] Wild type and mutant synthetic gene fragments (Twist) were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM. The LAMP primers used herein are shown in Tables 3A and 3B and were synthesized by Eurofins Genomics. The guide RNAs used herein are shown in Tables 4A and 4B and were synthesized by Dharmacon or Synthego. The reporter used herein is shown in Table 5 and was synthesized by IDT. The synthetic gene fragments (SEQ ID Nos. 20, 21, 43, and 44) were synthesized by Twist Biosciences.
[00571] Clinical sample acquisition and extraction
[00572] De-identified residual SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal (NP/OP) swab samples in universal transport media (UTM) or viral transport media (VTM) were obtained from the UCSF Clinical Microbiology Laboratory. All samples were stored in a biorepository according to protocols approved by the UCSF Institutional Review Board (protocol number 10-01116, 11-05519) until processed.
[00573] All NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio. The Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl. The TaqPath™ COVID-19 RT-PCR kit (Thermo Fisher Scientific) was used to determine the N gene cycle threshold values.
[00574] Heat-inactivated culture acquisition and extraction
[00575] Heat-inactivated cultures of SARS-CoV-2 variants being monitored (VBM), variants of concern (VOC), or variants of interest (VOI) were provided by the California Department of Public Health (CDPH).
[00576] RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
[00577] COVID-19 Variant DETECTR® assay
[00578] The COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
[00579] Two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A). Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/). Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
[00580] Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
[00581] Degenerate multiplexed RT-LAMP was performed using a final reaction volume of 65 µl, which consisted of 9.6 µl RNA template, 10 µl of L452R degenerate primer set (Eurofins Genomics), 10 µl of E484(K/Q/A)/N501Y degenerate primer set, 14.1 µl of nuclease-free water, 1.3 µl of SYTO-9 dye (ThermoFisher Scientific), and 20 µl of LAMP mastermix. Two degenerate LAMP primer sets, each containing 6 primers modified from the original LAMP primer sets (see Table 3A) with degenerate nucleotides, were designed to capture the L452R, E484K, E484Q, E484A, and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3B). The primer set and mastermix assembly, the incubation and data collection were as described above.
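As a worked check, the component volumes listed for the two RT-LAMP reactions can be summed to confirm that they match the stated final reaction volumes. The Python sketch below reads all volumes as microlitres; the dictionary and function names are hypothetical.

```python
# Component volumes in microlitres, as listed for the two reactions above.
standard_mix = {
    "RNA template": 8.0,
    "L452R primer set": 5.0,
    "E484K/N501Y primer set": 5.0,
    "nuclease-free water": 17.0,
    "SYTO-9 dye": 1.0,
    "LAMP mastermix": 14.0,
}
degenerate_mix = {
    "RNA template": 9.6,
    "L452R degenerate primer set": 10.0,
    "E484(K/Q/A)/N501Y degenerate primer set": 10.0,
    "nuclease-free water": 14.1,
    "SYTO-9 dye": 1.3,
    "LAMP mastermix": 20.0,
}

def total_volume(mix):
    """Sum of component volumes, rounded to 0.1 to absorb float error."""
    return round(sum(mix.values()), 1)

standard_total = total_volume(standard_mix)      # stated final volume: 50
degenerate_total = total_volume(degenerate_mix)  # stated final volume: 65
```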
[00582] 40 nM CasDx1 (Mammoth Biosciences) protein targeting the WT or MUT SNP at L452R, E484K, E484Q, E484A, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1) for 30 min at 37°C. gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end were used with CasDx1 (see Tables 4A and 4B). 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
[00583] Sequencing methods
[00584] Experiments with sequencing steps were carried out as described in Example 2. Complementary DNA (cDNA) synthesis from RNA via reverse transcription and tiling multiplexed amplicon PCR were performed using SARS-CoV-2 primers version 3 according to the Artic protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021); and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)). Libraries were constructed by ligating adapters to the amplicon products using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, # E7645L), barcoding using NEBNext Multiplex Oligos for Illumina (New England Biolabs, # E6440L), and purification with AMPure XP (Beckman-Coulter, # 63880). Final pooled libraries were sequenced on either Illumina MiSeq or NextSeq 550 as 2x150 single-end reads (300 cycles).
[00585] SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (MiSeq or NextSeq 550) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512). Reads containing adapters, the ARTIC primer sequences, and low-quality reads were filtered using BBDuk (version 38.87) and then mapped to the NC_045512 reference genome using BBMap (version 38.87). Variants were called with CallVariants and iVar (version 1.3.1) and a depth cutoff of 5 was used to generate the final assembly. Pangolin software (version 3.1.17, see A. O'Toole et al., Virus Evol 7, veab064 (2021); and A. Rambaut et al., Nat Microbiol 5, 1403-1407 (2020)) was used to identify the lineage. Using a custom in-house script, consensus FASTA files generated by the genome assembly pipeline were scanned to confirm L452R, E484K, E484Q, E484A, and N501Y mutations.
[00586] Discordant sample retesting
[00587] The discordant samples for Examples 2 and 3 (n=16) were re-extracted as described above for the NP/OP swab samples and evaluated by viral WGS as described above. The extracted nucleic acids were then thawed (incurring an additional freeze/thaw as needed) and amplified using the LAMP protocol described above and evaluated using the COVID-19 Variant DETECTR® assay as described above.
[00588] DETECTR® data analysis pipeline
[00589] Quality Control Metric for the LAMP Reaction
[00590] Prior to processing DETECTR® data from the clinical samples, data indicating the success or failure of the samples to amplify in the LAMP reaction were collected. The absolute truth was based on visual inspection of LAMP curves and was used to develop thresholds for the LAMP reactions. The positive and negative controls from the LAMP reactions were used to derive the thresholds to qualify the samples. Two sets of thresholds were used: a time threshold and a fluorescence rate threshold. The positive LAMP controls were assumed to represent an ideal sample and displayed a classic sigmoidal rise of fluorescence over time, whereas the NTC represented the background fluorescence. It was hypothesized that a sample would ideally have positive-control-like fluorescence kinetics. However, due to the presence of high background in some samples, the mean value between the controls for each plate was chosen as the threshold. The fluorescence values at a time threshold of 18 minutes were then collected. This time point is important for ruling out samples that amplify closer to the endpoint, where LAMP intermediates, rather than the sample itself, are the major contributors to the rise in signal. A score was assigned to each sample, calculated as the ratio between the fluorescence rate threshold and the rate of fluorescence value at 18 minutes for that sample. The hypothesis was that if this ratio of rate of fluorescence between controls and samples was less than 1, then the sample had failed to reach the minimum fluorescence required to be called amplified, and if the ratio was greater than or equal to 1, then the sample had amplified sufficiently. To identify the exact score value for a qualitative QC metric, an ROC analysis was performed on the scores and the absolute truth (see FIG. 11A).
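The QC score described above can be sketched in Python as follows. This is an illustrative sketch only. The text does not state unambiguously which quantity is the numerator, so the sketch assumes the score is the sample value at 18 minutes divided by the plate threshold, which makes a score of at least 1 indicate sufficient amplification as stated. The function names and the default cutoff of 1.0 are hypothetical, and the ROC-derived cutoff is not reproduced.

```python
def plate_threshold(positive_control: float, ntc: float) -> float:
    """Mean of the positive-control and NTC values for one plate."""
    return (positive_control + ntc) / 2.0


def lamp_qc_scores(values_at_18_min: dict, positive_control: float, ntc: float) -> dict:
    """Score each sample relative to the plate threshold.

    Assumption: score = sample value at 18 minutes / threshold, so that
    score >= 1 means the sample reached the minimum required fluorescence.
    """
    threshold = plate_threshold(positive_control, ntc)
    return {name: value / threshold for name, value in values_at_18_min.items()}


def lamp_passed(score: float, cutoff: float = 1.0) -> bool:
    """Qualify a sample; the cutoff would in practice come from the ROC analysis."""
    return score >= cutoff
```

For example, with a positive control of 1000 and an NTC of 200, the plate threshold is 600, so a sample reading 900 at 18 minutes scores 1.5 and passes, whereas a sample reading 300 scores 0.5 and fails.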
[00591] Data analysis for CRISPR-based SNP calling
[00592] As described in Examples 2 and 3, each well had a guide specific to the mutant or the wild-type SNP. The comparison was important to assign a genotypic call to the sample. The DETECTR® reactions across the plate were not comparable to each other. For this purpose, the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which was defined as fluorescence yield. The fluorescence yield was compared across wells in a plate under the assumption that each well would have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aided in normalizing the signal and comparing replicates across the wells in the same plate.
Fy = max(F)/min(F)
[00593] Without being limited to any one theory of operation, the wild-type and mutant target guides on NTC generally do not show any change in intensity over time. Thus, the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
Fy(NTC) = 1
[00594] Accordingly, if a sample has a fluorescence yield of approximately 1, it qualifies for a No Call.
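As an illustration of the fluorescence yield Fy = max(F)/min(F), the following Python sketch computes the yield for a single well from its fluorescence time series. The function name and the example traces are hypothetical.

```python
def fluorescence_yield(trace):
    """Return max(F)/min(F) for one well's fluorescence readings over time."""
    lowest = min(trace)
    if lowest <= 0:
        # Guard added for this sketch: the ratio is undefined for non-positive minima.
        raise ValueError("fluorescence readings must be positive")
    return max(trace) / lowest


# A well with a rising signal has a yield well above 1,
# whereas a flat, NTC-like well has a yield close to 1.
rising_well = [100.0, 150.0, 400.0]
flat_well = [100.0, 101.0, 100.0]
```

Here fluorescence_yield(rising_well) is 4.0 and fluorescence_yield(flat_well) is 1.01, which matches the expectation that an NTC yield remains close to 1.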
[00595] Variant calling was performed using the following general rules: 1. No-template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between the minimum and maximum NTC contrast for that SNP on the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
Cmin(NTC-snp) <= C(sample-snp) <= Cmax(NTC-snp) → NoCall
Smin(NTC-snp) <= S(sample-snp) <= Smax(NTC-snp) → NoCall
log2(Fy(WT)) > log2(Fy(M)) → Wild Type
log2(Fy(WT)) < log2(Fy(M)) → Mutant
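The calling rules above can be sketched in Python as follows. Contrast and size are not defined in this passage, so the sketch assumes, by analogy with the ratio and average plotted on the MA plot, that contrast is log2(Fy(WT)) - log2(Fy(M)) and size is the mean of the two log2 yields. The function name and the tuple layout of the NTC ranges are hypothetical.

```python
import math


def call_snp(fy_wt, fy_mut, ntc_contrast, ntc_size):
    """Return 'WT', 'MUT', or 'NoCall' for one sample at one SNP.

    ntc_contrast and ntc_size are (min, max) tuples measured on the NTC wells
    of the same plate. Assumed definitions (not stated in this passage):
      contrast = log2(Fy(WT)) - log2(Fy(M))
      size     = (log2(Fy(WT)) + log2(Fy(M))) / 2
    """
    log_wt, log_mut = math.log2(fy_wt), math.log2(fy_mut)
    contrast = log_wt - log_mut
    size = (log_wt + log_mut) / 2

    # A sample indistinguishable from the NTC in contrast or size is a NoCall.
    if ntc_contrast[0] <= contrast <= ntc_contrast[1]:
        return "NoCall"
    if ntc_size[0] <= size <= ntc_size[1]:
        return "NoCall"

    if log_wt > log_mut:
        return "WT"
    if log_wt < log_mut:
        return "MUT"
    return "NoCall"  # exact tie; how ties are handled is an assumption
```

For example, with NTC ranges of (-0.1, 0.1) for contrast and (-0.05, 0.1) for size, a sample with Fy(WT) = 8 and Fy(M) = 1 is called WT, whereas a sample with both yields near 1 falls inside the NTC contrast range and is a NoCall.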
[00596] In cases where more than one mutant at a particular position was being analyzed, such as in the case of the Omicron variant of SARS-CoV-2, the signal of each mutant was compared with the wild-type signal. If a mutant was present, then among the n comparisons for n mutants and one wild type, one of the comparisons would yield a mutant call.
log2(Fy(WT)) > log2(Fy(M1)) → Wild Type
log2(Fy(WT)) > log2(Fy(M2)) → Wild Type
log2(Fy(WT)) > log2(Fy(M3)) → Wild Type
log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4)
[00597] If there was a tie in the above logic between mutant and wild-type, then a tie breaker comparison would yield a final result.
log2(Fy(WT)) > log2(Fy(M1)) → Wild Type
log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2)
log2(Fy(WT)) > log2(Fy(M3)) → Wild Type
log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4)
log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4)
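The multi-mutant comparison and tie breaker can be sketched in Python as follows. This is a simplified illustration that omits the NTC-based NoCall checks. The function name and the dictionary layout are hypothetical, and the handling of mutants with exactly equal yields is not addressed by the text.

```python
import math


def call_multi_mutant(fy_wt, fy_mutants):
    """Compare each mutant yield with the wild-type yield at one position.

    fy_mutants maps a mutant label (e.g. 'M2') to its fluorescence yield.
    Returns 'WT' if no mutant exceeds wild type; otherwise returns the label of
    the mutant with the highest log2 yield (the tie breaker among the mutants
    that individually beat wild type).
    """
    log_wt = math.log2(fy_wt)
    beating_wt = {
        label: math.log2(fy)
        for label, fy in fy_mutants.items()
        if math.log2(fy) > log_wt
    }
    if not beating_wt:
        return "WT"
    return max(beating_wt, key=beating_wt.get)
```

With Fy(WT) = 2 and mutant yields of M1 = 1.1, M2 = 4, M3 = 1.2, and M4 = 8, both M2 and M4 exceed wild type, and the tie breaker selects M4, mirroring the worked comparisons above.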
[00598] SNP calls
[00599] The following procedure, similar to that described in Examples 2 and 3, was used to evaluate the concordance between sequencing and DETECTR® technologies for genotypic classification of the clinical cohort dataset.
[00600] First, all samples and SNPs for which both sequencing and DETECTR® data were present in the distributed files were considered by matching the SNP IDs and sample names. This included cleaning and curating the dataset, which contained failed LAMP reactions, and identifying WT and MUT based on the spacer fluorescence. Samples that had failed to amplify in the LAMP reaction but were assigned a genotype were eliminated as described in Example 2. From the second cohort of 48 samples, 102 calls were made. These 48 samples were initially assayed for SNP calls at position 484, and the variant was determined from there if possible as described herein (e.g., presence of the E484A mutant at position 484 indicated the Omicron variant). When the variant could not be called based on the SNP call at position 484, SNP calls for positions 452 and 501 were combined with that of position 484 to make the final variant call. The resulting final analysis data consisted of 102 calls (WT, K484, A484, Q484, R452, Y501, or No Call) spread across 3 SNPs and 48 samples. For each of the 3 SNPs in the analysis data set, both sequencing and DETECTR® genotypes (including NoCalls and LAMP Fails) were identified and recorded for each of the 48 patients. The 48 patients included the individuals for whom actual sequencing data was available.
[00601] Statistical Analysis
[00602] SNP calls
[00603] Statistical analyses for the experiments described herein were performed as follows. For each SNP in the analysis of genotypic calls, a variety of statistics evaluating the concordance between genotype calls were computed on the two different technologies. The concordant and discordant genotypes were visualized through contingency tables. For each SNP, there are three possible genotypes (Wild-type, Mutant and No Call). For the SNP at position 484, the Mutant genotype may be any one of K484, A484, or Q484. The concordance rates were calculated without the samples that failed the LAMP reaction (see Table 7 and FIGS. 14A-14D). The 2x2 cross tables classify all three SNPs (at positions 452, 484, 501) across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D). The data transformation and statistical analysis were done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
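Although the analysis itself was done in R, the concordance calculation can be illustrated with a short Python sketch. The function name, the "LAMP Fail" label, and the pairing of calls as (sequencing, DETECTR) tuples are assumptions made for illustration, and the example data are invented.

```python
from collections import Counter


def concordance(pairs):
    """Build a contingency table and concordance rate from paired calls.

    pairs is an iterable of (sequencing_call, detectr_call) tuples. Samples
    whose DETECTR call is 'LAMP Fail' are excluded before the rate is computed.
    """
    kept = [(seq, det) for seq, det in pairs if det != "LAMP Fail"]
    table = Counter(kept)  # keys are (sequencing, DETECTR) genotype pairs
    agree = sum(count for (seq, det), count in table.items() if seq == det)
    rate = agree / len(kept) if kept else float("nan")
    return table, rate


# Invented example: two concordant calls, one discordant call, one LAMP failure.
example_pairs = [("WT", "WT"), ("MUT", "MUT"), ("MUT", "WT"), ("WT", "LAMP Fail")]
```

For example_pairs, the LAMP failure is dropped, two of the three remaining pairs agree, and the concordance rate is 2/3.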
[00604] Results
[00605] Update of the COVID-19 Variant DETECTR® assay for distinct 484 mutant SNP detection
[00606] In November 2021, a new SARS-CoV-2 variant was identified and almost immediately designated a variant of concern, called Omicron. The Omicron variant carries an exceptionally high number of mutations (>30) within the S-gene and has been shown to have enhanced transmissibility and immune evasion. The record number of COVID-19 cases globally from Omicron and loss of activity by certain therapeutic antibodies underscore the need for rapid and targeted identification of SARS-CoV-2 variants. Although the TaqPath PCR assay with S-gene Target Failure (SGTF) has functioned as a screen that can be reflexed to sequencing to identify the Omicron variant, the SGTF assay alone cannot differentiate between Omicron BA.1 and Alpha and cannot identify emerging variants that lack the SGTF, such as the Omicron BA.2 sublineage. The COVID-19 variant DETECTR® assay described in Example 2 was reconfigured for the identification of Omicron by targeting the E484A mutation, which alone differentiates Omicron from all other current VBM/VOI/VOC. Given that E484-related mutations are present in multiple circulating variants and have a strong effect on reducing antibody neutralization, a panel of CasDx1 gRNAs to detect all relevant mutations (E, K, Q and A) at amino acid position 484 was also developed (see Tables 4A and 4B). Guide RNAs (gRNAs) with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501 (see FIG. 15A). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either E484K, E484Q, or E484A (see FIG. 15B). CasDx1 was further evaluated with its cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities, as illustrated in FIG. 15B.
[00607] Given the highly mutated Omicron S-gene (see FIG. 15A), the original LAMP primer set used in Example 2 (Table 3A) was tested against a degenerate LAMP primer set (Table 3B) which incorporated degenerate nucleotides within the LAMP primers, as it was suspected that the original LAMP primer set may not have sufficient sensitivity to amplify the targeted spike RBD region. As in Example 2, the degenerate LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K/Q/A, and N501Y mutations. FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples. Omicron clinical samples (1802), WT control RNA (1804), and Alpha control RNA (1806) were amplified using the original RT-LAMP primer set and the degenerate RT-LAMP primer set. The degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
[00608] Within weeks of the first Omicron case in the US, an additional set of 48 clinical samples beyond those of Example 2 were procured and tested. These samples were blinded and processed with the updated DETECTR® assay workflow, which included sample extraction, followed by amplification with the degenerate LAMP primers and detection with each of the 484-specific CasDx1 gRNAs. Once processed, a result with mutations K484, Q484, or A484 was called Beta/Eta/Gamma/Iota/Mu/Zeta, Kappa, or Omicron, respectively, as shown in Tables 8 and 9 and FIGS. 15C and 17A-17B.
[00609] Table 8: SARS-CoV-2 Lineage Classification Table Based on 484 Mutations
Figure imgf000194_0001
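The 484-based lineage assignment stated in the text can be expressed as a simple lookup, sketched below in Python. Only the mappings given in the text are encoded. The function name and the return strings for the wild-type and unrecognized cases are hypothetical, and the 452/501 reflex logic of Table 8 is not reproduced because the table content is not available here.

```python
# Mappings taken from the text: K484, Q484, and A484 results.
LINEAGE_BY_484_CALL = {
    "K484": "Beta/Eta/Gamma/Iota/Mu/Zeta",
    "Q484": "Kappa",
    "A484": "Omicron",
}


def lineage_from_484(call: str) -> str:
    """Map a position-484 SNP call to a lineage category.

    An E484 (wild-type) result cannot be resolved from position 484 alone and
    is flagged for follow-up testing at positions 452 and 501.
    """
    if call == "E484":
        return "Reflex to 452/501"
    return LINEAGE_BY_484_CALL.get(call, "No Call")
```

For example, lineage_from_484("A484") returns "Omicron", whereas lineage_from_484("E484") signals that the 452 and 501 guides are needed for a final categorization.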
[00610] Development of an updated data analysis pipeline for calling COVID-19 Variant SNPs
[00611] To develop an updated data analysis pipeline for calling SARS-CoV-2 SNP mutations and assigning lineage classifications with the COVID-19 Variant DETECTR® assay (see Table 8 and FIGS. 17A-17B), data collected from the additional clinical samples (n=48) were used. Allele discrimination plots were generated as described in Example 2 to define boundaries that separate the WT and MUT signals. Clear differentiation between WT and MUT signals was observed when plotting the ratio against the average of the WT and MUT transformed values on a mean average (MA) plot as described in Example 2 for SNP identity at positions 452, 484, and 501 for the control samples. In cases of SNP 484 (n=484), where additional mutants were characterized against WT, the ratio of MUT(1) vs. MUT(2), etc. was generated as described above to make the final MUT(1) or MUT(2) call.
[00612] Performance evaluation of the updated COVID-19 Variant DETECTR® assay using clinical samples
[00613] A blinded dataset consisting of 48 COVID-19 positive clinical samples (previously analyzed by viral WGS) was assembled and the SNP controls were run in parallel. These samples were extracted, amplified in triplicate RT-LAMP reactions, and processed further as triplicate CasDx1 reactions for each LAMP replicate. A total of nine replicates were thus generated for each sample to detect WT (E484) or MUT (K484, Q484, or A484) SNPs at position 484. The DETECTR® data analysis pipeline was then applied to each sample to provide a 484 SNP categorization (see Tables 8 and 9 and FIGS. 15C and 17A-17B). For a biological RT-LAMP replicate to be designated as either WT or MUT, the same call needed to be made from all three technical CasDx1 replicates (see FIGS. 4A-4C). A final 484 SNP mutation call was made based on one or more identical calls among the three biological replicates, with replicates that were designated as a No Call ignored (see FIGS. 4A-4C). Once processed, a result with mutations K484, Q484, or A484 was called Beta/Eta/Gamma/Iota/Mu/Zeta, Kappa, or Omicron, respectively, as shown in Tables 8 and 9 and FIGS. 15C and 17A-17B. If the result was associated with E484 (WT), the assay was run using WT and MUT gRNAs at positions 452 and 501 and the DETECTR® data analysis pipeline was applied to provide a final lineage categorization (see Tables 8 and 9 and FIGS. 15C and 17A-17B). Using this workflow, 36 out of 48 total clinical samples were detected: 18/48 resulted as E484 (WT) and were subsequently tested with 452 and 501 gRNAs (3/18 called WT, 6/18 called Alpha and 9/18 called Delta), 4/48 resulted as K484 (called Beta/Eta/Gamma/Iota/Mu/Zeta), 2/48 resulted as Q484 (called Kappa), and 12/48 resulted as A484 (called Omicron) (see FIG. 15C and 17B-15C). The remaining 12/48 clinical samples neither amplified nor showed any DETECTR® signal and were thus called “Not Detected” (see FIG. 18).
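The replicate logic described above (three technical replicates per biological replicate, three biological replicates per sample) can be sketched in Python as follows. The function names are hypothetical. The text does not state how two biological replicates with different, conflicting calls are resolved, so returning "NoCall" on such a tie is an assumption of this sketch.

```python
from collections import Counter


def biological_call(technical_calls):
    """A biological replicate gets a call only if all technical replicates agree."""
    first = technical_calls[0]
    if first != "NoCall" and all(call == first for call in technical_calls):
        return first
    return "NoCall"


def final_call(replicates):
    """Combine biological replicates, ignoring those designated NoCall.

    replicates is a list of lists: one inner list of technical-replicate calls
    per biological RT-LAMP replicate.
    """
    calls = [biological_call(technical) for technical in replicates]
    votes = Counter(call for call in calls if call != "NoCall")
    if not votes:
        return "NoCall"
    ranked = votes.most_common()
    # Assumption: an even split between conflicting calls is left uncalled.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "NoCall"
    return ranked[0][0]
```

For example, a sample whose first biological replicate gives A484 in all three technical replicates, and whose other two biological replicates are NoCall, receives a final A484 call, because a single consistent biological replicate is sufficient once NoCall replicates are ignored.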
[00614] The viral WGS results were unblinded to evaluate the accuracy of the updated DETECTR® assay for SNP calls and lineage classification. Unblinding of samples COVID-92 through COVID-127 revealed five discordant samples: COVID-103, COVID-108, COVID-109, COVID-112 and COVID-122 (FIG. 17D). All five discordant samples were re-extracted from the original patient sample and re-processed with WGS and COVID Variant DETECTR®. After repeat testing, three samples (COVID-103, COVID-108, COVID-109) showed 100% concordance between WGS and DETECTR®, with both methods resulting in “No Call” at position 452. Notably, these samples were also part of the original set of 91 samples (COVID-20, COVID-63, COVID-73) from Example 2 that were previously concordant at position 452, suggesting a decrease in sample integrity likely resulting from multiple freeze/thaw cycles incurred during several re-extractions. Sample COVID-112 was called an Omicron by DETECTR® based on its A484 SNP call, which was confirmed by WGS. Finally, sample COVID-122 could not be amplified by RT-LAMP, also suggesting a loss in sample integrity. Following this discrepancy analysis, an overall SNP concordance of 94.7%, and 100% NPA was demonstrated for this set of 48 samples (Table 9).
[00615] Table 9: Overall SNP concordance values for the 484 SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay
Figure imgf000196_0001
[00616] Discussion
[00617] In this example, a CRISPR-based DETECTR® assay was developed for the detection of SARS-CoV-2 variants. A data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 97.9% (373/381 total SNP calls when combined with Example 2) and 99.3% (138/139 when combined with Example 2) agreement with lineage classification compared to viral WGS. Taken together, these findings show robust agreement between the COVID-19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization. Thus, the COVID-19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID-19 variant diagnostics and surveillance.
[00618] Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations regarding coverage of circulating lineages, the extent of clinical sample evaluation, and/or assay complexity. For example, the miSHERLOCK variant assay uses LbCas12a (NEB) with RPA pre-amplification to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)). The SHINEv2 assay uses LwaCas13a with RPA pre-amplification to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)). Finally, the mCARMEN variant identification panel (VIP) uses 26 crRNA pairs with either the LwaCas13a or LbaCas13a and PCR pre-amplification to identify all current circulating lineages including Omicron; however, the VIP requires the Fluidigm Biomark HD system or similar, more complex instrumentation for streamlined execution (see Welch et al., medRxiv, (2021)). In comparison, the COVID-19 Variant DETECTR® assay disclosed herein in Examples 2 and 4 uses CasDx1 to detect N501Y, E484K/Q/A, and L452R covering all current circulating lineages and was tested on 139 clinical samples representing eight lineages (WA-1, Alpha, Gamma, Delta, Epsilon, Iota, Mu, Omicron). Furthermore, specific Omicron identification was accomplished using only the E484 WT and A484 MUT guides.
[00619] The results of Examples 2-4 show that the choice of Cas enzyme may be important to maximize the accuracy of CRISPR-based diagnostic assays and may need to be tailored to the site that is being targeted. As configured in Example 4 with the L452R, E484K/Q/A and N501Y SNP targets, the COVID-19 variant DETECTR® assay was capable of distinguishing the Alpha, Delta, Kappa and Omicron variants, but did not resolve the remaining VBMs or VOIs. However, given the rapid emergence and shifts in the distribution of variants over time, it appears likely that tracking of key mutations, many of which are suspected to arise from convergent evolution, rather than tracking of variants, may be more important for surveillance as the pandemic continues. The data analysis pipeline developed for CRISPR-based SNP calling described herein can readily incorporate additional targets and may offer a blueprint for automated interpretation of fluorescent signal patterns.
[00620] In the near term, the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for circulating variants and/or a distinct pattern from a rare or novel variant by interrogating the key 452, 484, and 501 positions that could be reflexed to viral WGS. As the sequencing capacity for most clinical and public health laboratories is limited, the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts. Identification of specific mutations associated with neutralizing antibody evasion could inform patient care with regard to the use of monoclonal antibodies that remain effective in treating the infection. As the virus continues to mutate and evolve, the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new preamplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance. For example, newly emerging variants may contain additional mutations that may not be captured with the resolution of the L452R, E484K/Q/A, and N501Y mutations described in Examples 2-4 or which may result in variable performance at a particular SNP compared to others, and could be detected by incorporation of additional gRNA(s) to provide specific and redundant coverage and/or improve identification of specific lineages. Additionally, incorporation of an additional N-gene target into the assay may improve the limit of detection of the DETECTR® assay, which currently relies entirely on a multiplexed and degenerate S-gene LAMP primer design, optionally to facilitate simultaneous detection and SNP/variant identification. Over the longer term, a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification may offer a faster and simpler alternative to sequencing and would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
[00621] CONCLUSION
[00622] All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
[00623] Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the implementation(s). In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the implementation(s).
[00624] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
[00625] The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[00626] As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context. As used herein, the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
[00627] The foregoing description included example systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative implementations. For purposes of explanation, numerous specific details were set forth in order to provide an understanding of various implementations of the inventive subject matter. It will be evident, however, to those skilled in the art that implementations of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures and techniques have not been shown in detail.
[00628] The foregoing description, for the purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles and their practical applications, to thereby enable others skilled in the art to best utilize the implementations and various implementations with various modifications as are suited to the particular use contemplated.

Claims

What is claimed is:
1. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
2. The method of claim 1, wherein the determining (B) further comprises, for each respective well in the plurality of wells, normalizing the respective plurality of reporting signals for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals for the respective well, thereby obtaining the corresponding signal yield for the respective well.
3. The method of claim 1 or 2, wherein, for each respective well in the first set of wells, the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample.
4. The method of any one of claims 1-3, wherein the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call.
5. The method of any one of claims 1-4, wherein the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield.
6. The method of claim 5, wherein the relative intensity metric has the form log2(Fy(FAl)) / log2(Fy(SAl)), wherein:
Fy(FAl) is the first corresponding signal yield, and Fy(SAl) is the second corresponding signal yield.
7. The method of claim 6, wherein, when the relative intensity metric is greater than 1, the candidate call identity is first allele.
8. The method of claim 6 or 7, wherein, when the relative intensity metric is less than 1, the candidate call identity is second allele.
9. The method of any one of claims 5-8, wherein the signal dataset further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, wherein: the plurality of control wells are free of nucleic acid derived from the biological sample, the plurality of control wells comprises a first set of control wells representing the first allele for the target locus, wherein each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of control wells comprises a second set of control wells representing the second allele for the target locus, wherein each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, and each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value, and wherein: the method further comprises: determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate, and evaluating whether a respective candidate call identity in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
10. The method of claim 9, wherein the evaluating comprises performing a comparison of
(i) the relative intensity metric between the corresponding first and second signal yields and
(ii) the plurality of control signal yields, wherein, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity is deemed a no-call.
11. The method of claim 9 or 10, wherein each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well.
12. The method of any one of claims 9-11, further comprising: obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield; and determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric, and wherein: the evaluating comprises performing a comparison between the central tendency metric of the corresponding first and second signal yields and the control central tendency metric for the first set of control wells, wherein, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity is deemed a no-call.
13. The method of claim 12, wherein the central tendency metric is calculated as
[log2(Fy(FAl) + Fy(SAl))]/2, wherein,
Fy(FAl) is the corresponding first signal yield, and Fy(SAl) is the corresponding second signal yield.
14. The method of claim 12 or 13, wherein the control central tendency metric is a log average calculated as
[log2(Fc(FAl) + Fc(SAl))]/2 wherein,
Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells, and
Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
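The central tendency comparison of claims 12 to 14 can be sketched as below. The formula is transcribed exactly as the claim text writes it, [log2(F(first) + F(second))]/2; the function names, the pairing of control wells, and the inclusive bounds are assumptions of this example rather than anything the claims specify.

```python
import math
from typing import Sequence, Tuple


def central_tendency(first_yield: float, second_yield: float) -> float:
    """[log2(first + second)] / 2, following the wording of claims 13 and 14."""
    return math.log2(first_yield + second_yield) / 2


def within_control_band(
    sample_metric: float, control_pairs: Sequence[Tuple[float, float]]
) -> bool:
    """True when the sample metric lies between the min and max control metrics."""
    control_metrics = [central_tendency(fc, sc) for fc, sc in control_pairs]
    return min(control_metrics) <= sample_metric <= max(control_metrics)


# Hypothetical (first allele, second allele) control yield pairs.
control_pairs = [(1.0, 1.0), (1.5, 2.5)]
```

Here the control metrics are 0.5 and 1.0, so a sample with yields (6.0, 2.0), whose metric is 1.5, falls outside the band and would not be flagged as a no-call by this test.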
15. The method of any one of claims 1-14, further comprising, prior to the performing (D), binning the first set of wells into a plurality of bins, wherein each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample, and wherein the voting procedure further comprises:
(i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities for the subset of wells in the respective bin, thereby generating a plurality of bin votes, and
(ii) applying a second concordance vote across the plurality of bin votes, thereby obtaining the mutation call for the target locus.
16. The method of claim 15, wherein each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
17. The method of claim 15 or 16, wherein a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin.
18. The method of claim 15 or 16, wherein a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
19. The method of claim 15 or 16, wherein a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call.
20. The method of any one of claims 15-19, wherein the second concordance vote generates a mutation call based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins.
21. The method of any one of claims 15-19, wherein the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
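The two-stage voting of claims 15 to 21 can be pictured with a short sketch. It is a minimal illustration under stated assumptions: the label strings, the function names, and the rule that a bin without the required agreement yields a no-call are choices made here, not details fixed by the claims.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def concordance_vote(votes: Sequence[str], unanimous: bool = False) -> str:
    """Return the shared vote, or no-call when the required agreement is missing."""
    if not votes:
        return NO_CALL
    label, count = Counter(votes).most_common(1)[0]
    # Strict majority by default (claims 17, 20); all votes if unanimous (18, 21).
    needed = len(votes) if unanimous else len(votes) // 2 + 1
    return label if count >= needed else NO_CALL


def mutation_call(bins: Sequence[Sequence[str]], unanimous_bins: bool = False) -> str:
    """First vote within each bin of wells, then a second vote across bin votes."""
    bin_votes = [concordance_vote(wells) for wells in bins]
    return concordance_vote(bin_votes, unanimous=unanimous_bins)


# Hypothetical candidate calls: three replicate bins of three wells each.
bins = [
    ["mutant", "mutant", "wild-type"],
    ["mutant", "mutant", "mutant"],
    ["wild-type", "no-call", "mutant"],
]
```

With this input the bin votes are mutant, mutant, and no-call, so a majority second vote returns mutant while a unanimous second vote returns no-call.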
22. The method of any one of claims 15-21, wherein each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate.
23. The method of any one of claims 15-22, wherein the plurality of bins comprises between 2 and 10 bins.
24. The method of any one of claims 15-23, wherein each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells.
25. The method of any one of claims 1-24, wherein the signal dataset is obtained by a procedure comprising: amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids; and for each respective well in the first set of wells and each respective well in the second set of wells: partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
26. The method of claim 25, wherein the programmable nuclease is a trans-cleaving programmable nuclease.
27. The method of claim 25 or 26, wherein the programmable nuclease targets DNA or RNA.
28. The method of any one of claims 25-27, wherein the programmable nuclease is a Cas nuclease.
29. The method of claim 28, wherein the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, or CasPhi.
30. The method of claim 28 or 29, wherein the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
31. The method of any one of claims 25-30, wherein, for each respective well in the first set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, and wherein the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
32. The method of claim 31, wherein the first plurality of guide nucleic acids is RNA.
33. The method of claim 31 or 32, wherein the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus.
34. The method of any one of claims 25-33, wherein each respective reporter in the one or more reporters is single-stranded DNA.
35. The method of any one of claims 25-33, wherein each respective reporter in the one or more reporters is single-stranded RNA.
36. The method of any one of claims 25-35, wherein each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
37. The method of any one of claims 25-36, wherein the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR).
38. The method of any one of claims 25-37, wherein the amplifying is performed using a multiplexed amplification reaction.
39. The method of any one of claims 25-38, further comprising, prior to the amplifying, partitioning the first plurality of nucleic acids derived from the biological sample into a plurality of amplification replicates, wherein the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate.
40. The method of claim 39, further comprising, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate into a plurality of detection replicates, wherein each respective detection replicate in the plurality of detection replicates represents a respective well in the plurality of wells.
41. The method of claim 39 or 40, further comprising applying an amplification threshold filter to each respective amplification replicate, wherein, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
42. The method of any one of claims 1-41, wherein the first allele is wild-type, the second allele is mutant, and the mutation call is wild-type or mutant.
43. The method of any one of claims 1-42, wherein the first allele is a first mutant, and the second allele is a second mutant.
44. The method of any one of claims 1-43, wherein each respective reporting signal is obtained from a detection moiety.
45. The method of claim 44, wherein each respective reporting signal is a fluorescence emission by a fluorophore, and the corresponding discrete attribute value is a fluorescence intensity.
46. The method of claim 44 or 45, wherein the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
47. The method of any one of claims 1-43, wherein the respective reporting signal is a lateral flow readout.
48. The method of claim 47, wherein the lateral flow readout is detected manually by visual inspection.
49. The method of claim 47, wherein the lateral flow readout is detected using an image analysis algorithm.
50. The method of any one of claims 1-49, wherein the target locus is selected from the group consisting of: a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation.
51. The method of any one of claims 1-49, wherein the target locus is selected from the group consisting of: a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
52. The method of any one of claims 1-51, wherein the target locus is selected from a database.
53. The method of any one of claims 1-52, wherein the target locus is a gene.
54. The method of any one of claims 1-53, wherein the target locus comprises DNA or RNA.
55. The method of claim 54, wherein the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus is obtained from a transcription reaction of a DNA molecule for the target locus.
56. The method of any one of claims 1-55, wherein the plurality of time points comprises between 20 and 2000 time points.
57. The method of any one of claims 1-56, wherein the plurality of time points is obtained over a duration of between 30 seconds and 1 hour.
58. The method of any one of claims 1-57, wherein the biological sample is a clinical sample.
59. The method of any one of claims 1-58, wherein the biological sample is a bodily fluid.
60. The method of claim 59, wherein the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood.
61. The method of any one of claims 1-60, wherein the biological sample is obtained from a nasopharyngeal or oropharyngeal swab.
62. The method of any one of claims 1-61, wherein the biological sample is obtained from a subject that is infected with a pathogen.
63. The method of claim 62, wherein the pathogen is a virus, bacteria, fungus, or parasite.
64. The method of claim 62 or 63, wherein the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
65. The method of any one of claims 1-64, further comprising heat-inactivating the biological sample.
66. The method of any one of claims 1-65, further comprising isolating the one or more nucleic acid molecules from the biological sample.
67. The method of any one of claims 1-66, wherein the one or more nucleic acid molecules comprises synthetic nucleic acid molecules.
68. A computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method comprising:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus,
the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
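Steps (B) through (D) can be strung together in a compact sketch. This is an assumption-laden illustration, not the claimed system: it reuses a max/min yield, pairs wells by position, calls whichever allele has the larger yield (a tie becomes a no-call), and takes the most common call as the vote. None of those specifics are dictated by the claim, and all names and numbers are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def signal_yield(trace: Sequence[float]) -> float:
    """Step (B): collapse a well's time series into one yield (max over min)."""
    return max(trace) / min(trace)


def candidate_calls(
    first_traces: Sequence[Sequence[float]],
    second_traces: Sequence[Sequence[float]],
) -> List[str]:
    """Step (C): compare each first-allele well with its paired second-allele well."""
    calls = []
    for first, second in zip(first_traces, second_traces):
        fy, sy = signal_yield(first), signal_yield(second)
        if fy == sy:
            calls.append("no-call")
        else:
            calls.append("first allele" if fy > sy else "second allele")
    return calls


def vote(calls: Sequence[str]) -> str:
    """Step (D): simplest possible vote, the most common candidate call."""
    return Counter(calls).most_common(1)[0][0]


# Hypothetical traces for three paired wells.
first_wells = [[10, 20, 80], [10, 30, 90], [10, 12, 11]]
second_wells = [[10, 11, 12], [10, 12, 13], [10, 40, 70]]
```

For these traces two of the three well pairs favor the first allele, so the vote returns the first allele.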
69. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method comprising:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus,
each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
70. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids;
(B) for each respective well in a plurality of wells in a common plate, partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample; and contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters;
(C) obtaining a signal dataset comprising, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points,
the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, and each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease,
(D) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(E) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(F) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
71. The method of any one of claims 1-67, further comprising, for each candidate second allele in a plurality of candidate second alleles, repeating the A) obtaining, B) determining, C) determining, and D) performing thereby obtaining a plurality of candidate mutation calls for the target locus, and the method further comprises performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus that is one of the candidate mutation calls or the first allele.
72. The method of claim 71, wherein the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
73. The method of claim 71 or 72, wherein, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus.
74. The method of claim 71 or 72, wherein, when a sub-plurality of candidate mutation calls in the plurality of candidate mutations calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
74. The method of any one of claims 71-74 wherein the plurality of candidate second alleles consists of between 2 and 10 candidate second alleles.
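The mutation call voting across several candidate second alleles might be sketched as follows. The sketch assumes each per-allele run has already been reduced to a boolean (whether that candidate allele was called) and, where more than one candidate is called, it simply picks the candidate with the highest signal yield as a stand-in for the pairwise yield comparison the claims recite. The function name, argument shapes, and allele labels are hypothetical.

```python
from typing import Dict


def final_mutation_call(
    allele_called: Dict[str, bool],
    allele_yields: Dict[str, float],
    first_allele: str = "wild-type",
) -> str:
    """Resolve per-candidate mutation calls into one final call."""
    called = [allele for allele, hit in allele_called.items() if hit]
    if not called:
        # No candidate second allele was called: fall back to the first allele.
        return first_allele
    if len(called) == 1:
        # A single candidate was called: it becomes the final call.
        return called[0]
    # Several candidates were called: compare their yields (assumed rule: largest wins).
    return max(called, key=lambda allele: allele_yields[allele])


# Hypothetical yields for two candidate mutant alleles at one locus.
yields = {"E484K": 3.1, "E484Q": 5.4}
```

With both candidates called, the hypothetical yields above resolve to E484Q; with neither called, the result is the first allele.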
75. The method of any one of claims 1-67, wherein the second allele is a first candidate second allele in a plurality of candidate second alleles, the plurality of wells further comprises, for each respective candidate second allele in the plurality of candidate second alleles beyond the first candidate second allele, a respective additional set of wells representing the candidate second allele for the target locus, wherein each well in the respective additional set of wells includes a plurality of guide nucleic acids that have the respective candidate second allele of the target locus, each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample wherein the C) determining and D) performing is performed for each candidate second allele in the plurality of candidate second alleles, thereby obtaining a plurality of candidate mutation calls for the target locus, and the method further comprises performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus that is one of the candidate mutation calls or the first allele.
76. The method of claim 75, wherein the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
77. The method of claim 75 or 76, wherein, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus.
78. The method of claim 75 or 76, wherein, when a sub-plurality of candidate mutation calls in the plurality of candidate mutations calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
79. The method of any one of claims 75-78, wherein the plurality of candidate second alleles consists of between 2 and 10 candidate second alleles.
80. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, wherein each well in the respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and
each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
81. The method of claim 80, wherein the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus.
82. The method of claim 80 or 81, wherein the set of candidate second alleles consists of between 2 and 10 candidate second alleles.
83. The method of claim 80, wherein the set of candidate second alleles consists of a single second allele.
84. A method of assaying for a SARS-CoV-2 variant in an individual, the method comprising:
(A) collecting a nasal swab or a throat swab from the individual;
(B) optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab;
(C) amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers;
(D) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42;
(E) assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof; and
(F) determining an SNP call of the sample.
85. The method of claim 84, wherein the nasal swab is a nasopharyngeal swab.
86. The method of claim 84, wherein the throat swab is an oropharyngeal swab.
87. The method of any one of claims 84-86, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification.
88. The method of any one of claims 84-87, wherein the amplifying comprises loop mediated amplification (LAMP).
89. The method of any one of claims 87-88, wherein the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof.
90. The method of any one of claims 84-89, wherein the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
91. The method of any one of claims 84-90, wherein the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39.
92. The method of any one of claims 84-91, wherein the amplifying comprises reverse transcription-LAMP.
93. The method of any one of claims 84-92, comprising lysing the sample.
94. The method of claim 93, wherein lysing the sample comprises contacting the sample to a lysis buffer.
95. The method of any one of claims 1-67 or 70-94, wherein determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene.
96. The method of claim 95, wherein the reference wild-type SARS-CoV-2 gene is from the USA-WA1/2020 sequence.
97. The method of claim 95 or 96, wherein the one or more S-gene mutations is a single nucleotide polymorphism (SNP).
98. The method of any one of claims 95-97, wherein the one or more S-gene mutations is associated with one or more Spike protein mutations.
99. The method of claim 98, wherein the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof.
100. The method of any one of claims 84-99, wherein determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid.
101. The method of any one of claims 1-67 or 70-100, comprising determining a variant call of the sample.
102. The method of claim 101, wherein determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
103. The method of claim 100 or 102, wherein the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant.
104. The method of any one of claims 101-103, wherein determining the variant call comprises determining one or more SNP calls of the sample.
105. The method of any one of claims 101-104, wherein determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation.
106. The method of any one of claims 101-105, wherein determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
107. The method of any one of claims 101-106, wherein determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
108. The method of any one of claims 101-107, wherein determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
109. The method of any one of claims 101-108, wherein determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
110. The method of any one of claims 101-109, wherein determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
111. The method of any one of claims 101-110, wherein determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
112. The method of any one of claims 101-111, wherein determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation.
113. The method of any one of claims 101-112, wherein determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation.
114. The method of any one of claims 84-113, wherein the effector protein is a Type V Cas effector protein.
115. The method of claim 114, wherein the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein.
116. The method of any one of claims 84-115, wherein the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
117. The method of any one of claims 84-116, wherein the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
118. The method of any one of claims 84-117, wherein the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
119. The method of any one of claims 84-118, wherein the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
120. The method of any one of claims 84-119, wherein the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
121. A method of assaying for a SARS-CoV-2 variant in an individual, the method comprising:
(A) amplifying a target nucleic acid from a biological sample of the individual, the target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers;
(B) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42;
(C) assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof;
(D) determining an SNP call of the sample; and
(E) optionally extracting a target nucleic acid from a biological sample obtained from the individual, prior to (A).
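
By way of non-limiting illustration only, the signal comparison of claim 100 and the variant determinations of claims 102-113 may be sketched as follows. The sketch is not the claimed implementation. The function names, the ratio threshold `min_ratio`, the exact-match rule, and the treatment of a sample with no detected mutation as wild-type are all assumptions introduced for this example; only the variant-to-mutation associations are taken from the claims. Because several variants are recited with the same S-gene mutations, the lookup returns every candidate variant rather than a single call.

```python
from typing import Dict, FrozenSet, List, Set

# S-gene mutation associations as recited in claims 105-113.
VARIANT_SIGNATURES: Dict[str, FrozenSet[str]] = {
    "Alpha": frozenset({"N501Y"}),
    "Beta": frozenset({"E484K", "N501Y"}),
    "Gamma": frozenset({"E484K", "N501Y"}),
    "Mu": frozenset({"E484K", "N501Y"}),
    "Delta": frozenset({"L452R"}),
    "Epsilon": frozenset({"L452R"}),
    "Kappa": frozenset({"L452R"}),
    "Omicron": frozenset({"E484A"}),
    "Zeta": frozenset({"E484K"}),
}


def snp_call(first_signal: float, second_signal: float,
             min_ratio: float = 2.0) -> str:
    """Compare the reporter signals obtained with two guide nucleic acids.

    Returns "first" or "second" for the guide whose signal dominates, or
    "no_call" when neither dominates. min_ratio is a hypothetical threshold,
    not a value taken from the claims.
    """
    if first_signal <= 0 and second_signal <= 0:
        return "no_call"  # no usable signal from either guide
    if first_signal >= min_ratio * second_signal:
        return "first"
    if second_signal >= min_ratio * first_signal:
        return "second"
    return "no_call"


def candidate_variants(detected: Set[str]) -> List[str]:
    """List the variants whose recited mutations exactly match the detected set.

    Assumption: an empty set of detected mutations is reported as wild-type.
    """
    if not detected:
        return ["wild-type"]
    return sorted(name for name, signature in VARIANT_SIGNATURES.items()
                  if signature == frozenset(detected))


if __name__ == "__main__":
    print(snp_call(10.0, 2.0))
    print(candidate_variants({"L452R"}))
```

In this sketch an L452R-only result yields Delta, Epsilon, and Kappa together, so distinguishing among them would require information beyond the mutations recited in these claims.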
PCT/US2022/080553 2021-11-29 2022-11-29 Systems and methods for identifying genetic phenotypes using programmable nucleases WO2023097325A2 (en)

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US202163283936P 2021-11-29 2021-11-29
US202163283930P 2021-11-29 2021-11-29
US63/283,936 2021-11-29
US63/283,930 2021-11-29
US202263305872P 2022-02-02 2022-02-02
US202263305934P 2022-02-02 2022-02-02
US63/305,934 2022-02-02
US63/305,872 2022-02-02
US202263390572P 2022-07-19 2022-07-19
US63/390,572 2022-07-19

Publications (2)

Publication Number Publication Date
WO2023097325A2 WO2023097325A2 (en) 2023-06-01
WO2023097325A3 WO2023097325A3 (en) 2023-07-06

Family

ID=86540404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/080553 WO2023097325A2 (en) 2021-11-29 2022-11-29 Systems and methods for identifying genetic phenotypes using programmable nucleases

Country Status (1)

Country Link
WO (1) WO2023097325A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2893040B1 (en) * 2012-09-04 2019-01-02 Guardant Health, Inc. Methods to detect rare mutations and copy number variation
EP3931313A2 (en) * 2019-01-04 2022-01-05 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection

Also Published As

Publication number Publication date
WO2023097325A3 (en) 2023-07-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22899584

Country of ref document: EP

Kind code of ref document: A2