WO2023154891A2 - Methods and compositions for detecting guanitoxin producing bacteria - Google Patents

Publication number
WO2023154891A2
WO2023154891A2 PCT/US2023/062430 US2023062430W WO2023154891A2 WO 2023154891 A2 WO2023154891 A2 WO 2023154891A2 US 2023062430 W US2023062430 W US 2023062430W WO 2023154891 A2 WO2023154891 A2 WO 2023154891A2
Authority
WO
WIPO (PCT)
Prior art keywords
gene
seq
sequence
length
guanitoxin
Application number
PCT/US2023/062430
Other languages
French (fr)
Other versions
WO2023154891A3 (en)
Inventor
Stella T. LIMA
Marli F. FIORE
Bradley S. Moore
Timothy R. FALLON
Jonathan R. CHEKAN
Shaun M.K. MCKINNIE
Original Assignee
The Regents Of The University Of California
University Of Sao Paulo
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by The Regents Of The University Of California, University Of Sao Paulo

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein, inter alia, are compositions and methods for detecting guanitoxin producing bacteria in an aqueous liquid. The methods provided herein include detecting one or more guanitoxin biosynthetic genes in the aqueous liquid. Compositions provided herein include one or more nucleic acids at least partially complementary to a guanitoxin biosynthetic gene.

Description

METHODS AND COMPOSITIONS FOR DETECTING GUANITOXIN PRODUCING BACTERIA
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 63/267,862, filed February 11, 2022, which is hereby incorporated by reference in its entirety and for all purposes.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
[0002] This invention was made with government support under ES032056, awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
REFERENCE TO A "SEQUENCE LISTING," A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED AS AN ASCII FILE
[0003] The contents of the electronic sequence listing (048537-651001WO ST26.xml; Size 137,677 bytes; and Date of Creation: February 9, 2023) is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
[0004] Freshwater is essential for drinking and agriculture, yet potable watersheds are increasingly impacted by the undesirable high-density growth of algae and/or cyanobacteria.1, 3 Harmful algal blooms (HABs) are symptomatic of ecosystem imbalance, often caused by environmental changes that reflect human interference and climate change. HABs are a major issue in marine, brackish, and freshwater systems worldwide. HABs are hazardous and sometimes fatal to human and animal populations, either through toxicity or by creating ecological conditions, such as oxygen depletion, which can kill fish and other economically or ecologically important organisms. Understanding, monitoring, and remediating harmful algal/cyanobacterial blooms (HABs/cyanoHABs) and their associated toxins is essential to reducing their societal impact.4 Recent scientific and technological advances continue to improve environmental cyanoHAB detection and prediction.5-7 However, the vast structural chemodiversity of cyanotoxins creates challenges for their comprehensive detection and quantification using standard analytical chemistry assays. In contrast, quantitative molecular biological detection of biosynthetic genes via polymerase chain reaction (PCR) provides a multiplexable and cost-effective monitoring strategy to identify the toxic potential of HABs independent of active toxin synthesis.8
[0005] The biosynthetic gene clusters (BGCs) for important freshwater cyanotoxins like microcystin,9 cylindrospermopsin,10 saxitoxin,11 and anatoxin-a12 have been defined and applied towards detection over the past decades. However, the biosynthetic pathway and genes for guanitoxin (1), the only known natural organophosphate neurotoxin, have yet to be described. Previously known as anatoxin-a(s),13 guanitoxin is an irreversible inhibitor of acetylcholinesterase,14 sharing an identical mechanism of action with organophosphates like the synthetic chemical warfare agent sarin and the banned pesticide parathion (FIG. 1A). Guanitoxin induces acute neurological toxicity that may lead to rapid death, showing comparable lethality (LD50 = 20 µg/kg i.p.)15 to saxitoxin, the most potent known cyanotoxin. Sporadic detection in the Americas,15 Europe,16 and Middle East17 coupled with harmful algal bloom-related animal deaths consistent with exposure suggests guanitoxin might be an under-recognized threat in global watersheds.
[0006] While its unique pharmacology21 and chemical structure22 have been known for decades, guanitoxin remains largely unmonitored in the environment due to its chemical instability and its incompatibility with commonly used analytical detection methods.23 Although L-arginine derived metabolites have previously been isolated from guanitoxin producing cyanobacteria and incorporated in vivo via stable isotope labeling experiments (FIG. 1B),24,25 a lack of knowledge regarding guanitoxin biosynthesis and a lack of access to its stable metabolites as standards have hampered an understanding of its environmental significance.
[0007] Disclosed herein, inter alia, are solutions to these and other problems in the art.
BRIEF SUMMARY OF THE INVENTION
[0008] Provided herein, inter alia, are methods and compositions for detecting guanitoxin producing bacteria in an aqueous liquid. In embodiments, the methods include detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntB, GntC, GntD, GntG, GntE, GntF, GntA, GntI, GntJ, GntT, or a combination thereof. In embodiments, the methods and compositions include one or more nucleic acids each at least partially complementary to a portion of a guanitoxin biosynthetic gene.
[0009] In an aspect is provided a method of detecting guanitoxin-producing bacteria in an aqueous liquid, the method including detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0010] In another aspect is provided a kit for detecting guanitoxin-producing bacteria in an aqueous liquid, the kit including one or more nucleic acids each at least partially complementary to a portion of one or more guanitoxin biosynthetic genes, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0011] In an aspect is provided a composition including one or more nucleic acids each independently including a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
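By way of non-limiting illustration, the 80% identity threshold recited above can be sketched in Python. The sketch assumes that the two sequences have already been aligned to equal length with '-' marking gaps, and that identity is counted over all aligned columns; other identity definitions exist. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Assumption: inputs are pre-aligned, equal-length strings; '-' marks a gap.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap opposite a base counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True when the pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-base example: 8 of 10 columns match.
    print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
    print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTTT"))
```

In practice the alignment itself would typically come from an established alignment tool; the sketch only shows the arithmetic behind the threshold.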
[0012] In an aspect is provided a method for determining cyanobacterial toxin contamination in a freshwater sample by the detection of a guanitoxin biosynthetic gene sequence in said sample. In embodiments, the guanitoxin biosynthetic gene is GntB, GntC, GntD, GntG, GntE, GntF, GntA, GntI, or GntJ.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1A shows guanitoxin (1) as a potent cyanobacterial organophosphate neurotoxin with an anticholinesterase mechanism of action comparable to organophosphate pesticide parathion and chemical warfare agent sarin.
[0014] FIG. 1B shows a retrobiosynthetic proposal to produce guanitoxin (1) from L-arginine via previously isolated cyanobacterial metabolites (S)-4-hydroxy-L-arginine (2) and L-enduracididine (3).
[0015] FIGS. 2A-2C show discovery of a candidate gnt biosynthetic gene cluster (BGC) via sequencing a guanitoxin producing cyanobacterium. FIG. 2A shows that annotation of the assembled 5.24 Mbp Sphaerospermopsis torques-reginae ITEP-024 genome using antiSMASH v6.018 detected 11 candidate BGCs, while the single candidate guanitoxin BGC was identified via colocalization of relevant candidate enzyme activities. Figure produced with Circos v0.69-8.19 FIG. 2B illustrates organization of the guanitoxin BGC. The gnt locus figure was designed from National Center for Biotechnology Information (NCBI) accession CP080598.1, using Clinker v0.0.21.20 FIG. 2C shows the guanitoxin biosynthetic pathway in Sphaerospermopsis torques-reginae ITEP-024.
[0016] FIG. 3A shows GntF N,N-dimethylates primary amine substrate 6 to form tertiary amine 7 in vitro.
[0017] FIG. 3B shows relative intensities of positive mode extracted ion chromatograms extracted from hydrophilic interaction liquid chromatography-mass spectrometry (HILIC-MS) traces for GntF substrate 6 and product 7 ([M+H]+ 115.0978 and 143.1291 ± 0.0100 m/z, respectively). Asterisks indicate that the MS intensities are increased ten-fold relative to other traces for improved visualization of compounds with variable ionization efficiencies.
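By way of non-limiting illustration, the extraction window described above (a target [M+H]+ value ± 0.0100 m/z) can be sketched in Python. The scan data structure, the function names, and the toy intensities are assumptions made for illustration only; they do not represent the acquisition software or the data used for the figures.

```python
def in_eic_window(mz: float, target_mz: float, tol: float = 0.0100) -> bool:
    """True when an observed m/z falls within target_mz ± tol."""
    return abs(mz - target_mz) <= tol


def extract_eic(scans, target_mz: float, tol: float = 0.0100):
    """Build an extracted ion chromatogram (EIC).

    Assumed input: an iterable of (retention_time, [(mz, intensity), ...]).
    Returns a list of (retention_time, summed_intensity_in_window).
    """
    trace = []
    for retention_time, peaks in scans:
        total = sum(
            (intensity for mz, intensity in peaks
             if in_eic_window(mz, target_mz, tol)),
            0.0,
        )
        trace.append((retention_time, total))
    return trace


if __name__ == "__main__":
    # Hypothetical scans; only the first contains a peak near 115.0978.
    toy_scans = [
        (1.0, [(115.0980, 50.0), (143.1300, 5.0)]),
        (2.0, [(120.0000, 7.0)]),
    ]
    print(extract_eic(toy_scans, target_mz=115.0978))
```

The same helper could be called with a wider tolerance (for example 0.30 m/z) where a lower-resolution window is described.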
[0018] FIG. 4A shows characterization of GntC/GntD/GntG/GntE biosynthetic enzymes with intermediate 2.
[0019] FIG. 4B shows relative intensities of positive mode extracted ion chromatograms extracted from Ultra Performance Liquid Chromatography-Mass Spectrometry (UPLC-MS) traces following 1-fluoro-2,4-dinitrophenyl-5-L-alanine amide (L-FDAA; Marfey's reagent) derivatization of primary amine-containing intermediates 2, 3, 4, 6, and glycine ([M+H]+ (443.16; 425.15; 441.15; 367.15; 328.09) ± 0.20 m/z, respectively) after incubation of 2 with GntC, GntC/GntD, or GntC/GntD/GntG/GntE and all necessary cofactors and cosubstrates.
[0020] FIG. 5A shows GntA, GntI, and GntJ construct the anticholinesterase organophosphate pharmacophore of guanitoxin.
[0021] FIG. 5B shows relative intensities of positive mode extracted ion chromatograms extracted from HILIC-MS for guanitoxin ([M+H]+ 253.1060 ± 0.0100 m/z) following extraction from Sphaerospermopsis torques-reginae ITEP-024 and in vitro incubation of intermediate 7 with GntA/GntI/GntJ and all necessary cofactors and cosubstrates.
[0022] FIG. 5C shows acetylcholinesterase (AChE) inhibition as assessed via the BioVision Acetylcholinesterase Inhibitor Screening Kit (Colorimetric). The inhibition of AChE is coupled to the decreased formation of the yellow TNB chromophore (412 nanometre (nm) absorbance) via the scheme depicted. GntA/GntI/GntJ reactions were carried out as previously described, diluted between 10x and 200x, and analyzed for AChE inhibition. In situ-generated 8 and 9 showed negligible AChE inhibition at all dilutions tested compared to the reversible inhibitor donepezil positive control. However, potent AChE inhibition was observed following the addition of GntJ (in situ guanitoxin) and showed a decreasing inhibitory effect at higher reaction dilutions, highlighting the significance of O-methylation for biological activity.
[0023] FIGS. 6A-6B show environmental detection of guanitoxin biosynthetic capability through metagenomic and metatranscriptomic sequencing. FIG. 6A shows geographic sites with literature reports of guanitoxin or detection of the gnt BGC through environmental sequencing datasets. FIG. 6B illustrates the gnt BGC gene structure from metagenomic and metatranscriptomic de novo assemblies of environmental samples. The two metagenomic samples were successfully linked to their respective taxon of origin via a metagenomic assembled genome (MAG) approach.
[0024] FIG. 7 shows the geographic location of guanitoxin-producing Sphaerospermopsis torques-reginae ITEP-024 cyanoHAB bloom. Sphaerospermopsis torques-reginae ITEP-024 bloom localization during the end of summer and beginning of fall 2002, Tapacura Reservoir, Recife, PE - Brazil (8.037° S 35.169° W)87. Photo of the Tapacura Reservoir used with permission from Dr. Cihelio Alves Amorim (Department of Biological Sciences, Middle East Technical University, Ankara, Turkey).
[0025] FIG. 8 shows genome phylogeny for Sphaerospermopsis torques-reginae ITEP-024. The genome tree is inferred using GTDB-Tk81 by approximately-maximum-likelihood phylogenetic analysis from an aligned concatenated set of 120 single copy marker proteins for Bacteria using Genome Taxonomy Database (GTDB)82. The robustness of the phylogenetic tree was estimated via bootstrap analysis using 1000 replications. Bar: 0.1 changes per nucleotide position.
[0026] FIGS. 9A-9B show identification of guanitoxin biosynthetic intermediates 3, 7, and 8 in Sphaerospermopsis torques-reginae ITEP-024 methanolic culture extracts. Positive mode HILIC-MS chromatograms identify the presence of 3, 7, and 8 in the cyanobacteria culture extract as compared with synthetic standards 3 and 7, as well as in situ enzyme-generated intermediates 7 and 8. FIG. 9A shows the positive mode extracted ion chromatogram (EIC ± 0.0010 m/z) comparison for 3 ([M+H]+ 173.1033 m/z) from the synthetic standard and cyanobacterial culture extracts. FIG. 9B shows the positive mode extracted ion chromatogram (EIC ± 0.0100 m/z) comparisons for 7 and 8 ([M+H]+ 143.1291, 159.1240 m/z respectively) from synthetic standard 7, enzyme-generated 7 and 8, and cyanobacterial culture extracts.
[0027] FIGS. 10A-10B show Sphaerospermopsis torques-reginae ITEP-024 genome and guanitoxin biosynthetic gene cluster information. FIG. 10A is the genome representation with 5.2 Mbp of length and guanitoxin biosynthetic gene cluster position in the genome. FIG. 10B is the genome data of Sphaerospermopsis torques-reginae ITEP-024 cyanobacterium.
[0028] FIG. 11 shows a sodium dodecyl-sulfate polyacrylamide gel electrophoresis (SDS-PAGE) analysis of GntA/C/D/E/F/G/I/J proteins. 4-15% Mini-PROTEAN® TGX™ Precast Protein Gels (Bio-Rad) loaded with Precision Plus Protein Dual Color Standards (Bio-Rad) and purified soluble Gnt pathway proteins (2 µg).
[0029] FIGS. 12A-12B show guanitoxin pyridoxal 5'-phosphate (PLP)-dependent enzymes in a possible reaction with an L-arginine substrate. FIG. 12A illustrates possible products for GntC/GntG/GntE reactions. The reactions were set up as previously described for 5 hours at room temperature. Half of each assay was methanol-quenched, while the other half was derivatized with Marfey’s reagent for optimized retention times and diastereomer separation prior to UPLC-MS analysis. FIG. 12B shows relative intensities of positive mode extracted ion chromatograms extracted from UPLC-MS traces (EIC ± 0.30 m/z) for non-derivatized arginine ([M+H]+ 175.12 m/z) and derivatized arginine ([M+H]+ 427.14 m/z), all putative non-derivatized products ([M+H]+ 191.11, 174.08, 190.00, 173.10, 172.10 m/z), and all putative derivatized products ([M+H]+ 443.16, 425.15 m/z).
[0030] FIG. 13A shows GntD does not hydroxylate L-arginine. GntD reactions were set up as previously described and incubated for 4 hours at room temperature. Reactions were derivatized via Marfey’s analysis for optimized retention times prior to UPLC-MS analysis.
[0031] FIG. 13B illustrates relative intensities of positive mode extracted ion chromatograms from UPLC-MS traces (EIC ± 0.30 m/z) for either derivatized L-arginine ([M+H]+ 427.14 m/z) or all putative derivatized products ([M+H]+ 443.16, 459.15, 475.15, 425.15, 441.14 m/z).
[0032] FIGS. 14A-14D show GntC substrate specificity, time course, and PLP-dependence experiments. GntC assays were set up as previously described and incubated at room temperature, and aliquots were taken at the time points listed on the figures (panels of FIGS. 14A-14B: 25 h; panel of FIG. 14C: 15 min, 6 h, 25 h; panel of FIG. 14D: 20 h). Reactions were derivatized with Marfey’s reagent for optimized retention times and diastereomer separation prior to UPLC-MS analysis. Relative intensities of positive mode extracted ion chromatograms were extracted from UPLC-MS traces (EIC ± 0.30 m/z) for Marfey derivatized starting material 2 and product 3 ([M+H]+ 443.16, 425.15 m/z respectively) in all traces unless otherwise listed (i.e., the bottom traces in FIGS. 14A-14B). FIG. 14A illustrates that GntC catalyzes the cyclodehydration of 2 in vitro to produce 3. FIG. 14B shows that GntC has negligible activity towards epimerized substrate SI-12 and does not produce epimer SI-14. FIG. 14C shows that GntC activity increases in a time-dependent manner over the course of the 25 h assay, and FIG. 14D shows that exogenous PLP is not needed for catalysis because GntC co-purifies with this cofactor.
[0033] FIG. 15 shows the gntB-gntC-pCOLADuet-1 vector map. The pCOLADuet-1 vector was assembled with the gntB and gntC genes from the guanitoxin pathway for enduracididine production. The vector is designed for the coexpression of two target genes from a single plasmid, which encodes two multiple cloning sites (MCS), each of which is preceded by a T7 promoter, lac operator and ribosome binding site. The vector has the COLA replicon from the ColA ori and a kanamycin resistance gene.
[0034] FIG. 16A shows GntBC produces L-enduracididine (3) in vivo in E. coli. The GntBC in vivo assay was set up as previously described and incubated at 18 °C for five days. A 100 µM internal standard of synthetic 3 was added to pET28a cell lysate to correct for variations in retention time based on media components. In vivo reactions were derivatized with Marfey’s reagent prior to UPLC-MS analysis.
[0035] FIG. 16B shows relative intensities of positive mode extracted ion chromatograms extracted from UPLC-MS traces (EIC ± 0.30 m/z) for Marfey-derivatized GntC product 3 ([M+H]+ 425.15 m/z). The in vivo production of 3 was dependent on the presence of both gntB and gntC genes and was not observed in the gntB-pET28a or empty vector pET28a incubations.
[0036] FIGS. 17A-17D show divergent cyclic arginine amino acid biosyntheses that use PLP-dependent enzymology. Comparison of guanitoxin and previously characterized actinobacterial biosynthetic pathways. FIG. 17A illustrates a portion of the guanitoxin biosynthesis pathway using Sphaerospermopsis torques-reginae (cyanobacteria). FIG. 17B illustrates the mannopeptimycin biosynthesis using Streptomyces hygroscopicus (actinobacteria) shown in studies of Han et al., Biochemistry 2015, 54, 7092; Han et al., Biochemistry 2018, 57, 3252; Burroughs et al., Biochemistry, 2013, 52, 4492 and Haltli et al., Chem. Biol., 2005, 12, 1163. FIG. 17C illustrates the viomycin biosynthesis using Streptomyces puniceus and other sp. (actinobacteria) shown in studies of Yin et al., ChemBioChem, 2004, 5, 1274; Ju et al., ChemBioChem, 2004, 5, 1281; Yin et al., ChemBioChem, 2004, 5, 1278; Fei et al., J. Nat. Prod., 2007, 70, 618 and Barkei et al., ChemBioChem, 2009, 10, 366. FIG. 17D illustrates the streptolidine biosynthesis using Streptomyces lavendulae (actinobacteria) shown in studies of Chang et al., Angew. Chem. Int. Ed., 2014, 53, 1943.
[0037] FIGS. 18A-18C show GntD substrate specificity, time course and dependence experiments. GntD assays were set up as previously described at room temperature, and aliquots were taken at the time points listed on the figures (panels of FIGS. 18A-18B: 15 min., 90 min., 5 h; panel of FIG. 18C: 2 h, 20 h). Reactions were derivatized with Marfey’s reagent for optimized retention times and diastereomer separation prior to UPLC-MS analysis. Relative intensities of positive mode extracted ion chromatograms were extracted from UPLC-MS traces (EIC ± 0.50 m/z) for both Marfey derivatized substrate 3 and product 4 ([M+H]+ 424.15, 441.14 m/z, respectively) in all traces. FIG. 18A shows that GntD rapidly hydroxylates substrate 3 in vitro but FIG. 18B shows negligible activity towards epimer SI-14. FIG. 18C illustrates results from a GntD dependence assay.
[0038] FIG. 19A shows the GntE and GntG forward aldol assay. GntE and GntG in vitro aldol reaction dependence assays were set up as previously described and incubated at room temperature for 18 hours. Reactions were derivatized with Marfey’s reagent for optimized retention times prior to UPLC-MS analysis.
[0039] FIG. 19B shows relative intensities of positive mode extracted ion chromatograms extracted from UPLC-MS traces (EIC ± 0.50 m/z) for Marfey derivatized 4 ([M+H]+ 441.14 m/z). The enzymatically isolated GntD reaction product 4 was compared to the incubation of 6 with one or both GntE/G enzymes, and a no enzyme control. The inclusion of only GntE (- GntG trace) showed production of 4, indicating that GntE may be capable of performing both aldolase and transamination chemistries. In contrast, the presence of GntG only (- GntE trace) shows no 4 production, indicating that this functional promiscuity is limited to GntE.
[0040] FIG. 20A shows GntCDGE in vitro one pot dependence assay. Enzyme assays were set up as previously described and incubated at room temperature for 18 hours. One condition included all enzymes, while other conditions omitted one or more enzymes. Reactions were derivatized with Marfey’s reagent prior to UPLC-MS analysis for improved retention times.
[0041] FIG. 20B illustrates relative intensities of positive mode extracted ion chromatograms from UPLC-MS traces (EIC ± 0.20 m/z) for all potential products 2, 3, 4, glycine, and 6 for all traces ([M+H]+ 443.16, 425.15, 441.14, 328.08, and 367.14 m/z respectively). The reaction progression from 2 to 6 was halted depending on the omission of particular enzymes that corresponded to their native biosynthetic roles. Analogous to the results obtained in FIG. 16A-16B, the omission of aldolase GntG (- GntG trace) showed no significant deviation from the full assay (all component trace) due to the dual functionality of GntE as both an aldolase and transaminase. However, the omission of aminotransferase GntE (- GntE trace) supports the aldolase activity of GntG via conversion of 4 into glycine and 5 (not detected).
[0042] FIG. 21A shows GntA hydroxylates cyclic guanidine substrate 7. GntA reactions were set up as previously described and incubated overnight at 27 °C. Reactions were quenched with acetonitrile and subjected to HILIC-MS analysis.
[0043] FIG. 21B illustrates relative intensities of positive mode extracted ion chromatograms from HILIC-MS traces (EIC ± 0.0100 m/z) for the GntA product 8 and substrate 7 masses as appropriate ([M+H]+ 159.1240 and 143.1291 m/z respectively).
[0044] FIG. 22A shows GntAIJ produce guanitoxin in situ from synthetic substrate 7. The GntA, GntI, and GntJ reactions were set up as previously described beginning with synthetic substrate 7. Reactions were quenched with acetonitrile and subjected to LC-MS analysis.
[0045] FIG. 22B shows GntA and GntI reactions analyzed using the HILIC Method, and relative intensities of positive mode extracted ion chromatograms extracted from HILIC-MS traces (EIC ± 0.0100 m/z) for substrate 7, GntA product 8, and GntI product 9 as appropriate ([M+H]+ 143.1291, 159.1240, and 239.0904 m/z respectively).
[0046] FIG. 22C shows the GntAIJ coupled reaction, no substrate control, and Sphaerospermopsis torques-reginae ITEP-024 extract analyzed using the reverse phase (RP) method. Relative intensities of positive mode extracted ion chromatograms were extracted from reversed phase-liquid chromatography-mass spectrometry (RP-LC-MS) traces (EIC ± 0.0100 m/z) for the guanitoxin (1) mass ([M+H]+ 253.1060 m/z). Asterisks indicate that the MS intensities are increased 25-fold relative to other traces for improved visualization.
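By way of non-limiting illustration, the [M+H]+ values quoted in the figure descriptions for compounds 6, 7, 8, 9 and guanitoxin (1) can be checked against simple monoisotopic mass shifts in Python. The atomic masses are standard monoisotopic values rounded to six decimals. Reading the 8 to 9 difference as a net addition of HPO3 is an interpretation of the mass arithmetic, not a statement from the text. The function name and the 0.002 m/z matching tolerance are assumptions.

```python
# Standard monoisotopic atomic masses (rounded).
H, C, O, P = 1.007825, 12.000000, 15.994915, 30.973762

# Candidate net mass shifts; the labels are illustrative interpretations.
SHIFTS = {
    "methylation (+CH2)": C + 2 * H,
    "hydroxylation (+O)": O,
    "phosphorylation (+HPO3)": H + P + 3 * O,
}

# [M+H]+ values as quoted in the figure descriptions.
MZ = {"6": 115.0978, "7": 143.1291, "8": 159.1240, "9": 239.0904, "1": 253.1060}


def annotate_shift(mz_from: float, mz_to: float, tol: float = 0.002):
    """Return (count, label) pairs whose summed mass explains mz_to - mz_from."""
    delta = mz_to - mz_from
    hits = []
    for label, mass in SHIFTS.items():
        for count in (1, 2):
            if abs(delta - count * mass) <= tol:
                hits.append((count, label))
    return hits


if __name__ == "__main__":
    for start, end in [("6", "7"), ("7", "8"), ("8", "9"), ("9", "1")]:
        print(start, "->", end, annotate_shift(MZ[start], MZ[end]))
```

Under these assumptions each quoted mass difference matches exactly one candidate shift, which is consistent with the stepwise pathway described for GntF, GntA, GntI, and GntJ.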
[0047] FIG. 23A shows guanitoxin biosynthetic intermediates from Sphaerospermopsis torques-reginae ITEP-024.
[0048] FIGS. 23B-23D illustrate tandem mass spectrometry (MS/MS) analyses of guanitoxin biosynthetic intermediates. Intermediates 7 (FIG. 23B), 8 (FIG. 23C), and guanitoxin (1) (FIG. 23D) showed diagnostic fragment A (58.0652 m/z) following HILIC-MS/MS analyses.
[0049] FIG. 24 shows phylogenomic tree for taxonomic classification of MAGs based on Genome Taxonomy Database (GTDB)82. Cuspidothrix bin 5 belongs to Amazon River and Aphanizomenon bin 35 belongs to Lake Mendota. The genome tree is generated using GTDB-Tk81 by identifying and aligning 120 bacterial single-copy conserved marker genes and then inferring the phylogeny of the concatenated sequences with the WAG+GAMMA models and maximum likelihood algorithm. The robustness of the phylogenetic tree was estimated via bootstrap analysis using 1000 replications. Bar: 0.1 changes per nucleotide position.
[0050] FIG. 25 shows genome similarity matrix of MAG-assembled gnt-containing cyanobacteria. Similarity between Lake Mendota bin 35 (Aphanizomenon) and all Aphanizomenon available genomes, and Amazon River bin 5 (Cuspidothrix) with the only available Cuspidothrix genome. Average nucleotide identities were calculated with OrthoANI v1.486.
[0051] FIG. 26 shows the synthetic scheme of primary amine intermediate 6.
[0052] FIG. 27 shows the synthetic scheme of dimethylamine intermediate 7.
[0053] FIG. 28 shows the synthetic scheme of γ-hydroxy-L-arginine diastereomers SI-7 and SI-8.
[0054] FIG. 29 shows the synthetic scheme of (S)-γ-hydroxy-L-arginine (2).
[0055] FIG. 30 shows the synthetic scheme of L-enduracididine (3).
[0056] FIG. 31 shows the synthetic scheme of (R)-γ-hydroxy-L-arginine (SI-12).
[0057] FIG. 32 shows the synthetic scheme of L-allo-enduracididine (SI-14).
DETAILED DESCRIPTION OF THE INVENTION
[0058] Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in oncology, cell culture, molecular genetics, epigenetics, and biochemistry).
[0059] As used herein, the term “about” in the context of a numerical value or range means ±10% of the numerical value or range recited or claimed, unless the context requires a more limited range.
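By way of non-limiting illustration, the ±10% reading of “about” can be sketched in Python. The function names are hypothetical, and the sketch does not attempt to model the "unless the context requires a more limited range" qualifier beyond exposing the fraction as a parameter.

```python
def about_range(value: float, fraction: float = 0.10):
    """Return the (low, high) bounds covered by 'about value' at ±fraction."""
    spread = abs(value) * fraction
    return (value - spread, value + spread)


def is_about(x: float, value: float, fraction: float = 0.10) -> bool:
    """True when x falls within 'about value' (bounds inclusive)."""
    low, high = about_range(value, fraction)
    return low <= x <= high


if __name__ == "__main__":
    print(about_range(50.0))     # bounds for "about 50"
    print(is_about(54.0, 50.0))  # inside the ±10% window
    print(is_about(56.0, 50.0))  # outside the ±10% window
```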
[0060] In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
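By way of non-limiting illustration, the reading of “at least one of A, B, and C” as every non-empty combination of the listed elements can be enumerated in Python. The function name is hypothetical.

```python
from itertools import combinations


def at_least_one_of(items):
    """List every non-empty combination of the given elements, smallest first."""
    items = list(items)
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    # Two elements give three combinations; three elements give seven.
    print(at_least_one_of(["A", "B"]))
    print(at_least_one_of(["A", "B", "C"]))
```

For n listed elements this yields 2^n - 1 combinations, matching the three and seven alternatives spelled out in the paragraph above.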
[0061] It is understood that where a parameter range is provided, all integers within that range, and tenths thereof, are also provided by the invention. For example, “0.2-5 mg” discloses 0.2 mg, 0.3 mg, 0.4 mg, 0.5 mg, 0.6 mg etc. up to and including 5.0 mg. Additionally, where two values for a parameter are disclosed, then a range of all values between and including those two values is also disclosed. For example, “1, 2, and 3” discloses, e.g., 1-2, 1-3, and 2-3.
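By way of non-limiting illustration, the values disclosed by a range such as “0.2-5 mg” (stepping in tenths) and the sub-ranges disclosed by a list such as “1, 2, and 3” can be enumerated in Python. The function names are hypothetical, and working in integer tenths is simply a choice made here to avoid floating-point drift.

```python
from itertools import combinations


def disclosed_values(low: float, high: float):
    """Enumerate low..high inclusive in steps of one tenth."""
    start = round(low * 10)   # work in integer tenths
    stop = round(high * 10)
    return [tenths / 10 for tenths in range(start, stop + 1)]


def disclosed_subranges(values):
    """Every (lower, upper) pair that can be formed from the listed values."""
    return list(combinations(sorted(values), 2))


if __name__ == "__main__":
    vals = disclosed_values(0.2, 5)
    print(vals[:5], "...", vals[-1])
    print(disclosed_subranges([1, 2, 3]))
```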
[0062] "Nucleic acid" refers to nucleotides (e.g., deoxyribonucleotides or ribonucleotides) and polymers thereof in either single-, double- or multiple-stranded form, or complements thereof; or nucleosides (e.g., deoxyribonucleosides or ribonucleosides). In embodiments, “nucleic acid” does not include nucleosides. The terms “polynucleotide,” “oligonucleotide,” “oligo” or the like refer, in the usual and customary sense, to a linear sequence of nucleotides. The term “nucleoside” refers, in the usual and customary sense, to a glycosylamine including a nucleobase and a five-carbon sugar (ribose or deoxyribose). Non-limiting examples of nucleosides include cytidine, uridine, adenosine, guanosine, thymidine and inosine. The term “nucleotide” refers, in the usual and customary sense, to a single unit of a polynucleotide, i.e., a monomer. Nucleotides can be ribonucleotides, deoxyribonucleotides, or modified versions thereof. Examples of polynucleotides contemplated herein include single and double stranded DNA, single and double stranded RNA, and hybrid molecules having mixtures of single and double stranded DNA and RNA. Examples of nucleic acids, e.g. polynucleotides, contemplated herein include any types of RNA, e.g. mRNA, siRNA, miRNA, and guide RNA, and any types of DNA, e.g. genomic DNA, plasmid DNA, and minicircle DNA, and any fragments thereof. The term “duplex” in the context of polynucleotides refers, in the usual and customary sense, to double strandedness. Nucleic acids can be linear or branched. For example, nucleic acids can be a linear chain of nucleotides or the nucleic acids can be branched, e.g., such that the nucleic acids comprise one or more arms or branches of nucleotides. Optionally, the branched nucleic acids are repetitively branched to form higher ordered structures such as dendrimers and the like.
[0063] The terms also encompass nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphodiester derivatives including, e.g., phosphoramidate, phosphorodiamidate, phosphorothioate (also known as phosphothioate having double bonded sulfur replacing oxygen in the phosphate), phosphorodithioate, phosphonocarboxylic acids, phosphonocarboxylates, phosphonoacetic acid, phosphonoformic acid, methyl phosphonate, boron phosphonate, or O-methylphosphoroamidite linkages (see Eckstein, OLIGONUCLEOTIDES AND ANALOGUES: A PRACTICAL APPROACH, Oxford University Press) as well as modifications to the nucleotide bases such as in 5-methyl cytidine or pseudouridine; and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones; non-ionic backbones, modified sugars, and non-ribose backbones (e.g. phosphorodiamidate morpholino oligos or locked nucleic acids (LNA) as known in the art), including those described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, CARBOHYDRATE MODIFICATIONS IN ANTISENSE RESEARCH, Sanghvi & Cook, eds. Nucleic acids containing one or more carbocyclic sugars are also included within one definition of nucleic acids. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments or as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogs can be made; alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs may be made.
In embodiments, the internucleotide linkages in DNA are phosphodiester, phosphodiester derivatives, or a combination of both.
[0064] Nucleic acids can include nonspecific sequences. As used herein, the term "nonspecific sequence" refers to a nucleic acid sequence that contains a series of residues that are not designed to be complementary to or are only partially complementary to any other nucleic acid sequence. By way of example, a nonspecific nucleic acid sequence is a sequence of nucleic acid residues that does not function as an inhibitory nucleic acid when contacted with a cell or organism.
[0065] A polynucleotide is typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) for thymine (T) when the polynucleotide is RNA). Thus, the term “polynucleotide sequence” is the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself. This alphabetical representation can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching. Polynucleotides may optionally include one or more non-standard nucleotide(s), nucleotide analog(s) and/or modified nucleotides.
[0066] The phrase “stringent hybridization conditions” refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993).
Generally, stringent conditions are selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, preferably 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C.
[0067] Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Exemplary “moderately stringent hybridization conditions” include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive hybridization is at least twice background. One of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency. Additional guidelines for determining hybridization parameters are provided in numerous references, e.g., Current Protocols in Molecular Biology, ed. Ausubel, et al., supra.
[0068] The term "gene" means the segment of DNA involved in producing a protein; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons). The leader, the trailer as well as the introns include regulatory elements that are necessary during the transcription and the translation of a gene. Further, a "protein gene product" is a protein expressed from a particular gene (e.g. GntA protein, GntB protein, GntC protein, GntD protein, GntE protein, GntF protein, GntG protein, GntH protein, GntI protein, GntJ protein, or GntT protein). In embodiments, a gene includes a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence.
[0069] The term “promoter” or “promoter region sequence” refers to a nucleic acid sequence that regulates, either directly or indirectly, the transcription of a corresponding nucleic acid coding sequence to which it is operably linked. The promoter may function alone to regulate transcription, or, in some cases, may act in concert with one or more other regulatory sequences such as an enhancer or silencer to regulate transcription of the transgene. The promoter comprises a DNA regulatory sequence, wherein the regulatory sequence is derived from a gene, which is capable of binding RNA polymerase and initiating transcription of a downstream (3 '-direction) coding sequence.
[0070] The term “terminator” or “terminator region sequence” refers to a nucleic acid sequence that determines the end of a gene during the transcription process. The terminator may include a sequence that directly or indirectly releases the transcript RNA from the transcriptional complex. For example, the terminator region sequence may include the sequence that determines the detachment of RNA polymerase from the DNA template strand.
[0071] The term “coding sequence” (CDS), or “coding region” refers to the portion of a gene that codes for protein. For example, the coding sequence may be the DNA or RNA sequence that determines the sequence of amino acids in a protein.
[0072] The term “intergenic region” or “intergenic region sequence” refers to the nucleic acid sequence between genes. An intergenic region sequence in bacteria may be a non-protein coding sequence. For example, an intergenic region sequence may comprise a part of a bacterial genome located between the last nucleotide of a coding region and the first nucleotide of a subsequent coding region.
[0073] The term “complement,” as used herein, refers to a nucleotide (e.g., RNA or DNA) or a sequence of nucleotides capable of base pairing with a complementary nucleotide or sequence of nucleotides. As described herein and commonly known in the art, the complementary (matching) nucleotide of adenosine is thymidine and the complementary (matching) nucleotide of guanosine is cytidine. Thus, a complement may include a sequence of nucleotides that base pair with corresponding complementary nucleotides of a second nucleic acid sequence. The nucleotides of a complement may partially or completely match the nucleotides of the second nucleic acid sequence. Where the nucleotides of the complement completely match each nucleotide of the second nucleic acid sequence, the complement forms base pairs with each nucleotide of the second nucleic acid sequence. Where the nucleotides of the complement partially match the nucleotides of the second nucleic acid sequence, only some of the nucleotides of the complement form base pairs with nucleotides of the second nucleic acid sequence. Examples of complementary sequences include coding and non-coding sequences, wherein the non-coding sequence contains complementary nucleotides to the coding sequence and thus forms the complement of the coding sequence. Further examples of complementary sequences are sense and antisense sequences, wherein the sense sequence contains complementary nucleotides to the antisense sequence and thus forms the complement of the antisense sequence.
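The distinction between complete and partial complementarity can be illustrated with a minimal Python sketch. The function names `reverse_complement` and `fraction_paired` are hypothetical and are not part of the disclosure; the sketch assumes unmodified DNA bases, strands written 5' to 3', and equal-length strands.

```python
# Watson-Crick pairing for unmodified DNA bases (illustrative assumption).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complete complement of a DNA sequence, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))


def fraction_paired(strand: str, other: str) -> float:
    """Fraction of positions in `strand` that base pair with `other`.

    Both strands are written 5'->3'; `other` is read in reverse so that
    the two strands are compared in antiparallel orientation.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        DNA_PAIRS[a] == b
        for a, b in zip(strand.upper(), reversed(other.upper()))
    )
    return paired / len(strand)


print(reverse_complement("ATGC"))        # complete complement of ATGC
print(fraction_paired("ATGC", "GCAT"))   # every position pairs
print(fraction_paired("ATGC", "GCAA"))   # only some positions pair
```

A value of 1.0 corresponds to a complete complement; any lower value corresponds to a partial complement in the sense described above.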
[0074] As described herein the complementarity of sequences may be partial, in which only some of the nucleic acids match according to base pairing, or complete, where all the nucleic acids match according to base pairing. Thus, two sequences that are complementary to each other, may have a specified percentage of nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region).
[0075] The term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refer to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refer to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid. The terms “non-naturally occurring amino acid” and “unnatural amino acid” refer to amino acid analogs, synthetic amino acids, and amino acid mimetics which are not found in nature.
[0076] Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
[0077] The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. In embodiments, the polymer may be conjugated to a moiety that does not consist of amino acids. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. A "fusion protein" refers to a chimeric protein encoding two or more separate protein sequences that are recombinantly expressed as a single moiety.
[0078] An amino acid or nucleotide base "position" is denoted by a number that sequentially identifies each amino acid (or nucleotide base) in the reference sequence based on its position relative to the N-terminus (or 5'-end). Due to deletions, insertions, truncations, fusions, and the like that must be taken into account when determining an optimal alignment, in general the amino acid residue number in a test sequence determined by simply counting from the N-terminus will not necessarily be the same as the number of its corresponding position in the reference sequence. For example, in a case where a variant has a deletion relative to an aligned reference sequence, there will be no amino acid in the variant that corresponds to a position in the reference sequence at the site of deletion. Where there is an insertion in an aligned reference sequence, that insertion will not correspond to a numbered amino acid position in the reference sequence. In the case of truncations or fusions there can be stretches of amino acids in either the reference or aligned sequence that do not correspond to any amino acid in the corresponding sequence.
[0079] The terms "numbered with reference to" or "corresponding to," when used in the context of the numbering of a given amino acid or polynucleotide sequence, refer to the numbering of the residues of a specified reference sequence when the given amino acid or polynucleotide sequence is compared to the reference sequence. An amino acid residue in a protein "corresponds" to a given residue when it occupies the same essential structural position within the protein as the given residue. One skilled in the art will immediately recognize the identity and location of residues corresponding to a specific position in a protein in other proteins with different numbering systems. For example, by performing a simple sequence alignment with a protein, the identity and location of residues corresponding to specific positions of the protein are identified in other protein sequences aligning to the protein. For example, a selected residue in a selected protein corresponds to glutamic acid at position 138 when the selected residue occupies the same essential spatial or other structural relationship as a glutamic acid at position 138. In some embodiments, where a selected protein is aligned for maximum homology with a protein, the position in the aligned selected protein aligning with glutamic acid 138 is said to correspond to glutamic acid 138. Instead of a primary sequence alignment, a three dimensional structural alignment can also be used, e.g., where the structure of the selected protein is aligned for maximum correspondence with the glutamic acid at position 138, and the overall structures compared. In this case, an amino acid that occupies the same essential position as glutamic acid 138 in the structural model is said to correspond to the glutamic acid 138 residue.
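The numbering convention described in the two preceding paragraphs can be sketched in Python as follows. The helper name `map_reference_positions` and the toy sequences are hypothetical; the sketch assumes a pairwise alignment has already been computed and that gaps are written as "-".

```python
def map_reference_positions(aligned_ref: str, aligned_test: str, gap: str = "-") -> dict:
    """Map 1-based reference positions to 1-based test positions.

    A reference position that aligns to a gap in the test sequence (a
    deletion in the variant) maps to None. Residues inserted in the test
    sequence have no reference number and therefore do not appear.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = test_pos = 0
    for ref_char, test_char in zip(aligned_ref, aligned_test):
        if ref_char != gap:
            ref_pos += 1
        if test_char != gap:
            test_pos += 1
        if ref_char != gap:
            mapping[ref_pos] = test_pos if test_char != gap else None
    return mapping


# Toy alignment: the test sequence lacks reference residue 2 and carries
# one inserted residue between reference residues 3 and 4.
print(map_reference_positions("MKE-LV", "M-EALV"))
```

In this toy case, counting from the N-terminus of the test sequence gives a different number than the reference numbering for the residue that corresponds to reference position 3, which is the situation the text describes.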
[0080] "Conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, "conservatively modified variants" refers to those nucleic acids that encode identical or essentially identical amino acid sequences. Because of the degeneracy of the genetic code, a number of nucleic acid sequences will encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
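A silent variation can be checked mechanically, as in the following minimal Python sketch. The codon table is deliberately partial and the names `translate` and `is_silent_variant` are hypothetical. RNA notation is assumed throughout, so the tryptophan codon is written UGG.

```python
# Partial standard genetic code in RNA notation; only the codons needed
# for this illustration are listed (assumption: no other codons occur).
CODON_TABLE = {
    "GCA": "A", "GCC": "A", "GCG": "A", "GCU": "A",  # alanine
    "AAA": "K", "AAG": "K",                           # lysine
    "AUG": "M",                                       # methionine
    "UGG": "W",                                       # tryptophan
}


def translate(rna: str) -> str:
    """Translate an RNA coding sequence codon by codon (one-letter codes)."""
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def is_silent_variant(seq_a: str, seq_b: str) -> bool:
    """True when two different sequences encode the same polypeptide."""
    return (
        seq_a != seq_b
        and len(seq_a) == len(seq_b)
        and translate(seq_a) == translate(seq_b)
    )


print(is_silent_variant("AUGGCAAAA", "AUGGCUAAG"))  # alanine and lysine codons swapped
print(is_silent_variant("AUGGCAAAA", "AUGUGGAAA"))  # alanine replaced by tryptophan
```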
[0081] As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure.
[0082] The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)
(see, e.g., Creighton, Proteins (1984)).
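The eight groups listed above can be encoded directly, as in this minimal Python sketch. The function name `is_conservative_substitution` is hypothetical; the group membership is taken from the list above, in which methionine appears in both group 5 and group 8.

```python
# The eight conservative substitution groups, by one-letter code.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) alanine, glycine
    set("DE"),    # 2) aspartic acid, glutamic acid
    set("NQ"),    # 3) asparagine, glutamine
    set("RK"),    # 4) arginine, lysine
    set("ILMV"),  # 5) isoleucine, leucine, methionine, valine
    set("FYW"),   # 6) phenylalanine, tyrosine, tryptophan
    set("ST"),    # 7) serine, threonine
    set("CM"),    # 8) cysteine, methionine
]


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True when two different residues share at least one group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


print(is_conservative_substitution("I", "V"))  # same group (5)
print(is_conservative_substitution("M", "C"))  # same group (8)
print(is_conservative_substitution("C", "L"))  # no shared group
```

Because methionine belongs to two groups, the relation is not transitive: cysteine and leucine are each conservative substitutions for methionine but not for one another.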
[0083] The terms "identical" or percent "identity," in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like). Such sequences are then said to be "substantially identical." This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 25 amino acids or nucleotides in length, or more preferably over a region that is 50-100 amino acids or nucleotides in length.
[0084] "Percentage of sequence identity" is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
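The calculation described in the preceding paragraph can be sketched in Python as follows. The function name `percent_identity` is hypothetical; the sketch assumes the two sequences have already been optimally aligned over the comparison window and that gaps are written as "-".

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are those where both sequences carry the same
    residue. The denominator is the total number of positions in the
    window, including positions where either sequence has a gap.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(
        a == b and a != gap
        for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * matched / len(aligned_a)


# Six-position window: four matches, one mismatch, one gap.
print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```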
[0085] A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of, e.g., a full length sequence or from 20 to 600, about 50 to about 200, or about 100 to about 150 amino acids or nucleotides in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith and Waterman (1970) Adv. Appl. Math. 2:482c, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity method of Pearson and Lipman (1988) Proc. Nat’l. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see, e.g., Ausubel et al., Current Protocols in Molecular Biology (1995 supplement)).
[0086] Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1977) Nuc. Acids Res. 25:3389-3402, and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915) alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
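The extension step described above can be sketched for nucleotide sequences in Python. This is a simplified, ungapped, one-directional illustration and not the BLAST implementation itself: the function name `extend_right` is hypothetical, the seed word is assumed to be an exact match, and the drop-off value X of 20 is an arbitrary illustrative choice. The scores M=5 and N=-4 are the BLASTN defaults stated in the text.

```python
def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 word_len: int, match: int = 5, mismatch: int = -4,
                 x_drop: int = 20) -> tuple:
    """Extend a seed word hit to the right without gaps.

    Returns (best_score, best_length), where best_length counts the seed
    plus the extension that achieved the maximum cumulative score.
    Extension halts when the score falls x_drop below its maximum, when
    the score reaches zero or below, or when either sequence ends.
    """
    score = best_score = word_len * match  # seed assumed to match exactly
    best_len = offset = word_len
    while q_start + offset < len(query) and s_start + offset < len(subject):
        same = query[q_start + offset] == subject[s_start + offset]
        score += match if same else mismatch
        offset += 1
        if score > best_score:
            best_score, best_len = score, offset
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len


# The first eight positions match; the remaining four do not.
print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, word_len=4))
```

A full search would also extend to the left and, in BLAST 2.0, allow gaps. A smaller X stops the extension sooner, which trades sensitivity for speed, consistent with the statement that W, T, and X determine the sensitivity and speed of the alignment.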
[0087] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5787). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
[0088] An indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the antibodies raised against the polypeptide encoded by the second nucleic acid, as described below. Thus, a polypeptide is typically substantially identical to a second polypeptide, for example, where the two peptides differ only by conservative substitutions. Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below. Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequence.
[0089] A "heme dependent pre-guanitoxin N-hydrolase" or “heme dependent pre-guanitoxin N-hydrolase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of heme dependent pre-guanitoxin N-hydrolase (GntA protein) or variants or homologs thereof that maintain heme dependent pre-guanitoxin N-hydrolase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to heme dependent pre-guanitoxin N-hydrolase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring heme dependent pre-guanitoxin N-hydrolase protein (e.g. SEQ ID NO:76). In embodiments, the heme dependent pre-guanitoxin N-hydrolase includes the sequence of SEQ ID NO:76. In embodiments, the heme dependent pre-guanitoxin N-hydrolase is encoded by the sequence of SEQ ID NO:23. In embodiments, the heme dependent pre-guanitoxin N-hydrolase is encoded by the sequence of SEQ ID NO:24. In embodiments, the heme dependent pre-guanitoxin N-hydrolase is encoded by the sequence of SEQ ID NO:25. In embodiments, the heme dependent pre-guanitoxin N-hydrolase is encoded by the sequence of SEQ ID NO:26. In embodiments, the heme dependent pre-guanitoxin N-hydrolase is encoded by the sequence of SEQ ID NO:27.
[0090] A "L-arginine gamma (S) hydroxylase" or “L-arginine gamma (S) hydroxylase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of L-arginine gamma (S) hydroxylase (GntB protein) or variants or homologs thereof that maintain L-arginine gamma (S) hydroxylase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to L-arginine gamma (S) hydroxylase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring L-arginine gamma (S) hydroxylase protein (e.g. SEQ ID NO:77). In embodiments, the L-arginine gamma (S) hydroxylase includes the sequence of SEQ ID NO:77. In embodiments, the L-arginine gamma (S) hydroxylase is encoded by the sequence of SEQ ID NO:28. In embodiments, the L-arginine gamma (S) hydroxylase is encoded by the sequence of SEQ ID NO:29. In embodiments, the L-arginine gamma (S) hydroxylase is encoded by the sequence of SEQ ID NO:30. In embodiments, the L-arginine gamma (S) hydroxylase is encoded by the sequence of SEQ ID NO:31. In embodiments, the L-arginine gamma (S) hydroxylase is encoded by the sequence of SEQ ID NO:32.
[0091] A "PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase" or “PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase (GntC protein) or variants or homologs thereof that maintain PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase protein (e.g. SEQ ID NO:78). In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase includes the sequence of SEQ ID NO:78. In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase is encoded by the sequence of SEQ ID NO:33. In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase is encoded by the sequence of SEQ ID NO:34. In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase is encoded by the sequence of SEQ ID NO:35. In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase is encoded by the sequence of SEQ ID NO:36. In embodiments, the PLP-dependent (S)-gamma-hydroxy-L-arginine cyclodehydratase is encoded by the sequence of SEQ ID NO:37.
[0092] A "L-enduracididine beta-hydroxylase" or “L-enduracididine beta-hydroxylase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of L-enduracididine beta-hydroxylase (GntD protein) or variants or homologs thereof that maintain L-enduracididine beta-hydroxylase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to L-enduracididine beta-hydroxylase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring L-enduracididine beta-hydroxylase protein (e.g. SEQ ID NO:79). In embodiments, the L-enduracididine beta-hydroxylase includes the sequence of SEQ ID NO:79. In embodiments, the L-enduracididine beta-hydroxylase is encoded by the sequence of SEQ ID NO:38. In embodiments, the L-enduracididine beta-hydroxylase is encoded by the sequence of SEQ ID NO:39. In embodiments, the L-enduracididine beta-hydroxylase is encoded by the sequence of SEQ ID NO:40. In embodiments, the L-enduracididine beta-hydroxylase is encoded by the sequence of SEQ ID NO:41. In embodiments, the L-enduracididine beta-hydroxylase is encoded by the sequence of SEQ ID NO:42.
[0093] A "PLP-dependent transaminase" or “PLP-dependent transaminase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of PLP-dependent transaminase (GntE protein) or variants or homologs thereof that maintain PLP-dependent transaminase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to PLP-dependent transaminase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring PLP-dependent transaminase protein (e.g. SEQ ID NO:80). In embodiments, the PLP-dependent transaminase includes the sequence of SEQ ID NO:80. In embodiments, the PLP-dependent transaminase is encoded by the sequence of SEQ ID NO:43. In embodiments, the PLP-dependent transaminase is encoded by the sequence of SEQ ID NO:44. In embodiments, the PLP-dependent transaminase is encoded by the sequence of SEQ ID NO:45. In embodiments, the PLP-dependent transaminase is encoded by the sequence of SEQ ID NO:46. In embodiments, the PLP-dependent transaminase is encoded by the sequence of SEQ ID NO:47.
[0094] A "pre-guanitoxin forming N-methyltransferase" or “pre-guanitoxin forming N-methyltransferase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of pre-guanitoxin forming N-methyltransferase (GntF protein) or variants or homologs thereof that maintain pre-guanitoxin forming N-methyltransferase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to pre-guanitoxin forming N-methyltransferase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring pre-guanitoxin forming N-methyltransferase protein (e.g. SEQ ID NO:81). In embodiments, the pre-guanitoxin forming N-methyltransferase includes the sequence of SEQ ID NO:81. In embodiments, the pre-guanitoxin forming N-methyltransferase is encoded by the sequence of SEQ ID NO:48. In embodiments, the pre-guanitoxin forming N-methyltransferase is encoded by the sequence of SEQ ID NO:49. In embodiments, the pre-guanitoxin forming N-methyltransferase is encoded by the sequence of SEQ ID NO:50. In embodiments, the pre-guanitoxin forming N-methyltransferase is encoded by the sequence of SEQ ID NO:51. In embodiments, the pre-guanitoxin forming N-methyltransferase is encoded by the sequence of SEQ ID NO:52.
[0095] A "PLP-dependent aldolase" or “PLP-dependent aldolase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of PLP-dependent aldolase (GntG protein) or variants or homologs thereof that maintain PLP-dependent aldolase activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to PLP-dependent aldolase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring PLP-dependent aldolase protein (e.g. SEQ ID NO:82). In embodiments, the PLP-dependent aldolase includes the sequence of SEQ ID NO:82. In embodiments, the PLP-dependent aldolase is encoded by the sequence of SEQ ID NO:53. In embodiments, the PLP-dependent aldolase is encoded by the sequence of SEQ ID NO:54. In embodiments, the PLP-dependent aldolase is encoded by the sequence of SEQ ID NO:55. In embodiments, the PLP-dependent aldolase is encoded by the sequence of SEQ ID NO:56. In embodiments, the PLP-dependent aldolase is encoded by the sequence of SEQ ID NO:57.
[0096] A "MBL fold metallo-hydrolase" or “MBL fold metallo-hydrolase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of MBL fold metallo-hydrolase (GntH protein) or variants or homologs thereof that maintain MBL fold metallo-hydrolase (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to MBL fold metallo-hydrolase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring MBL fold metallo-hydrolase protein (e.g. SEQ ID NO:83). In embodiments, the MBL fold metallo-hydrolase includes the sequence of SEQ ID NO:83. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:56. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:57. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:58. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:59. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:60. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:61. In embodiments, the MBL fold metallo-hydrolase is encoded by the sequence of SEQ ID NO:62.
[0097] A "pre-guanitoxin N-oxide kinase" or “pre-guanitoxin N-oxide kinase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of pre-guanitoxin N-oxide kinase (GntI protein) or variants or homologs thereof that maintain pre-guanitoxin N-oxide kinase (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to pre-guanitoxin N-oxide kinase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring pre-guanitoxin N-oxide kinase protein (e.g. SEQ ID NO:84). In embodiments, the pre-guanitoxin N-oxide kinase includes the sequence of SEQ ID NO:84. In embodiments, the pre-guanitoxin N-oxide kinase is encoded by the sequence of SEQ ID NO:63. In embodiments, the pre-guanitoxin N-oxide kinase is encoded by the sequence of SEQ ID NO:64. In embodiments, the pre-guanitoxin N-oxide kinase is encoded by the sequence of SEQ ID NO:65. In embodiments, the pre-guanitoxin N-oxide kinase is encoded by the sequence of SEQ ID NO:66. In embodiments, the pre-guanitoxin N-oxide kinase is encoded by the sequence of SEQ ID NO:67.
[0098] A "guanitoxin forming phosphate O-methyltransferase" or “guanitoxin forming phosphate O-methyltransferase protein” as referred to herein includes any of the recombinant or naturally-occurring forms of guanitoxin forming phosphate O-methyltransferase (GntJ protein) or variants or homologs thereof that maintain guanitoxin forming phosphate O-methyltransferase (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to guanitoxin forming phosphate O-methyltransferase). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring guanitoxin forming phosphate O-methyltransferase protein (e.g. SEQ ID NO:85). In embodiments, the guanitoxin forming phosphate O-methyltransferase includes the sequence of SEQ ID NO:85. In embodiments, the guanitoxin forming phosphate O-methyltransferase is encoded by the sequence of SEQ ID NO:68. In embodiments, the guanitoxin forming phosphate O-methyltransferase is encoded by the sequence of SEQ ID NO:69. In embodiments, the guanitoxin forming phosphate O-methyltransferase is encoded by the sequence of SEQ ID NO:70. In embodiments, the guanitoxin forming phosphate O-methyltransferase is encoded by the sequence of SEQ ID NO:71. In embodiments, the guanitoxin forming phosphate O-methyltransferase is encoded by the sequence of SEQ ID NO:72.
[0099] A "MATE family efflux transporter" or “MATE family efflux transporter protein” as referred to herein includes any of the recombinant or naturally-occurring forms of MATE family efflux transporter (GntT protein) or variants or homologs thereof that maintain MATE family efflux transporter (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to MATE family efflux transporter). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring MATE family efflux transporter protein (e.g. SEQ ID NO:86). In embodiments, the MATE family efflux transporter includes the sequence of SEQ ID NO:86. In embodiments, the MATE family efflux transporter is encoded by the sequence of SEQ ID NO:73. In embodiments, the MATE family efflux transporter is encoded by the sequence of SEQ ID NO:74. In embodiments, the MATE family efflux transporter is encoded by the sequence of SEQ ID NO:75.
[0100] For specific proteins described herein, the named protein includes any of the protein’s naturally occurring forms, variants or homologs that maintain the protein transcription factor activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to the native protein). In some embodiments, variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring form. In other embodiments, the protein is the protein as identified by its NCBI sequence reference. In other embodiments, the protein is the protein as identified by its NCBI sequence reference, homolog or functional fragment thereof.
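By way of illustration only, the identity thresholds recited above (e.g. at least 90% identity across the whole sequence or across a 50, 100, 150 or 200 residue portion) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the two sequences have already been aligned to equal length without gaps; it is not the alignment method used to characterize the sequences disclosed herein.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Assumption: seq_a and seq_b are pre-aligned, ungapped, and equal in length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def best_window_identity(seq_a: str, seq_b: str, window: int) -> float:
    """Highest percent identity over any contiguous portion of `window` residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal length")
    if window <= 0 or window > len(seq_a):
        raise ValueError("window must be between 1 and the sequence length")
    return max(
        percent_identity(seq_a[i:i + window], seq_b[i:i + window])
        for i in range(len(seq_a) - window + 1)
    )


def meets_identity(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True when whole-sequence identity is at least `threshold` percent."""
    return percent_identity(seq_a, seq_b) >= threshold
```

In practice a pairwise alignment tool would first produce the aligned sequences; the sketch only shows how the whole-sequence and contiguous-portion comparisons differ.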
[0101] A "label", “detectable label” or "detectable moiety" are used interchangeably and refer to a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins or other entities which can be made detectable, e.g., by incorporating a radiolabel into a nucleic acid. Any appropriate method known in the art for conjugating a nucleic acid to the label may be employed, e.g., using methods described in Hermanson, Bioconjugate Techniques 1996, Academic Press, Inc., San Diego. In embodiments, the label is a dye that binds to double-stranded DNA. In embodiments, the label is a fluorescent label. In embodiments, the label is FAM, SUN, 3, Texas Red-X, or Cy5. In embodiments, the label includes a plurality of fluorescent labels wherein each fluorescent label of the plurality has a different emission wavelength (e.g. for multiplex PCR methods (e.g. qPCR, RT qPCR, etc.)).
[0102] "Contacting" is used in accordance with its plain ordinary meaning and refers to the process of allowing at least two distinct species (e.g. antibodies and antigens) to become sufficiently proximal to react, interact, or physically touch. It should be appreciated, however, that the resulting reaction product can be produced directly from a reaction between the added reagents or from an intermediate from one or more of the added reagents which can be produced in the reaction mixture.
[0103] The term "contacting" may include allowing two species to react, interact, or physically touch, wherein the two species may be, for example, a pharmaceutical composition as provided herein and a cell. In embodiments contacting includes, for example, allowing a pharmaceutical composition as described herein to interact with a cell.
[0104] A "cell" as used herein, refers to a cell carrying out metabolic or other function sufficient to preserve or replicate its genomic DNA. A cell can be identified by well-known methods in the art including, for example, presence of an intact membrane, staining by a particular dye, ability to produce progeny or, in the case of a gamete, ability to combine with a second gamete to produce a viable offspring. Cells may include prokaryotic and eukaryotic cells. Prokaryotic cells include but are not limited to bacteria. Eukaryotic cells include, but are not limited to, yeast cells and cells derived from plants and animals, for example mammalian, insect (e.g., spodoptera) and human cells. In embodiments, the cell is a bacteria cell. In embodiments, the cell is a cyanobacteria cell. In embodiments, the cell is a guanitoxin producing bacteria cell.
[0105] The term "isolated", when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It can be, for example, in a homogeneous state and may be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified.
[0106] The term "heterologous" when used with reference to portions of a nucleic acid indicates that the nucleic acid comprises two or more subsequences that are not found in the same relationship to each other in nature. For instance, the nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source. Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
[0107] The term "exogenous" refers to a molecule or substance (e.g., a compound, nucleic acid or protein) that originates from outside a given cell or organism. For example, an "exogenous promoter" as referred to herein is a promoter that does not originate from the cell or organism it is expressed by. Conversely, the term "endogenous" or "endogenous promoter" refers to a molecule or substance that is native to, or originates within, a given cell or organism.
[0108] The term "expression" includes any step involved in the production of the polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion. Expression can be detected using conventional techniques for detecting protein (e.g., ELISA, Western blotting, flow cytometry, immunofluorescence, immunohistochemistry, etc.).
[0109] “Biological sample” or “sample” refer to materials obtained from or derived from a subject, patient, or a liquid (e.g. a lake, river, pond; a private water system, a public water system). A biological sample includes sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histological purposes. Such samples include bodily fluids such as blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells, and the like), sputum, tissue, cultured cells (e.g., primary cultures, explants, and transformed cells), stool, urine, synovial fluid, joint tissue, synovial tissue, synoviocytes, fibroblast-like synoviocytes, macrophage-like synoviocytes, immune cells, hematopoietic cells, fibroblasts, macrophages, T cells, etc. A biological sample is typically obtained from a eukaryotic organism, such as a mammal such as a primate e.g., chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird; reptile; or fish. A sample may include a volume of liquid (e.g. an aqueous liquid) taken from a lake, river, pond, private water system, or public water system. The sample may include bacteria, including guanitoxin producing bacteria.
[0110] A “control” or “standard control” refers to a sample, measurement, or value that serves as a reference, usually a known reference, for comparison to a test sample, measurement, or value. For example, a test sample (e.g. aqueous liquid) can be taken from a lake, pond, river, public water system, or private water system suspected of having guanitoxin producing bacteria. The test sample (e.g. aqueous liquid) can be compared to a known sample including one or more guanitoxin biosynthetic genes (e.g. a standard control). For example, a control DNA can be one or more guanitoxin biosynthetic genes (e.g. GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof). The control DNA may be a positive control in a PCR or isothermal amplification method, for example, in qPCR. In another example, a standard control value can also be obtained from a lake, pond, river, public water system, or private water system prior to contamination with guanitoxin producing bacteria. In another example, a standard control can be obtained by excluding one or more reagents from a test, assay, or method. For example, a negative control for a PCR method (e.g. qPCR, RT-qPCR) may include performing the PCR method without one or more reagents (e.g. polymerase).
[0111] One of skill in the art will understand which standard controls are most appropriate in a given situation and be able to analyze data based on comparisons to standard control values. Standard controls are also valuable for determining the significance (e.g. statistical significance) of data. For example, if values for a given parameter are widely variant in standard controls, variation in test samples will not be considered as significant.
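As a non-limiting illustration of how variability in standard controls bears on significance, the following Python sketch expresses a test value as a z-score relative to a set of standard control values. The function name and the use of a z-score are assumptions chosen for illustration; the disclosure does not prescribe a particular statistical test.

```python
import statistics


def z_score_vs_controls(test_value: float, control_values: list[float]) -> float:
    """Distance of a test value from the control mean, in control standard deviations.

    Hypothetical helper: a large absolute value suggests the test sample differs
    from the standard controls; widely variant controls shrink the score.
    """
    if len(control_values) < 2:
        raise ValueError("at least two control values are required")
    mean = statistics.mean(control_values)
    spread = statistics.stdev(control_values)  # sample standard deviation
    if spread == 0:
        raise ValueError("control values show no variation")
    return (test_value - mean) / spread
```

With tightly clustered controls (9, 10, 11), a test value of 14 lies four standard deviations from the control mean, whereas with widely variant controls (0, 10, 20) the same test value lies well under one standard deviation away and would not be considered significant.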
[0112] “Patient” or “subject in need thereof” refers to a living organism suffering from or prone to a disease or condition (e.g. guanitoxin toxicity, symptom of guanitoxin toxicity) that can be treated by administration of a composition or pharmaceutical composition as provided herein. Non-limiting examples include humans, other mammals, bovines, rats, mice, dogs, monkeys, goat, sheep, cows, deer, and other non-mammalian animals. In some embodiments, a patient is human.
[0113] The term “associated” or “associated with” in the context of a substance or substance activity or function associated with a disease or condition (e.g. guanitoxin associated toxicity, guanitoxin producing bacteria associated toxicity) means that the disease or condition is caused by (in whole or in part), or a symptom of the disease is caused by (in whole or in part) the substance (guanitoxin, guanitoxin producing bacteria) or substance activity or function. As used herein, what is described as being associated with a disease, if a causative agent, could be a target for treatment of the disease.
[0114] The term “signaling pathway” as used herein refers to a series of interactions between cellular and optionally extra-cellular components (e.g. proteins, nucleic acids, small molecules, ions, lipids) that conveys a change in one component to one or more other components, which in turn may convey a change to additional components, which is optionally propagated to other signaling pathway components.
[0115] The term "aberrant" as used herein refers to different from normal. When used to describe enzymatic activity, aberrant refers to activity that is greater or less than a normal control or the average of normal non-diseased control samples. Aberrant activity may refer to an amount of activity that results in a disease, wherein returning the aberrant activity to a normal or non-disease-associated amount (e.g. by using a method as described herein), results in reduction of the disease or one or more disease symptoms.
[0116] A "therapeutic agent" as referred to herein, is a composition useful in treating or preventing a disease such as guanitoxin toxicity (e.g. guanitoxin induced toxicity). In embodiments, the therapeutic agent is a muscle relaxant, benzodiazepine, or barbiturate. In embodiments, the therapeutic agent is atropine. In embodiments, the therapeutic agent is glycopyrrolate. In embodiments, the therapeutic agent is physostigmine. In embodiments, the therapeutic agent is 2-PAM. In embodiments, the therapeutic agent is an agent identified herein having utility in treating symptoms (e.g. seizure, tremors, etc.) of guanitoxin toxicity.
[0117] As used herein, “treating” or “treatment of’ a condition, disease or disorder or symptoms associated with a condition, disease or disorder refers to an approach for obtaining beneficial or desired results, including clinical results. Beneficial or desired clinical results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminishment of extent of condition, disorder or disease, stabilization of the state of condition, disorder or disease, prevention of development of condition, disorder or disease, prevention of spread of condition, disorder or disease, delay or slowing of condition, disorder or disease progression, delay or slowing of condition, disorder or disease onset, amelioration or palliation of the condition, disorder or disease state, and remission, whether partial or total. “Treating” can also mean prolonging survival of a subject beyond that expected in the absence of treatment. “Treating” can also mean inhibiting the progression of the condition, disorder or disease, slowing the progression of the condition, disorder or disease temporarily, although in some instances, it involves halting the progression of the condition, disorder or disease permanently. As used herein the terms treatment, treat, or treating refers to a method of reducing the effects of one or more symptoms of a disease or condition characterized by expression of the protease or symptom of the disease or condition characterized by expression of the protease. Thus in the disclosed method, treatment can refer to a 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% reduction in the severity of an established disease, condition, or symptom of the disease or condition. For example, a method for treating a disease is considered to be a treatment if there is a 10% reduction in one or more symptoms of the disease in a subject as compared to a control. 
Thus the reduction can be a 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or any percent reduction in between 10% and 100% as compared to native or control levels. It is understood that treatment does not necessarily refer to a cure or complete ablation of the disease, condition, or symptoms of the disease or condition. Further, as used herein, references to decreasing, reducing, or inhibiting include a change of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or greater as compared to a control level and such terms can include but do not necessarily include complete elimination.
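The percent-reduction language above can be illustrated with a short Python sketch. The function names and the default 10% threshold parameter are hypothetical and chosen only to mirror the example in the text; how symptom severity is measured is left open.

```python
def percent_reduction(control_level: float, treated_level: float) -> float:
    """Percent reduction of a measured level relative to a control level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (control_level - treated_level) / control_level


def qualifies_as_treatment(control_level: float, treated_level: float,
                           threshold_percent: float = 10.0) -> bool:
    """True when the reduction versus control is at least `threshold_percent`."""
    return percent_reduction(control_level, treated_level) >= threshold_percent
```

For instance, a symptom score falling from 20 in the control to 15 in the treated subject is a 25% reduction and meets the 10% example, while a fall from 20 to 19 (5%) does not.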
[0118] The terms “dose” and “dosage” are used interchangeably herein. A dose refers to the amount of active ingredient given to an individual at each administration. The dose will vary depending on a number of factors, including the range of normal doses for a given therapy, frequency of administration; size and tolerance of the individual; severity of the condition; risk of side effects; and the route of administration. One of skill will recognize that the dose can be modified depending on the above factors or based on therapeutic progress. The term “dosage form” refers to the particular format of the pharmaceutical or pharmaceutical composition, and depends on the route of administration. For example, a dosage form can be in a liquid form for nebulization, e.g., for inhalants, in a tablet or liquid, e.g., for oral delivery, or a saline solution, e.g., for injection.
[0119] By “therapeutically effective dose or amount” as used herein is meant a dose that produces effects for which it is administered (e.g. treating or preventing a disease). The exact dose and formulation will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art using known techniques (see, e.g., Lieberman, Pharmaceutical Dosage Forms (vols. 1-3, 1992); Lloyd, The Art, Science and Technology of Pharmaceutical Compounding (1999); Remington: The Science and Practice of Pharmacy, 20th Edition, Gennaro, Editor (2003), and Pickar, Dosage Calculations (1999)). For example, for the given parameter, a therapeutically effective amount will show an increase or decrease of at least 5%, 10%, 15%, 20%, 25%, 40%, 50%, 60%, 75%, 80%, 90%, or at least 100%. Therapeutic efficacy can also be expressed as “-fold” increase or decrease. For example, a therapeutically effective amount can have at least a 1.2-fold, 1.5-fold, 2-fold, 5-fold, or more effect over a standard control. A therapeutically effective dose or amount may ameliorate one or more symptoms of a disease. A therapeutically effective dose or amount may prevent or delay the onset of a disease or one or more symptoms of a disease when the effect for which it is being administered is to treat a person who is at risk of developing the disease.
[0120] As used herein, the term "administering" means oral administration, administration as a suppository, topical contact, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal or subcutaneous administration, or the implantation of a slow-release device, e.g., a mini-osmotic pump, to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intraarteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc. By "co-administer" it is meant that a composition described herein is administered at the same time, just prior to, or just after the administration of one or more additional therapies, for example cancer therapies such as chemotherapy, hormonal therapy, radiotherapy, or immunotherapy. The compounds of the invention can be administered alone or can be coadministered to the patient.
Coadministration is meant to include simultaneous or sequential administration of the compounds individually or in combination (more than one compound). Thus, the preparations can also be combined, when desired, with other active substances (e.g. to reduce metabolic degradation). The compositions of the present invention can be delivered transdermally, by a topical route, formulated as applicator sticks, solutions, suspensions, emulsions, gels, creams, ointments, pastes, jellies, paints, powders, and aerosols.
METHODS
[0121] Provided herein are, inter alia, methods for detecting guanitoxin-producing bacteria in aqueous liquids. The methods described herein provide sensitive and accurate detection of guanitoxin producing bacteria by detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof. The methods provided herein including embodiments thereof are contemplated to be effective for diagnosing guanitoxin contamination in an aqueous liquid (e.g. derived from a pond, lake, or river; derived from a public water system or private water system) by detecting guanitoxin-producing bacteria in the aqueous liquid. The methods provided herein including embodiments thereof are further contemplated to be useful for treating guanitoxin toxicity in a subject in need thereof. Thus, in an aspect is provided a method of detecting guanitoxin-producing bacteria in an aqueous liquid, the method including detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0122] The term “guanitoxin”, also referred to as “anatoxin-a(S)”, is used in accordance with its ordinary meaning in the art, and refers to a compound having the structure shown in FIG. 1A (left panel). Guanitoxin may be produced by cyanobacteria (e.g. guanitoxin producing bacteria). Guanitoxin may irreversibly inhibit the active site of the enzyme acetylcholinesterase, thereby causing toxicity in a subject who has ingested, inhaled, or come in contact with an aqueous liquid including guanitoxin producing bacteria. Thus, the term “guanitoxin producing bacteria” refers to freshwater bacteria (e.g. cyanobacteria) that are capable of guanitoxin biosynthesis. Guanitoxin producing bacteria may produce guanitoxin or an intermediate compound in the biosynthesis of guanitoxin. In embodiments, an intermediate compound in guanitoxin biosynthesis includes any one of the structures shown in FIG. 2C (compounds 2-9).
[0123] “Detecting” or “assaying” means using a procedure (e.g. a PCR method, an isothermal amplification method, a sequencing method, etc.) to qualitatively assess or quantitatively measure the presence or amount of the guanitoxin biosynthetic genes as described herein such as, for example, detecting the presence of one or more of GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or any combination thereof, using a method (such as qPCR, RT-PCR, sequencing, etc.) to qualitatively assess or quantitatively measure the presence or amount of the selected guanitoxin biosynthetic gene.
[0124] As used herein, the term “guanitoxin biosynthetic gene” refers to a gene that encodes a protein involved in producing guanitoxin or an intermediate compound in the biosynthesis of guanitoxin. In embodiments, the guanitoxin biosynthetic gene is GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, or GntT or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntA or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntB or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntC or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntD or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntE or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntF or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntG or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntH or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntI or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntJ or a fragment thereof. In embodiments, the guanitoxin biosynthetic gene is GntT or a fragment thereof.
[0125] The term “GntA gene” or “GntA” as used herein refers to any of the recombinant or naturally-occurring forms of the GntA gene or variants or homologs thereof. In embodiments, the GntA gene codes for a GntA polypeptide capable of maintaining the activity of the GntA polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntA polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntA gene (e.g. SEQ ID NO:23-27).
[0126] In embodiments, the GntA gene is substantially identical to the nucleic acid sequence of SEQ ID NO:23 or a variant or homolog having substantial identity thereto. In embodiments, the GntA gene includes the nucleic acid sequence of SEQ ID NO:23. In embodiments, the GntA gene is the nucleic acid sequence of SEQ ID NO:23. In embodiments, the GntA gene is a portion of SEQ ID NO:23.
[0127] In embodiments, the GntA gene is substantially identical to the nucleic acid sequence of SEQ ID NO:24 or a variant or homolog having substantial identity thereto. In embodiments, the GntA gene includes the nucleic acid sequence of SEQ ID NO:24. In embodiments, the GntA gene is the nucleic acid sequence of SEQ ID NO:24. In embodiments, the GntA gene is a portion of SEQ ID NO:24.
[0128] In embodiments, the GntA gene is substantially identical to the nucleic acid sequence of SEQ ID NO:25 or a variant or homolog having substantial identity thereto. In embodiments, the GntA gene includes the nucleic acid sequence of SEQ ID NO:25. In embodiments, the GntA gene is the nucleic acid sequence of SEQ ID NO:25. In embodiments, the GntA gene is a portion of SEQ ID NO:25.
[0129] In embodiments, the GntA gene is substantially identical to the nucleic acid sequence of SEQ ID NO:26 or a variant or homolog having substantial identity thereto. In embodiments, the GntA gene includes the nucleic acid sequence of SEQ ID NO:26. In embodiments, the GntA gene is the nucleic acid sequence of SEQ ID NO:26. In embodiments, the GntA gene is a portion of SEQ ID NO:26.
[0130] In embodiments, the GntA gene is substantially identical to the nucleic acid sequence of SEQ ID NO:27 or a variant or homolog having substantial identity thereto. In embodiments, the GntA gene includes the nucleic acid sequence of SEQ ID NO:27. In embodiments, the GntA gene is the nucleic acid sequence of SEQ ID NO:27. In embodiments, the GntA gene is a portion of SEQ ID NO:27.
[0131] In embodiments, the GntA gene is about 50 nt to about 800 nt in length. In embodiments, the GntA gene is about 100 nt to about 800 nt in length. In embodiments, the GntA gene is about 150 nt to about 800 nt in length. In embodiments, the GntA gene is about 200 nt to about 800 nt in length. In embodiments, the GntA gene is about 250 nt to about 800 nt in length. In embodiments, the GntA gene is about 300 nt to about 800 nt in length. In embodiments, the GntA gene is about 350 nt to about 800 nt in length. In embodiments, the GntA gene is about 400 nt to about 800 nt in length. In embodiments, the GntA gene is about 450 nt to about 800 nt in length. In embodiments, the GntA gene is about 500 nt to about 800 nt in length. In embodiments, the GntA gene is about 550 nt to about 800 nt in length. In embodiments, the GntA gene is about 600 nt to about 800 nt in length. In embodiments, the GntA gene is about 650 nt to about 800 nt in length. In embodiments, the GntA gene is about 700 nt to about 800 nt in length. In embodiments, the GntA gene is about 750 nt to about 800 nt in length.
[0132] In embodiments, the GntA gene is about 50 nt to about 750 nt in length. In embodiments, the GntA gene is about 50 nt to about 700 nt in length. In embodiments, the GntA gene is about 50 nt to about 650 nt in length. In embodiments, the GntA gene is about 50 nt to about 600 nt in length. In embodiments, the GntA gene is about 50 nt to about 550 nt in length. In embodiments, the GntA gene is about 50 nt to about 500 nt in length. In embodiments, the GntA gene is about 50 nt to about 450 nt in length. In embodiments, the GntA gene is about 50 nt to about 400 nt in length. In embodiments, the GntA gene is about 50 nt to about 350 nt in length. In embodiments, the GntA gene is about 50 nt to about 300 nt in length. In embodiments, the GntA gene is about 50 nt to about 250 nt in length. In embodiments, the GntA gene is about 50 nt to about 200 nt in length. In embodiments, the GntA gene is about 50 nt to about 150 nt in length. In embodiments, the GntA gene is about 50 nt to about 100 nt in length. In embodiments, the GntA gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, or 800 nt in length. In embodiments, the GntA gene is about 804 nt in length. In embodiments, the GntA gene is 804 nt in length. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:23. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:24. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:25. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:26. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:27.
[0133] The term “GntB gene” or “GntB” as used herein refers to any of the recombinant or naturally-occurring forms of the GntB gene or variants or homologs thereof. In embodiments, the GntB gene codes for a GntB polypeptide capable of maintaining the activity of the GntB polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntB polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntB gene (SEQ ID NO:28-32).
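By way of illustration only, the percent-identity criterion recited above (e.g., at least 90% identity across the whole sequence or across a 50, 100, 150 or 200 nt continuous portion) can be sketched in Python. This is a minimal sketch under stated assumptions: the two sequences are assumed to be already aligned without gaps and of equal length, the function names `percent_identity` and `best_window_identity` are hypothetical, and the toy sequences are invented placeholders rather than any of SEQ ID NO:28-32. The disclosure does not limit identity determination to this approach.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences.

    Assumption: ungapped comparison; a real workflow would align the sequences first.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(a: str, b: str, window: int) -> float:
    """Highest percent identity over any continuous portion of `window` positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    if window <= 0 or window > len(a):
        raise ValueError("window must be between 1 and the sequence length")
    return max(
        percent_identity(a[i:i + window], b[i:i + window])
        for i in range(len(a) - window + 1)
    )


# Toy placeholder sequences (hypothetical, 200 nt each), differing at 10 positions.
reference = "A" * 100 + "C" * 100
candidate = "A" * 100 + "G" * 10 + "C" * 90

whole = percent_identity(reference, candidate)             # identity across the whole sequence
portion = best_window_identity(reference, candidate, 50)   # best 50 nt continuous portion
meets_90 = whole >= 90.0 or portion >= 90.0
```

In this sketch a candidate satisfies a threshold if either the whole-sequence identity or the best continuous-portion identity meets it, mirroring the "whole sequence or a portion of the sequence" wording.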
[0134] In embodiments, the GntB gene is substantially identical to the nucleic acid sequence of SEQ ID NO:28 or a variant or homolog having substantial identity thereto. In embodiments, the GntB gene includes the nucleic acid sequence of SEQ ID NO:28. In embodiments, the GntB gene is the nucleic acid sequence of SEQ ID NO:28. In embodiments, the GntB gene is a portion of SEQ ID NO:28.
[0135] In embodiments, the GntB gene is substantially identical to the nucleic acid sequence of SEQ ID NO:29 or a variant or homolog having substantial identity thereto. In embodiments, the GntB gene includes the nucleic acid sequence of SEQ ID NO:29. In embodiments, the GntB gene is the nucleic acid sequence of SEQ ID NO:29. In embodiments, the GntB gene is a portion of SEQ ID NO:29.
[0136] In embodiments, the GntB gene is substantially identical to the nucleic acid sequence of SEQ ID NO:30 or a variant or homolog having substantial identity thereto. In embodiments, the GntB gene includes the nucleic acid sequence of SEQ ID NO:30. In embodiments, the GntB gene is the nucleic acid sequence of SEQ ID NO:30. In embodiments, the GntB gene is a portion of SEQ ID NO:30.
[0137] In embodiments, the GntB gene is substantially identical to the nucleic acid sequence of SEQ ID NO:31 or a variant or homolog having substantial identity thereto. In embodiments, the GntB gene includes the nucleic acid sequence of SEQ ID NO:31. In embodiments, the GntB gene is the nucleic acid sequence of SEQ ID NO:31. In embodiments, the GntB gene is a portion of SEQ ID NO:31.
[0138] In embodiments, the GntB gene is substantially identical to the nucleic acid sequence of SEQ ID NO:32 or a variant or homolog having substantial identity thereto. In embodiments, the GntB gene includes the nucleic acid sequence of SEQ ID NO:32. In embodiments, the GntB gene is the nucleic acid sequence of SEQ ID NO:32. In embodiments, the GntB gene is a portion of SEQ ID NO:32.
[0139] In embodiments, the GntB gene is about 50 nt to about 1000 nt in length. In embodiments, the GntB gene is about 100 nt to about 1000 nt in length. In embodiments, the GntB gene is about 150 nt to about 1000 nt in length. In embodiments, the GntB gene is about 200 nt to about 1000 nt in length. In embodiments, the GntB gene is about 250 nt to about 1000 nt in length. In embodiments, the GntB gene is about 300 nt to about 1000 nt in length. In embodiments, the GntB gene is about 350 nt to about 1000 nt in length. In embodiments, the GntB gene is about 400 nt to about 1000 nt in length. In embodiments, the GntB gene is about 450 nt to about 1000 nt in length. In embodiments, the GntB gene is about 500 nt to about 1000 nt in length. In embodiments, the GntB gene is about 550 nt to about 1000 nt in length. In embodiments, the GntB gene is about 600 nt to about 1000 nt in length. In embodiments, the GntB gene is about 650 nt to about 1000 nt in length. In embodiments, the GntB gene is about 700 nt to about 1000 nt in length. In embodiments, the GntB gene is about 750 nt to about 1000 nt in length. In embodiments, the GntB gene is about 800 nt to about 1000 nt in length. In embodiments, the GntB gene is about 850 nt to about 1000 nt in length. In embodiments, the GntB gene is about 900 nt to about 1000 nt in length. In embodiments, the GntB gene is about 950 nt to about 1000 nt in length.
[0140] In embodiments, the GntB gene is about 50 nt to about 950 nt in length. In embodiments, the GntB gene is about 50 nt to about 900 nt in length. In embodiments, the GntB gene is about 50 nt to about 850 nt in length. In embodiments, the GntB gene is about 50 nt to about 800 nt in length. In embodiments, the GntB gene is about 50 nt to about 750 nt in length. In embodiments, the GntB gene is about 50 nt to about 700 nt in length. In embodiments, the GntB gene is about 50 nt to about 650 nt in length. In embodiments, the GntB gene is about 50 nt to about 600 nt in length. In embodiments, the GntB gene is about 50 nt to about 550 nt in length. In embodiments, the GntB gene is about 50 nt to about 500 nt in length. In embodiments, the GntB gene is about 50 nt to about 450 nt in length. In embodiments, the GntB gene is about 50 nt to about 400 nt in length. In embodiments, the GntB gene is about 50 nt to about 350 nt in length. In embodiments, the GntB gene is about 50 nt to about 300 nt in length. In embodiments, the GntB gene is about 50 nt to about 250 nt in length. In embodiments, the GntB gene is about 50 nt to about 200 nt in length. In embodiments, the GntB gene is about 50 nt to about 150 nt in length. In embodiments, the GntB gene is about 50 nt to about 100 nt in length. In embodiments, the GntB gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, or 1000 nt in length. In embodiments, the GntB gene is about 957 nt in length. In embodiments, the GntB gene is 957 nt in length. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:28. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:29. 
In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:30. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:31. In embodiments, the sequence lengths described herein include a fragment or a portion of the nucleic acid sequence of SEQ ID NO:32.
[0141] The term “GntC gene” or “GntC” as used herein refers to any of the recombinant or naturally-occurring forms of the GntC gene or variants or homologs thereof. In embodiments, the GntC gene codes for a GntC polypeptide capable of maintaining the activity of the GntC polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntC polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntC gene (e.g. SEQ ID NO:33-37).
[0142] In embodiments, the GntC gene is substantially identical to the nucleic acid sequence of SEQ ID NO:33 or a variant or homolog having substantial identity thereto. In embodiments, the GntC gene includes the nucleic acid sequence of SEQ ID NO:33. In embodiments, the GntC gene is the nucleic acid sequence of SEQ ID NO:33. In embodiments, the GntC gene is a portion of SEQ ID NO:33.
[0143] In embodiments, the GntC gene is substantially identical to the nucleic acid sequence of SEQ ID NO:34 or a variant or homolog having substantial identity thereto. In embodiments, the GntC gene includes the nucleic acid sequence of SEQ ID NO:34. In embodiments, the GntC gene is the nucleic acid sequence of SEQ ID NO:34. In embodiments, the GntC gene is a portion of SEQ ID NO:34.
[0144] In embodiments, the GntC gene is substantially identical to the nucleic acid sequence of SEQ ID NO:35 or a variant or homolog having substantial identity thereto. In embodiments, the GntC gene includes the nucleic acid sequence of SEQ ID NO:35. In embodiments, the GntC gene is the nucleic acid sequence of SEQ ID NO:35. In embodiments, the GntC gene is a portion of SEQ ID NO:35.
[0145] In embodiments, the GntC gene is substantially identical to the nucleic acid sequence of SEQ ID NO:36 or a variant or homolog having substantial identity thereto. In embodiments, the GntC gene includes the nucleic acid sequence of SEQ ID NO:36. In embodiments, the GntC gene is the nucleic acid sequence of SEQ ID NO:36. In embodiments, the GntC gene is a portion of SEQ ID NO:36.
[0146] In embodiments, the GntC gene is substantially identical to the nucleic acid sequence of SEQ ID NO:37 or a variant or homolog having substantial identity thereto. In embodiments, the GntC gene includes the nucleic acid sequence of SEQ ID NO:37. In embodiments, the GntC gene is the nucleic acid sequence of SEQ ID NO:37. In embodiments, the GntC gene is a portion of SEQ ID NO:37.
[0147] In embodiments, the GntC gene is about 50 nt to about 1100 nt in length. In embodiments, the GntC gene is about 100 nt to about 1100 nt in length. In embodiments, the GntC gene is about 150 nt to about 1100 nt in length. In embodiments, the GntC gene is about 200 nt to about 1100 nt in length. In embodiments, the GntC gene is about 250 nt to about 1100 nt in length. In embodiments, the GntC gene is about 300 nt to about 1100 nt in length. In embodiments, the GntC gene is about 350 nt to about 1100 nt in length. In embodiments, the GntC gene is about 400 nt to about 1100 nt in length. In embodiments, the GntC gene is about 450 nt to about 1100 nt in length. In embodiments, the GntC gene is about 500 nt to about 1100 nt in length. In embodiments, the GntC gene is about 550 nt to about 1100 nt in length. In embodiments, the GntC gene is about 600 nt to about 1100 nt in length. In embodiments, the GntC gene is about 650 nt to about 1100 nt in length. In embodiments, the GntC gene is about 700 nt to about 1100 nt in length. In embodiments, the GntC gene is about 750 nt to about 1100 nt in length. In embodiments, the GntC gene is about 800 nt to about 1100 nt in length. In embodiments, the GntC gene is about 850 nt to about 1100 nt in length. In embodiments, the GntC gene is about 900 nt to about 1100 nt in length. In embodiments, the GntC gene is about 950 nt to about 1100 nt in length. In embodiments, the GntC gene is about 1000 nt to about 1100 nt in length. In embodiments, the GntC gene is about 1050 nt to about 1100 nt in length.
[0148] In embodiments, the GntC gene is about 50 nt to about 1050 nt in length. In embodiments, the GntC gene is about 50 nt to about 1000 nt in length. In embodiments, the GntC gene is about 50 nt to about 950 nt in length. In embodiments, the GntC gene is about 50 nt to about 900 nt in length. In embodiments, the GntC gene is about 50 nt to about 850 nt in length. In embodiments, the GntC gene is about 50 nt to about 800 nt in length. In embodiments, the GntC gene is about 50 nt to about 750 nt in length. In embodiments, the GntC gene is about 50 nt to about 700 nt in length. In embodiments, the GntC gene is about 50 nt to about 650 nt in length. In embodiments, the GntC gene is about 50 nt to about 600 nt in length. In embodiments, the GntC gene is about 50 nt to about 550 nt in length. In embodiments, the GntC gene is about 50 nt to about 500 nt in length. In embodiments, the GntC gene is about 50 nt to about 450 nt in length. In embodiments, the GntC gene is about 50 nt to about 400 nt in length. In embodiments, the GntC gene is about 50 nt to about 350 nt in length. In embodiments, the GntC gene is about 50 nt to about 300 nt in length. In embodiments, the GntC gene is about 50 nt to about 250 nt in length. In embodiments, the GntC gene is about 50 nt to about 200 nt in length. In embodiments, the GntC gene is about 50 nt to about 150 nt in length. In embodiments, the GntC gene is about 50 nt to about 100 nt in length. In embodiments, the GntC gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, 1000 nt, 1050 nt, or 1100 nt in length. In embodiments, the GntC gene is about 1113 nt in length. In embodiments, the GntC gene is 1113 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:33. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:34. 
In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:35. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:36. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:37.
[0149] The term “GntD gene” or “GntD” as used herein refers to any of the recombinant or naturally-occurring forms of the GntD gene or variants or homologs thereof. In embodiments, the GntD gene codes for a GntD polypeptide capable of maintaining the activity of the GntD polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntD polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntD gene (e.g. SEQ ID NO:38-42).
[0150] In embodiments, the GntD gene is substantially identical to the nucleic acid sequence of SEQ ID NO:38 or a variant or homolog having substantial identity thereto. In embodiments, the GntD gene includes the nucleic acid sequence of SEQ ID NO:38. In embodiments, the GntD gene is the nucleic acid sequence of SEQ ID NO:38. In embodiments, the GntD gene is a portion of SEQ ID NO:38.
[0151] In embodiments, the GntD gene is substantially identical to the nucleic acid sequence of SEQ ID NO:39 or a variant or homolog having substantial identity thereto. In embodiments, the GntD gene includes the nucleic acid sequence of SEQ ID NO:39. In embodiments, the GntD gene is the nucleic acid sequence of SEQ ID NO:39. In embodiments, the GntD gene is a portion of SEQ ID NO:39.
[0152] In embodiments, the GntD gene is substantially identical to the nucleic acid sequence of SEQ ID NO:40 or a variant or homolog having substantial identity thereto. In embodiments, the GntD gene includes the nucleic acid sequence of SEQ ID NO:40. In embodiments, the GntD gene is the nucleic acid sequence of SEQ ID NO:40. In embodiments, the GntD gene is a portion of SEQ ID NO:40.
[0153] In embodiments, the GntD gene is substantially identical to the nucleic acid sequence of SEQ ID NO:41 or a variant or homolog having substantial identity thereto. In embodiments, the GntD gene includes the nucleic acid sequence of SEQ ID NO:41. In embodiments, the GntD gene is the nucleic acid sequence of SEQ ID NO:41. In embodiments, the GntD gene is a portion of SEQ ID NO:41.
[0154] In embodiments, the GntD gene is substantially identical to the nucleic acid sequence of SEQ ID NO:42 or a variant or homolog having substantial identity thereto. In embodiments, the GntD gene includes the nucleic acid sequence of SEQ ID NO:42. In embodiments, the GntD gene is the nucleic acid sequence of SEQ ID NO:42. In embodiments, the GntD gene is a portion of SEQ ID NO:42.
[0155] In embodiments, the GntD gene is about 50 nt to about 1050 nt in length. In embodiments, the GntD gene is about 100 nt to about 1050 nt in length. In embodiments, the GntD gene is about 150 nt to about 1050 nt in length. In embodiments, the GntD gene is about 200 nt to about 1050 nt in length. In embodiments, the GntD gene is about 250 nt to about 1050 nt in length. In embodiments, the GntD gene is about 300 nt to about 1050 nt in length. In embodiments, the GntD gene is about 350 nt to about 1050 nt in length. In embodiments, the GntD gene is about 400 nt to about 1050 nt in length. In embodiments, the GntD gene is about 450 nt to about 1050 nt in length. In embodiments, the GntD gene is about 500 nt to about 1050 nt in length. In embodiments, the GntD gene is about 550 nt to about 1050 nt in length. In embodiments, the GntD gene is about 600 nt to about 1050 nt in length. In embodiments, the GntD gene is about 650 nt to about 1050 nt in length. In embodiments, the GntD gene is about 700 nt to about 1050 nt in length. In embodiments, the GntD gene is about 750 nt to about 1050 nt in length. In embodiments, the GntD gene is about 800 nt to about 1050 nt in length. In embodiments, the GntD gene is about 850 nt to about 1050 nt in length. In embodiments, the GntD gene is about 900 nt to about 1050 nt in length. In embodiments, the GntD gene is about 950 nt to about 1050 nt in length. In embodiments, the GntD gene is about 1000 nt to about 1050 nt in length.
[0156] In embodiments, the GntD gene is about 50 nt to about 1000 nt in length. In embodiments, the GntD gene is about 50 nt to about 950 nt in length. In embodiments, the GntD gene is about 50 nt to about 900 nt in length. In embodiments, the GntD gene is about 50 nt to about 850 nt in length. In embodiments, the GntD gene is about 50 nt to about 800 nt in length. In embodiments, the GntD gene is about 50 nt to about 750 nt in length. In embodiments, the GntD gene is about 50 nt to about 700 nt in length. In embodiments, the GntD gene is about 50 nt to about 650 nt in length. In embodiments, the GntD gene is about 50 nt to about 600 nt in length. In embodiments, the GntD gene is about 50 nt to about 550 nt in length. In embodiments, the GntD gene is about 50 nt to about 500 nt in length. In embodiments, the GntD gene is about 50 nt to about 450 nt in length. In embodiments, the GntD gene is about 50 nt to about 400 nt in length. In embodiments, the GntD gene is about 50 nt to about 350 nt in length. In embodiments, the GntD gene is about 50 nt to about 300 nt in length. In embodiments, the GntD gene is about 50 nt to about 250 nt in length. In embodiments, the GntD gene is about 50 nt to about 200 nt in length. In embodiments, the GntD gene is about 50 nt to about 150 nt in length. In embodiments, the GntD gene is about 50 nt to about 100 nt in length. In embodiments, the GntD gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, 1000 nt, or 1050 nt in length. In embodiments, the GntD gene is about 1044 nt in length. In embodiments, the GntD gene is 1044 nt in length.
In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:38. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:39. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:40. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:41. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:42.
[0157] The term “GntE gene” or “GntE” as used herein refers to any of the recombinant or naturally-occurring forms of the GntE gene or variants or homologs thereof. In embodiments, the GntE gene codes for a GntE polypeptide capable of maintaining the activity of the GntE polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntE polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntE gene (e.g. SEQ ID NO:43-47).
[0158] In embodiments, the GntE gene is substantially identical to the nucleic acid sequence of SEQ ID NO:43 or a variant or homolog having substantial identity thereto. In embodiments, the GntE gene includes the nucleic acid sequence of SEQ ID NO:43. In embodiments, the GntE gene is the nucleic acid sequence of SEQ ID NO:43. In embodiments, the GntE gene is a portion of SEQ ID NO:43.
[0159] In embodiments, the GntE gene is substantially identical to the nucleic acid sequence of SEQ ID NO:44 or a variant or homolog having substantial identity thereto. In embodiments, the GntE gene includes the nucleic acid sequence of SEQ ID NO:44. In embodiments, the GntE gene is the nucleic acid sequence of SEQ ID NO:44. In embodiments, the GntE gene is a portion of SEQ ID NO:44.
[0160] In embodiments, the GntE gene is substantially identical to the nucleic acid sequence of SEQ ID NO:45 or a variant or homolog having substantial identity thereto. In embodiments, the GntE gene includes the nucleic acid sequence of SEQ ID NO:45. In embodiments, the GntE gene is the nucleic acid sequence of SEQ ID NO:45. In embodiments, the GntE gene is a portion of SEQ ID NO:45.
[0161] In embodiments, the GntE gene is substantially identical to the nucleic acid sequence of SEQ ID NO:46 or a variant or homolog having substantial identity thereto. In embodiments, the GntE gene includes the nucleic acid sequence of SEQ ID NO:46. In embodiments, the GntE gene is the nucleic acid sequence of SEQ ID NO:46. In embodiments, the GntE gene is a portion of SEQ ID NO:46.
[0162] In embodiments, the GntE gene is substantially identical to the nucleic acid sequence of SEQ ID NO:47 or a variant or homolog having substantial identity thereto. In embodiments, the GntE gene includes the nucleic acid sequence of SEQ ID NO:47. In embodiments, the GntE gene is the nucleic acid sequence of SEQ ID NO:47. In embodiments, the GntE gene is a portion of SEQ ID NO:47.
[0163] In embodiments, the GntE gene is about 50 nt to about 1300 nt in length. In embodiments, the GntE gene is about 100 nt to about 1300 nt in length. In embodiments, the GntE gene is about 150 nt to about 1300 nt in length. In embodiments, the GntE gene is about 200 nt to about 1300 nt in length. In embodiments, the GntE gene is about 250 nt to about 1300 nt in length. In embodiments, the GntE gene is about 300 nt to about 1300 nt in length. In embodiments, the GntE gene is about 350 nt to about 1300 nt in length. In embodiments, the GntE gene is about 400 nt to about 1300 nt in length. In embodiments, the GntE gene is about 450 nt to about 1300 nt in length. In embodiments, the GntE gene is about 500 nt to about 1300 nt in length. In embodiments, the GntE gene is about 550 nt to about 1300 nt in length. In embodiments, the GntE gene is about 600 nt to about 1300 nt in length. In embodiments, the GntE gene is about 650 nt to about 1300 nt in length. In embodiments, the GntE gene is about 700 nt to about 1300 nt in length. In embodiments, the GntE gene is about 750 nt to about 1300 nt in length. In embodiments, the GntE gene is about 800 nt to about 1300 nt in length. In embodiments, the GntE gene is about 850 nt to about 1300 nt in length. In embodiments, the GntE gene is about 900 nt to about 1300 nt in length. In embodiments, the GntE gene is about 950 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1000 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1050 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1100 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1150 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1200 nt to about 1300 nt in length. In embodiments, the GntE gene is about 1250 nt to about 1300 nt in length.
[0164] In embodiments, the GntE gene is about 50 nt to about 1250 nt in length. In embodiments, the GntE gene is about 50 nt to about 1200 nt in length. In embodiments, the GntE gene is about 50 nt to about 1150 nt in length. In embodiments, the GntE gene is about 50 nt to about 1100 nt in length. In embodiments, the GntE gene is about 50 nt to about 1050 nt in length. In embodiments, the GntE gene is about 50 nt to about 1000 nt in length. In embodiments, the GntE gene is about 50 nt to about 950 nt in length. In embodiments, the GntE gene is about 50 nt to about 900 nt in length. In embodiments, the GntE gene is about 50 nt to about 850 nt in length. In embodiments, the GntE gene is about 50 nt to about 800 nt in length. In embodiments, the GntE gene is about 50 nt to about 750 nt in length. In embodiments, the GntE gene is about 50 nt to about 700 nt in length. In embodiments, the GntE gene is about 50 nt to about 650 nt in length. In embodiments, the GntE gene is about 50 nt to about 600 nt in length. In embodiments, the GntE gene is about 50 nt to about 550 nt in length. In embodiments, the GntE gene is about 50 nt to about 500 nt in length. In embodiments, the GntE gene is about 50 nt to about 450 nt in length. In embodiments, the GntE gene is about 50 nt to about 400 nt in length. In embodiments, the GntE gene is about 50 nt to about 350 nt in length. In embodiments, the GntE gene is about 50 nt to about 300 nt in length. In embodiments, the GntE gene is about 50 nt to about 250 nt in length. In embodiments, the GntE gene is about 50 nt to about 200 nt in length. In embodiments, the GntE gene is about 50 nt to about 150 nt in length. In embodiments, the GntE gene is about 50 nt to about 100 nt in length. In embodiments, the GntE gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, 1000 nt, 1050 nt, 1100 nt, 1150 nt, 1200 nt, 1250 nt, or 1300 nt in length. In embodiments, the GntE gene is about 1311 nt in length. In embodiments, the GntE gene is 1311 nt in length. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:43. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:44. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:45. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:46. In embodiments, the sequence lengths described herein are within the nucleic acid sequence of SEQ ID NO:47.
[0165] The term “GntF gene” or “GntF” as used herein refers to any of the recombinant or naturally-occurring forms of the GntF gene or variants or homologs thereof. In embodiments, the GntF gene codes for a GntF polypeptide capable of maintaining the activity of the GntF polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntF polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntF gene (e.g. SEQ ID NO:48-52).
[0166] In embodiments, the GntF gene is substantially identical to the nucleic acid sequence of SEQ ID NO:48 or a variant or homolog having substantial identity thereto. In embodiments, the GntF gene includes the nucleic acid sequence of SEQ ID NO:48. In embodiments, the GntF gene is the nucleic acid sequence of SEQ ID NO:48. In embodiments, the GntF gene is a portion of SEQ ID NO:48.
[0167] In embodiments, the GntF gene is substantially identical to the nucleic acid sequence of SEQ ID NO:49 or a variant or homolog having substantial identity thereto. In embodiments, the GntF gene includes the nucleic acid sequence of SEQ ID NO:49. In embodiments, the GntF gene is the nucleic acid sequence of SEQ ID NO:49. In embodiments, the GntF gene is a portion of SEQ ID NO:49.
[0168] In embodiments, the GntF gene is substantially identical to the nucleic acid sequence of SEQ ID NO:50 or a variant or homolog having substantial identity thereto. In embodiments, the GntF gene includes the nucleic acid sequence of SEQ ID NO:50. In embodiments, the GntF gene is the nucleic acid sequence of SEQ ID NO:50. In embodiments, the GntF gene is a portion of SEQ ID NO:50.
[0169] In embodiments, the GntF gene is substantially identical to the nucleic acid sequence of SEQ ID NO:51 or a variant or homolog having substantial identity thereto. In embodiments, the GntF gene includes the nucleic acid sequence of SEQ ID NO:51. In embodiments, the GntF gene is the nucleic acid sequence of SEQ ID NO:51. In embodiments, the GntF gene is a portion of SEQ ID NO:51.
[0170] In embodiments, the GntF gene is substantially identical to the nucleic acid sequence of SEQ ID NO:52 or a variant or homolog having substantial identity thereto. In embodiments, the GntF gene includes the nucleic acid sequence of SEQ ID NO:52. In embodiments, the GntF gene is the nucleic acid sequence of SEQ ID NO:52. In embodiments, the GntF gene is a portion of SEQ ID NO:52.
[0171] In embodiments, the GntF gene is about 50 nt to about 800 nt in length. In embodiments, the GntF gene is about 100 nt to about 800 nt in length. In embodiments, the GntF gene is about 150 nt to about 800 nt in length. In embodiments, the GntF gene is about 200 nt to about 800 nt in length. In embodiments, the GntF gene is about 250 nt to about 800 nt in length. In embodiments, the GntF gene is about 330 nt to about 800 nt in length. In embodiments, the GntF gene is about 350 nt to about 800 nt in length. In embodiments, the GntF gene is about 400 nt to about 800 nt in length. In embodiments, the GntF gene is about 450 nt to about 800 nt in length. In embodiments, the GntF gene is about 500 nt to about 800 nt in length. In embodiments, the GntF gene is about 550 nt to about 800 nt in length. In embodiments, the GntF gene is about 600 nt to about 800 nt in length. In embodiments, the GntF gene is about 650 nt to about 800 nt in length. In embodiments, the GntF gene is about 700 nt to about 800 nt in length. In embodiments, the GntF gene is about 750 nt to about 800 nt in length.
[0172] In embodiments, the GntF gene is about 50 nt to about 750 nt in length. In embodiments, the GntF gene is about 50 nt to about 700 nt in length. In embodiments, the GntF gene is about 50 nt to about 650 nt in length. In embodiments, the GntF gene is about 50 nt to about 600 nt in length. In embodiments, the GntF gene is about 50 nt to about 550 nt in length. In embodiments, the GntF gene is about 50 nt to about 500 nt in length. In embodiments, the GntF gene is about 50 nt to about 450 nt in length. In embodiments, the GntF gene is about 50 nt to about 400 nt in length. In embodiments, the GntF gene is about 50 nt to about 350 nt in length. In embodiments, the GntF gene is about 50 nt to about 330 nt in length. In embodiments, the GntF gene is about 50 nt to about 250 nt in length. In embodiments, the GntF gene is about 50 nt to about 200 nt in length. In embodiments, the GntF gene is about 50 nt to about 150 nt in length. In embodiments, the GntF gene is about 50 nt to about 100 nt in length. In embodiments, the GntF gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 330 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, or 800 nt in length. In embodiments, the GntF gene is about 825 nt in length. In embodiments, the GntF gene is 825 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:48. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:49. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:50. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:51. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:52.
[0173] The term “GntG gene” or “GntG” as used herein refers to any of the recombinant or naturally-occurring forms of the GntG gene or variants or homologs thereof. In embodiments, the GntG gene codes for a GntG polypeptide capable of maintaining the activity of the GntG polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntG polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntG gene (e.g., SEQ ID NO:53-57).
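By way of non-limiting illustration, the percent-identity comparison across a whole sequence or across a continuous portion (e.g., 50, 100, 150 or 200 nt) may be sketched as follows. The sketch assumes the two sequences have already been aligned without gaps and are of equal length; the function names `percent_identity` and `best_window_identity` and the short example sequences are hypothetical and are not taken from the sequence listing. A gapped alignment tool would be needed for sequences that differ in length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest percent identity over any continuous window of `window` nt.

    Assumption: query and reference are pre-aligned, so the window is taken
    at the same offset in both sequences.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not 0 < window <= len(query):
        raise ValueError("window must be between 1 and the sequence length")
    return max(
        percent_identity(query[i:i + window], reference[i:i + window])
        for i in range(len(query) - window + 1)
    )


# Hypothetical 10-nt example with a single mismatch at the ninth position.
query = "ACGTACGTAC"
reference = "ACGTACGTTC"
whole = percent_identity(query, reference)            # identity across the whole sequence
portion = best_window_identity(query, reference, 5)   # best 5-nt continuous portion
```

In this sketch a variant would satisfy, for example, the "at least 90%" threshold across the whole sequence when `whole >= 90.0`, or across a portion when `portion >= 90.0`.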
[0174] In embodiments, the GntG gene is substantially identical to the nucleic acid sequence of SEQ ID NO:53 or a variant or homolog having substantial identity thereto. In embodiments, the GntG gene includes the nucleic acid sequence of SEQ ID NO:53. In embodiments, the GntG gene is the nucleic acid sequence of SEQ ID NO:53. In embodiments, the GntG gene is a portion of SEQ ID NO:53.
[0175] In embodiments, the GntG gene is substantially identical to the nucleic acid sequence of SEQ ID NO:54 or a variant or homolog having substantial identity thereto. In embodiments, the GntG gene includes the nucleic acid sequence of SEQ ID NO:54. In embodiments, the GntG gene is the nucleic acid sequence of SEQ ID NO:54. In embodiments, the GntG gene is a portion of SEQ ID NO:54.
[0176] In embodiments, the GntG gene is substantially identical to the nucleic acid sequence of SEQ ID NO:55 or a variant or homolog having substantial identity thereto. In embodiments, the GntG gene includes the nucleic acid sequence of SEQ ID NO:55. In embodiments, the GntG gene is the nucleic acid sequence of SEQ ID NO:55. In embodiments, the GntG gene is a portion of SEQ ID NO:55.
[0177] In embodiments, the GntG gene is substantially identical to the nucleic acid sequence of SEQ ID NO:56 or a variant or homolog having substantial identity thereto. In embodiments, the GntG gene includes the nucleic acid sequence of SEQ ID NO:56. In embodiments, the GntG gene is the nucleic acid sequence of SEQ ID NO:56. In embodiments, the GntG gene is a portion of SEQ ID NO:56.
[0178] In embodiments, the GntG gene is substantially identical to the nucleic acid sequence of SEQ ID NO:57 or a variant or homolog having substantial identity thereto. In embodiments, the GntG gene includes the nucleic acid sequence of SEQ ID NO:57. In embodiments, the GntG gene is the nucleic acid sequence of SEQ ID NO:57. In embodiments, the GntG gene is a portion of SEQ ID NO:57.
[0179] In embodiments, the GntG gene is about 50 nt to about 1000 nt in length. In embodiments, the GntG gene is about 100 nt to about 1000 nt in length. In embodiments, the GntG gene is about 150 nt to about 1000 nt in length. In embodiments, the GntG gene is about 200 nt to about 1000 nt in length. In embodiments, the GntG gene is about 250 nt to about 1000 nt in length. In embodiments, the GntG gene is about 300 nt to about 1000 nt in length. In embodiments, the GntG gene is about 350 nt to about 1000 nt in length. In embodiments, the GntG gene is about 400 nt to about 1000 nt in length. In embodiments, the GntG gene is about 450 nt to about 1000 nt in length. In embodiments, the GntG gene is about 500 nt to about 1000 nt in length. In embodiments, the GntG gene is about 550 nt to about 1000 nt in length. In embodiments, the GntG gene is about 600 nt to about 1000 nt in length. In embodiments, the GntG gene is about 650 nt to about 1000 nt in length. In embodiments, the GntG gene is about 700 nt to about 1000 nt in length. In embodiments, the GntG gene is about 750 nt to about 1000 nt in length. In embodiments, the GntG gene is about 800 nt to about 1000 nt in length. In embodiments, the GntG gene is about 850 nt to about 1000 nt in length. In embodiments, the GntG gene is about 900 nt to about 1000 nt in length. In embodiments, the GntG gene is about 950 nt to about 1000 nt in length.
[0180] In embodiments, the GntG gene is about 50 nt to about 950 nt in length. In embodiments, the GntG gene is about 50 nt to about 900 nt in length. In embodiments, the GntG gene is about 50 nt to about 850 nt in length. In embodiments, the GntG gene is about 50 nt to about 800 nt in length. In embodiments, the GntG gene is about 50 nt to about 750 nt in length. In embodiments, the GntG gene is about 50 nt to about 700 nt in length. In embodiments, the GntG gene is about 50 nt to about 650 nt in length. In embodiments, the GntG gene is about 50 nt to about 600 nt in length. In embodiments, the GntG gene is about 50 nt to about 550 nt in length. In embodiments, the GntG gene is about 50 nt to about 500 nt in length. In embodiments, the GntG gene is about 50 nt to about 450 nt in length. In embodiments, the GntG gene is about 50 nt to about 400 nt in length. In embodiments, the GntG gene is about 50 nt to about 350 nt in length. In embodiments, the GntG gene is about 50 nt to about 300 nt in length. In embodiments, the GntG gene is about 50 nt to about 250 nt in length. In embodiments, the GntG gene is about 50 nt to about 200 nt in length. In embodiments, the GntG gene is about 50 nt to about 150 nt in length. In embodiments, the GntG gene is about 50 nt to about 100 nt in length. In embodiments, the GntG gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, or 1000 nt in length. In embodiments, the GntG gene is about 1035 nt in length. In embodiments, the GntG gene is 1035 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:53. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:54. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:55. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:56. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:57.
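By way of non-limiting illustration, the enumerated GntG length ranges follow a regular pattern: a fixed upper bound of 1000 nt with the lower bound stepped in 50 nt increments, then a fixed lower bound of 50 nt with the upper bound stepped down in 50 nt increments. The following sketch generates those ranges and tests whether a fragment length falls inside one of them. The function names and the `tolerance` parameter are hypothetical; in particular, the numeric meaning of "about" is not defined in this passage, so the tolerance defaults to zero and any other value is an assumption.

```python
def enumerated_ranges(upper_nt: int, step: int = 50) -> list[tuple[int, int]]:
    """Return (lower, upper) length ranges in nt following the enumerated pattern."""
    # Fixed upper bound, lower bound stepped upward: (50, U), (100, U), ...
    fixed_upper = [(lo, upper_nt) for lo in range(step, upper_nt, step)]
    # Fixed lower bound, upper bound stepped downward: (50, U - 50), ..., (50, 100)
    fixed_lower = [(step, hi) for hi in range(upper_nt - step, step, -step)]
    return fixed_upper + fixed_lower


def fragment_fits(length_nt: int, lower: int, upper: int, tolerance: float = 0.0) -> bool:
    """True if length_nt lies in [lower, upper], widened by a fractional tolerance.

    Assumption: `tolerance` stands in for "about"; it is not a value from the source.
    """
    return lower * (1 - tolerance) <= length_nt <= upper * (1 + tolerance)


gntg_ranges = enumerated_ranges(1000)          # ranges recited for GntG
fits = fragment_fits(500, 50, 1000)            # a 500 nt fragment
full_length = fragment_fits(1035, 50, 1000)    # the full 1035 nt gene, no tolerance
```

The same helper could be called with 1200 for the GntH ranges or 900 for the GntI ranges. Genes whose recited ranges use a 330 nt breakpoint do not follow the uniform 50 nt step and would need their breakpoints listed explicitly.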
[0181] The term “GntH gene” or “GntH” as used herein refers to any of the recombinant or naturally-occurring forms of the GntH gene or variants or homologs thereof. In embodiments, the GntH gene codes for a GntH polypeptide capable of maintaining the activity of the GntH polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntH polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntH gene (e.g., SEQ ID NO:58-62).
[0182] In embodiments, the GntH gene is substantially identical to the nucleic acid sequence of SEQ ID NO:58 or a variant or homolog having substantial identity thereto. In embodiments, the GntH gene includes the nucleic acid sequence of SEQ ID NO:58. In embodiments, the GntH gene is the nucleic acid sequence of SEQ ID NO:58. In embodiments, the GntH gene is a portion of SEQ ID NO:58.
[0183] In embodiments, the GntH gene is substantially identical to the nucleic acid sequence of SEQ ID NO:59 or a variant or homolog having substantial identity thereto. In embodiments, the GntH gene includes the nucleic acid sequence of SEQ ID NO:59. In embodiments, the GntH gene is the nucleic acid sequence of SEQ ID NO:59. In embodiments, the GntH gene is a portion of SEQ ID NO:59.
[0184] In embodiments, the GntH gene is substantially identical to the nucleic acid sequence of SEQ ID NO:60 or a variant or homolog having substantial identity thereto. In embodiments, the GntH gene includes the nucleic acid sequence of SEQ ID NO:60. In embodiments, the GntH gene is the nucleic acid sequence of SEQ ID NO:60. In embodiments, the GntH gene is a portion of SEQ ID NO:60.
[0185] In embodiments, the GntH gene is substantially identical to the nucleic acid sequence of SEQ ID NO:61 or a variant or homolog having substantial identity thereto. In embodiments, the GntH gene includes the nucleic acid sequence of SEQ ID NO:61. In embodiments, the GntH gene is the nucleic acid sequence of SEQ ID NO:61. In embodiments, the GntH gene is a portion of SEQ ID NO:61.
[0186] In embodiments, the GntH gene is substantially identical to the nucleic acid sequence of SEQ ID NO:62 or a variant or homolog having substantial identity thereto. In embodiments, the GntH gene includes the nucleic acid sequence of SEQ ID NO:62. In embodiments, the GntH gene is the nucleic acid sequence of SEQ ID NO:62. In embodiments, the GntH gene is a portion of SEQ ID NO:62.
[0187] In embodiments, the GntH gene is about 50 nt to about 1200 nt in length. In embodiments, the GntH gene is about 100 nt to about 1200 nt in length. In embodiments, the GntH gene is about 150 nt to about 1200 nt in length. In embodiments, the GntH gene is about 200 nt to about 1200 nt in length. In embodiments, the GntH gene is about 250 nt to about 1200 nt in length. In embodiments, the GntH gene is about 300 nt to about 1200 nt in length. In embodiments, the GntH gene is about 350 nt to about 1200 nt in length. In embodiments, the GntH gene is about 400 nt to about 1200 nt in length. In embodiments, the GntH gene is about 450 nt to about 1200 nt in length. In embodiments, the GntH gene is about 500 nt to about 1200 nt in length. In embodiments, the GntH gene is about 550 nt to about 1200 nt in length. In embodiments, the GntH gene is about 600 nt to about 1200 nt in length. In embodiments, the GntH gene is about 650 nt to about 1200 nt in length. In embodiments, the GntH gene is about 700 nt to about 1200 nt in length. In embodiments, the GntH gene is about 750 nt to about 1200 nt in length. In embodiments, the GntH gene is about 800 nt to about 1200 nt in length. In embodiments, the GntH gene is about 850 nt to about 1200 nt in length. In embodiments, the GntH gene is about 900 nt to about 1200 nt in length. In embodiments, the GntH gene is about 950 nt to about 1200 nt in length. In embodiments, the GntH gene is about 1000 nt to about 1200 nt in length. In embodiments, the GntH gene is about 1050 nt to about 1200 nt in length. In embodiments, the GntH gene is about 1100 nt to about 1200 nt in length. In embodiments, the GntH gene is about 1150 nt to about 1200 nt in length.
[0188] In embodiments, the GntH gene is about 50 nt to about 1150 nt in length. In embodiments, the GntH gene is about 50 nt to about 1100 nt in length. In embodiments, the GntH gene is about 50 nt to about 1050 nt in length. In embodiments, the GntH gene is about 50 nt to about 1000 nt in length. In embodiments, the GntH gene is about 50 nt to about 950 nt in length. In embodiments, the GntH gene is about 50 nt to about 900 nt in length. In embodiments, the GntH gene is about 50 nt to about 850 nt in length. In embodiments, the GntH gene is about 50 nt to about 800 nt in length. In embodiments, the GntH gene is about 50 nt to about 750 nt in length. In embodiments, the GntH gene is about 50 nt to about 700 nt in length. In embodiments, the GntH gene is about 50 nt to about 650 nt in length. In embodiments, the GntH gene is about 50 nt to about 600 nt in length. In embodiments, the GntH gene is about 50 nt to about 550 nt in length. In embodiments, the GntH gene is about 50 nt to about 500 nt in length. In embodiments, the GntH gene is about 50 nt to about 450 nt in length. In embodiments, the GntH gene is about 50 nt to about 400 nt in length. In embodiments, the GntH gene is about 50 nt to about 350 nt in length. In embodiments, the GntH gene is about 50 nt to about 300 nt in length. In embodiments, the GntH gene is about 50 nt to about 250 nt in length. In embodiments, the GntH gene is about 50 nt to about 200 nt in length. In embodiments, the GntH gene is about 50 nt to about 150 nt in length. In embodiments, the GntH gene is about 50 nt to about 100 nt in length. In embodiments, the GntH gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, 1000 nt, 1050 nt, 1100 nt, 1150 nt, or 1200 nt in length. In embodiments, the GntH gene is about 1242 nt in length. In embodiments, the GntH gene is 1242 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:58. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:59. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:60. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:61. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:62.
[0189] The term “GntI gene” or “GntI” as used herein refers to any of the recombinant or naturally-occurring forms of the GntI gene or variants or homologs thereof. In embodiments, the GntI gene codes for a GntI polypeptide capable of maintaining the activity of the GntI polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntI polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntI gene (e.g., SEQ ID NO:63-67).
[0190] In embodiments, the GntI gene is substantially identical to the nucleic acid sequence of SEQ ID NO:63 or a variant or homolog having substantial identity thereto. In embodiments, the GntI gene includes the nucleic acid sequence of SEQ ID NO:63. In embodiments, the GntI gene is the nucleic acid sequence of SEQ ID NO:63. In embodiments, the GntI gene is a portion of SEQ ID NO:63.
[0191] In embodiments, the GntI gene is substantially identical to the nucleic acid sequence of SEQ ID NO:64 or a variant or homolog having substantial identity thereto. In embodiments, the GntI gene includes the nucleic acid sequence of SEQ ID NO:64. In embodiments, the GntI gene is the nucleic acid sequence of SEQ ID NO:64. In embodiments, the GntI gene is a portion of SEQ ID NO:64.
[0192] In embodiments, the GntI gene is substantially identical to the nucleic acid sequence of SEQ ID NO:65 or a variant or homolog having substantial identity thereto. In embodiments, the GntI gene includes the nucleic acid sequence of SEQ ID NO:65. In embodiments, the GntI gene is the nucleic acid sequence of SEQ ID NO:65. In embodiments, the GntI gene is a portion of SEQ ID NO:65.
[0193] In embodiments, the GntI gene is substantially identical to the nucleic acid sequence of SEQ ID NO:66 or a variant or homolog having substantial identity thereto. In embodiments, the GntI gene includes the nucleic acid sequence of SEQ ID NO:66. In embodiments, the GntI gene is the nucleic acid sequence of SEQ ID NO:66. In embodiments, the GntI gene is a portion of SEQ ID NO:66.
[0194] In embodiments, the GntI gene is substantially identical to the nucleic acid sequence of SEQ ID NO:67 or a variant or homolog having substantial identity thereto. In embodiments, the GntI gene includes the nucleic acid sequence of SEQ ID NO:67. In embodiments, the GntI gene is the nucleic acid sequence of SEQ ID NO:67. In embodiments, the GntI gene is a portion of SEQ ID NO:67.
[0195] In embodiments, the GntI gene is about 50 nt to about 900 nt in length. In embodiments, the GntI gene is about 100 nt to about 900 nt in length. In embodiments, the GntI gene is about 150 nt to about 900 nt in length. In embodiments, the GntI gene is about 200 nt to about 900 nt in length. In embodiments, the GntI gene is about 250 nt to about 900 nt in length. In embodiments, the GntI gene is about 300 nt to about 900 nt in length. In embodiments, the GntI gene is about 350 nt to about 900 nt in length. In embodiments, the GntI gene is about 400 nt to about 900 nt in length. In embodiments, the GntI gene is about 450 nt to about 900 nt in length. In embodiments, the GntI gene is about 500 nt to about 900 nt in length. In embodiments, the GntI gene is about 550 nt to about 900 nt in length. In embodiments, the GntI gene is about 600 nt to about 900 nt in length. In embodiments, the GntI gene is about 650 nt to about 900 nt in length. In embodiments, the GntI gene is about 700 nt to about 900 nt in length. In embodiments, the GntI gene is about 750 nt to about 900 nt in length. In embodiments, the GntI gene is about 800 nt to about 900 nt in length. In embodiments, the GntI gene is about 850 nt to about 900 nt in length.
[0196] In embodiments, the GntI gene is about 50 nt to about 850 nt in length. In embodiments, the GntI gene is about 50 nt to about 800 nt in length. In embodiments, the GntI gene is about 50 nt to about 750 nt in length. In embodiments, the GntI gene is about 50 nt to about 700 nt in length. In embodiments, the GntI gene is about 50 nt to about 650 nt in length. In embodiments, the GntI gene is about 50 nt to about 600 nt in length. In embodiments, the GntI gene is about 50 nt to about 550 nt in length. In embodiments, the GntI gene is about 50 nt to about 500 nt in length. In embodiments, the GntI gene is about 50 nt to about 450 nt in length. In embodiments, the GntI gene is about 50 nt to about 400 nt in length. In embodiments, the GntI gene is about 50 nt to about 350 nt in length. In embodiments, the GntI gene is about 50 nt to about 300 nt in length. In embodiments, the GntI gene is about 50 nt to about 250 nt in length. In embodiments, the GntI gene is about 50 nt to about 200 nt in length. In embodiments, the GntI gene is about 50 nt to about 150 nt in length. In embodiments, the GntI gene is about 50 nt to about 100 nt in length. In embodiments, the GntI gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, or 900 nt in length. In embodiments, the GntI gene is about 918 nt in length. In embodiments, the GntI gene is 918 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:63. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:64. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:65. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:66. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:67.
[0197] The term “GntJ gene” or “GntJ” as used herein refers to any of the recombinant or naturally-occurring forms of the GntJ gene or variants or homologs thereof. In embodiments, the GntJ gene codes for a GntJ polypeptide capable of maintaining the activity of the GntJ polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntJ polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntJ gene (e.g., SEQ ID NO:68-72).
[0198] In embodiments, the GntJ gene is substantially identical to the nucleic acid sequence of SEQ ID NO:68 or a variant or homolog having substantial identity thereto. In embodiments, the GntJ gene includes the nucleic acid sequence of SEQ ID NO:68. In embodiments, the GntJ gene is the nucleic acid sequence of SEQ ID NO:68. In embodiments, the GntJ gene is a portion of SEQ ID NO:68.
[0199] In embodiments, the GntJ gene is substantially identical to the nucleic acid sequence of SEQ ID NO:69 or a variant or homolog having substantial identity thereto. In embodiments, the GntJ gene includes the nucleic acid sequence of SEQ ID NO:69. In embodiments, the GntJ gene is the nucleic acid sequence of SEQ ID NO:69. In embodiments, the GntJ gene is a portion of SEQ ID NO:69.
[0200] In embodiments, the GntJ gene is substantially identical to the nucleic acid sequence of SEQ ID NO:70 or a variant or homolog having substantial identity thereto. In embodiments, the GntJ gene includes the nucleic acid sequence of SEQ ID NO:70. In embodiments, the GntJ gene is the nucleic acid sequence of SEQ ID NO:70. In embodiments, the GntJ gene is a portion of SEQ ID NO:70.
[0201] In embodiments, the GntJ gene is substantially identical to the nucleic acid sequence of SEQ ID NO:71 or a variant or homolog having substantial identity thereto. In embodiments, the GntJ gene includes the nucleic acid sequence of SEQ ID NO:71. In embodiments, the GntJ gene is the nucleic acid sequence of SEQ ID NO:71. In embodiments, the GntJ gene is a portion of SEQ ID NO:71.
[0202] In embodiments, the GntJ gene is substantially identical to the nucleic acid sequence of SEQ ID NO:72 or a variant or homolog having substantial identity thereto. In embodiments, the GntJ gene includes the nucleic acid sequence of SEQ ID NO:72. In embodiments, the GntJ gene is the nucleic acid sequence of SEQ ID NO:72. In embodiments, the GntJ gene is a portion of SEQ ID NO:72.
[0203] In embodiments, the GntJ gene is about 50 nt to about 750 nt in length. In embodiments, the GntJ gene is about 100 nt to about 750 nt in length. In embodiments, the GntJ gene is about 150 nt to about 750 nt in length. In embodiments, the GntJ gene is about 200 nt to about 750 nt in length. In embodiments, the GntJ gene is about 250 nt to about 750 nt in length. In embodiments, the GntJ gene is about 330 nt to about 750 nt in length. In embodiments, the GntJ gene is about 350 nt to about 750 nt in length. In embodiments, the GntJ gene is about 400 nt to about 750 nt in length. In embodiments, the GntJ gene is about 450 nt to about 750 nt in length. In embodiments, the GntJ gene is about 500 nt to about 750 nt in length. In embodiments, the GntJ gene is about 550 nt to about 750 nt in length. In embodiments, the GntJ gene is about 600 nt to about 750 nt in length. In embodiments, the GntJ gene is about 650 nt to about 750 nt in length. In embodiments, the GntJ gene is about 700 nt to about 750 nt in length.
[0204] In embodiments, the GntJ gene is about 50 nt to about 700 nt in length. In embodiments, the GntJ gene is about 50 nt to about 650 nt in length. In embodiments, the GntJ gene is about 50 nt to about 600 nt in length. In embodiments, the GntJ gene is about 50 nt to about 550 nt in length. In embodiments, the GntJ gene is about 50 nt to about 500 nt in length. In embodiments, the GntJ gene is about 50 nt to about 450 nt in length. In embodiments, the GntJ gene is about 50 nt to about 400 nt in length. In embodiments, the GntJ gene is about 50 nt to about 350 nt in length. In embodiments, the GntJ gene is about 50 nt to about 330 nt in length. In embodiments, the GntJ gene is about 50 nt to about 250 nt in length. In embodiments, the GntJ gene is about 50 nt to about 200 nt in length. In embodiments, the GntJ gene is about 50 nt to about 150 nt in length. In embodiments, the GntJ gene is about 50 nt to about 100 nt in length. In embodiments, the GntJ gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 330 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, or 750 nt in length. In embodiments, the GntJ gene is about 750 nt in length. In embodiments, the GntJ gene is 750 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO: 68. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:69. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:70. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:71. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:72.
[0205] The term “GntT gene” or “GntT” as used herein refers to any of the recombinant or naturally-occurring forms of the GntT gene or variants or homologs thereof. In embodiments, the GntT gene codes for a GntT polypeptide capable of maintaining the activity of the GntT polypeptide (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to GntT polypeptide). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleic acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous nucleic acid portion) compared to a naturally occurring GntT gene (e.g., SEQ ID NO:73-75).
[0206] In embodiments, the GntT gene is substantially identical to the nucleic acid sequence of SEQ ID NO:73 or a variant or homolog having substantial identity thereto. In embodiments, the GntT gene includes the nucleic acid sequence of SEQ ID NO:73. In embodiments, the GntT gene is the nucleic acid sequence of SEQ ID NO:73. In embodiments, the GntT gene is a portion of SEQ ID NO:73.
[0207] In embodiments, the GntT gene is substantially identical to the nucleic acid sequence of SEQ ID NO:74 or a variant or homolog having substantial identity thereto. In embodiments, the GntT gene includes the nucleic acid sequence of SEQ ID NO:74. In embodiments, the GntT gene is the nucleic acid sequence of SEQ ID NO:74. In embodiments, the GntT gene is a portion of SEQ ID NO:74.
[0208] In embodiments, the GntT gene is substantially identical to the nucleic acid sequence of SEQ ID NO:75 or a variant or homolog having substantial identity thereto. In embodiments, the GntT gene includes the nucleic acid sequence of SEQ ID NO:75. In embodiments, the GntT gene is the nucleic acid sequence of SEQ ID NO:75. In embodiments, the GntT gene is a portion of SEQ ID NO:75.
[0209] In embodiments, the GntT gene is about 50 nt to about 450 nt in length. In embodiments, the GntT gene is about 100 nt to about 450 nt in length. In embodiments, the GntT gene is about 150 nt to about 450 nt in length. In embodiments, the GntT gene is about 200 nt to about 450 nt in length. In embodiments, the GntT gene is about 250 nt to about 450 nt in length. In embodiments, the GntT gene is about 300 nt to about 450 nt in length. In embodiments, the GntT gene is about 350 nt to about 450 nt in length. In embodiments, the GntT gene is about 400 nt to about 450 nt in length.
[0210] In embodiments, the GntT gene is about 50 nt to about 400 nt in length. In embodiments, the GntT gene is about 50 nt to about 350 nt in length. In embodiments, the GntT gene is about 50 nt to about 300 nt in length. In embodiments, the GntT gene is about 50 nt to about 250 nt in length. In embodiments, the GntT gene is about 50 nt to about 200 nt in length. In embodiments, the GntT gene is about 50 nt to about 150 nt in length. In embodiments, the GntT gene is about 50 nt to about 100 nt in length. In embodiments, the GntT gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, or 450 nt in length. In embodiments, the GntT gene is about 473 nt in length. In embodiments, the GntT gene is 473 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:73. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:74. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:75.
[0211] In embodiments, the GntT gene is about 50 nt to about 1300 nt in length. In embodiments, the GntT gene is about 100 nt to about 1300 nt in length. In embodiments, the GntT gene is about 150 nt to about 1300 nt in length. In embodiments, the GntT gene is about 200 nt to about 1300 nt in length. In embodiments, the GntT gene is about 250 nt to about 1300 nt in length. In embodiments, the GntT gene is about 300 nt to about 1300 nt in length. In embodiments, the GntT gene is about 350 nt to about 1300 nt in length. In embodiments, the GntT gene is about 400 nt to about 1300 nt in length. In embodiments, the GntT gene is about 450 nt to about 1300 nt in length. In embodiments, the GntT gene is about 500 nt to about 1300 nt in length. In embodiments, the GntT gene is about 550 nt to about 1300 nt in length. In embodiments, the GntT gene is about 600 nt to about 1300 nt in length. In embodiments, the GntT gene is about 650 nt to about 1300 nt in length. In embodiments, the GntT gene is about 700 nt to about 1300 nt in length. In embodiments, the GntT gene is about 750 nt to about 1300 nt in length. In embodiments, the GntT gene is about 800 nt to about 1300 nt in length. In embodiments, the GntT gene is about 850 nt to about 1300 nt in length. In embodiments, the GntT gene is about 900 nt to about 1300 nt in length. In embodiments, the GntT gene is about 950 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1000 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1050 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1100 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1150 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1200 nt to about 1300 nt in length. In embodiments, the GntT gene is about 1250 nt to about 1300 nt in length.
[0212] In embodiments, the GntT gene is about 50 nt to about 1250 nt in length. In embodiments, the GntT gene is about 50 nt to about 1200 nt in length. In embodiments, the GntT gene is about 50 nt to about 1150 nt in length. In embodiments, the GntT gene is about 50 nt to about 1100 nt in length. In embodiments, the GntT gene is about 50 nt to about 1050 nt in length. In embodiments, the GntT gene is about 50 nt to about 1000 nt in length. In embodiments, the GntT gene is about 50 nt to about 950 nt in length. In embodiments, the GntT gene is about 50 nt to about 900 nt in length. In embodiments, the GntT gene is about 50 nt to about 850 nt in length. In embodiments, the GntT gene is about 50 nt to about 800 nt in length. In embodiments, the GntT gene is about 50 nt to about 750 nt in length. In embodiments, the GntT gene is about 50 nt to about 700 nt in length. In embodiments, the GntT gene is about 50 nt to about 650 nt in length. In embodiments, the GntT gene is about 50 nt to about 600 nt in length. In embodiments, the GntT gene is about 50 nt to about 550 nt in length. In embodiments, the GntT gene is about 50 nt to about 500 nt in length. In embodiments, the GntT gene is about 50 nt to about 450 nt in length. In embodiments, the GntT gene is about 50 nt to about 400 nt in length. In embodiments, the GntT gene is about 50 nt to about 350 nt in length. In embodiments, the GntT gene is about 50 nt to about 300 nt in length. In embodiments, the GntT gene is about 50 nt to about 250 nt in length. In embodiments, the GntT gene is about 50 nt to about 200 nt in length. In embodiments, the GntT gene is about 50 nt to about 150 nt in length. In embodiments, the GntT gene is about 50 nt to about 100 nt in length. In embodiments, the GntT gene is about 50 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, 500 nt, 550 nt, 600 nt, 650 nt, 700 nt, 750 nt, 800 nt, 850 nt, 900 nt, 950 nt, 1000 nt, 1050 nt, 1100 nt, 1150 nt, 1200 nt, 1250 nt, or 1300 nt in length. In embodiments, the GntT gene is about 1359 nt in length. In embodiments, the GntT gene is 1359 nt in length. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:73. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:74. In embodiments, the sequence lengths described herein are within nucleic acid sequence of SEQ ID NO:75.
[0213] In embodiments, the one or more guanitoxin biosynthetic genes include any combination of genes selected from GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntA, GntJ, GntC, or a combination thereof. In embodiments, the one or more guanitoxin biosynthetic genes includes GntA. In embodiments, the one or more guanitoxin biosynthetic genes includes GntJ. In embodiments, the one or more guanitoxin biosynthetic genes includes GntC. In embodiments, the one or more guanitoxin biosynthetic genes is GntA. In embodiments, the one or more guanitoxin biosynthetic genes is GntJ. In embodiments, the one or more guanitoxin biosynthetic genes is GntC. In embodiments, the one or more guanitoxin biosynthetic genes include GntA and at least one gene selected from GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntJ and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntC and at least one gene selected from GntA, GntB, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntB and at least one gene selected from GntA, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntD and at least one gene selected from GntA, GntB, GntC, GntE, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntE and at least one gene selected from GntA, GntB, GntC, GntD, GntF, GntG, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntF and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntG, GntH, GntI, GntJ, and GntT. 
In embodiments, the one or more guanitoxin biosynthetic genes include GntG and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntF, GntH, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntH and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntI, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntI and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntJ, and GntT. In embodiments, the one or more guanitoxin biosynthetic genes include GntT and at least one gene selected from GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, and GntJ.
[0214] In embodiments, the one or more guanitoxin biosynthetic genes is one guanitoxin biosynthetic gene. In embodiments, the one guanitoxin biosynthetic gene is GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, or GntT. In embodiments, the guanitoxin biosynthetic gene is GntA and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntB and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntC and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntD and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntE and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntF and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntG and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntH and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntI and no other guanitoxin biosynthetic gene. In embodiments, the guanitoxin biosynthetic gene is GntT and no other guanitoxin biosynthetic gene.
[0215] In embodiments, the method includes concentrating bacterial cells or particulate matter in the aqueous liquid. One of skill in the art would recognize that a number of techniques may be used for concentrating cells or particulate matter, for example use of filters with pores quantitatively designed to capture the particulate matter and/or bacterial cells, including STERIVEX™ filters, glass fiber filters, etc. In embodiments, the method includes concentrating bacterial cells using a filter, wherein the filter has a pore size of less than about 1 µm. In embodiments, the method further includes extracting the guanitoxin biosynthetic gene from the concentrated cells or particulate matter. In embodiments, the method further includes purifying the guanitoxin biosynthetic gene from the concentrated cells or particulate matter. In embodiments, the guanitoxin biosynthetic gene is DNA or RNA. In embodiments, the guanitoxin biosynthetic gene is DNA. In embodiments, the guanitoxin biosynthetic gene is RNA. One of skill in the art would recognize that a variety of methods can be used for extracting and/or purifying the guanitoxin biosynthetic gene, including use of a number of commercially available kits (e.g. DNeasy® PowerWater® Kit, etc.). Methods for concentrating cells or particulate matter, and extracting and/or purifying nucleic acids from aqueous liquid are described in more detail in Kurmayer, R. et al. (2017). Molecular Tools for the Detection and Quantification of Toxigenic Cyanobacteria. John Wiley and Sons Ltd.
[0216] In embodiments, the method includes contacting the aqueous liquid with one or more nucleic acids, wherein each of the one or more nucleic acids is at least partially complementary to a portion of the one or more guanitoxin biosynthetic genes. In embodiments, the one or more nucleic acids is a primer. In embodiments, the one or more nucleic acids is a probe. A “primer” refers to a short, single-stranded DNA sequence used in polymerase chain reaction (PCR) methods and isothermal amplification methods. In embodiments, a pair of primers (e.g. a first nucleic acid and a second nucleic acid) is used to hybridize with DNA (e.g. guanitoxin biosynthetic gene) in a sample (e.g. aqueous liquid) and define the region of the DNA that will be amplified. A “probe” is a single-stranded DNA sequence used to detect DNA or RNA (e.g. guanitoxin biosynthetic gene) in a sample (e.g. aqueous liquid). In embodiments, a probe hybridizes with the DNA or RNA, and hybridization is detected, thereby allowing detection of the DNA or RNA. In embodiments, the probe is coupled to a detectable label (e.g. a fluorescent or radioactive compound). In embodiments, the method further includes contacting the aqueous liquid with an enzyme. In embodiments, the enzyme is a polymerase. In embodiments, the polymerase is Taq polymerase.
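By way of illustration only, the way a primer pair defines the region to be amplified can be sketched in a few lines of Python. The sketch assumes exact matching, which is stricter than the "at least partially complementary" language above, and it uses made-up toy sequences rather than any SEQ ID NO of this disclosure. The helper names `reverse_complement` and `find_amplicon` are hypothetical.

```python
# Illustrative sketch only: toy sequences, exact matching, hypothetical helper names.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return the template region bounded by a primer pair, or None if a primer has no exact site.

    The forward primer matches the given (top) strand directly. The reverse
    primer anneals to the top strand, so its reverse complement is what
    appears in the top strand downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template assembled from flank + forward site + insert + reverse site + flank.
template = "TTGACC" + "ATGGCTAGC" + "TAGGATCCGATTACAG" + "GCATGCAAG" + "CTTGG"
forward = "ATGGCTAGC"
reverse = reverse_complement("GCATGCAAG")  # primer as it would be synthesized, 5' to 3'

amplicon = find_amplicon(template, forward, reverse)
print(amplicon, len(amplicon))
```

In practice, primers tolerate some mismatches and hybridization depends on reaction conditions, so an exact-match search of this kind is only a first approximation of where amplification would occur.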
[0217] In embodiments, the method further includes performing reverse transcription, thereby producing a cDNA of the one or more guanitoxin biosynthetic genes. Thus, in embodiments, the method further includes contacting the aqueous liquid with an enzyme. In embodiments, the enzyme is a reverse transcriptase.
[0218] For the method provided herein, in embodiments, the guanitoxin biosynthetic gene is amplified, thereby producing amplicons. Thus, in embodiments, the method includes performing a PCR method, an isothermal amplification method, or a combination thereof. In embodiments, the method includes performing a PCR method. In embodiments, the PCR method is reverse transcription PCR (RT-PCR), quantitative PCR (qPCR), or reverse transcription quantitative PCR (RT-qPCR). In embodiments, the PCR method is RT-PCR. In embodiments, the PCR method is qPCR. In embodiments, the PCR method is RT-qPCR. In embodiments, the method includes performing an isothermal amplification method. In embodiments, the isothermal amplification method is loop-mediated isothermal amplification (LAMP). In embodiments, the method further includes detecting amplicons. In embodiments, detecting amplicons includes a colorimetric method, a fluorometric method, a luminometric method, an ionic method, or an electrical detection method. In embodiments, the method includes performing a sequencing method. For example, the guanitoxin biosynthetic gene or the amplicon may be sequenced. In embodiments, the sequencing method includes next generation sequencing. Thus, in embodiments, the method includes performing a PCR method, an isothermal amplification method, a sequencing method, or a combination thereof. In embodiments, the PCR method, isothermal amplification method, or sequencing method is a multiplex method. For example, in embodiments, the PCR method, isothermal amplification method, or sequencing method allows for detection of multiple guanitoxin biosynthetic genes simultaneously. In embodiments, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more guanitoxin biosynthetic genes are detected simultaneously.
[0219] For the method provided herein, in embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence. In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a coding sequence. In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a promoter region sequence. In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a terminator region sequence. In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes an intergene region sequence.
[0220] For the method provided herein, in embodiments, the one or more nucleic acids each independently includes a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different. In embodiments, the one or more nucleic acids each independently includes the sequence of one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
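As a non-limiting illustration of the identity thresholds recited above, the following Python sketch computes an ungapped, position-by-position percent identity between two equal-length sequences and compares it to a threshold. This is a simplifying assumption: percent identity is commonly determined with gapped alignment tools, and nothing here is intended to define how identity is measured in this disclosure. The function names and the 20-nt toy sequences are hypothetical and are not SEQ ID NOs.

```python
# Illustrative sketch only: ungapped identity over equal-length toy sequences.
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences carry the same base."""
    a, b = a.upper(), b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the candidate has at least `threshold` percent identity to the reference."""
    return percent_identity(candidate, reference) >= threshold


reference = "ATGGCTAGCTAGGATCCGAT"  # hypothetical 20-nt reference
candidate = "ATGGCTAGCTAGGATCCGTT"  # differs from the reference at one position

print(percent_identity(candidate, reference))       # 19 of 20 positions match
print(meets_threshold(candidate, reference, 95.0))
print(meets_threshold(candidate, reference, 98.0))
```

For a 20-nt oligonucleotide, each mismatch costs 5 percentage points under this measure, so the 98% and 99% tiers can be met only by an exact match at that length.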
[0221] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 1. In embodiments, the one or more nucleic acids includes SEQ ID NO: 1. In embodiments, the one or more nucleic acids is SEQ ID NO: 1.
[0222] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:2. In embodiments, the one or more nucleic acids includes SEQ ID NO:2. In embodiments, the one or more nucleic acids is SEQ ID NO:2.
[0223] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:3. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:3. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:3. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:3. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:3. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:3. In embodiments, the one or more nucleic acids includes SEQ ID NO:3. In embodiments, the one or more nucleic acids is SEQ ID NO:3.
[0224] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:4. In embodiments, the one or more nucleic acids includes SEQ ID NO:4. In embodiments, the one or more nucleic acids is SEQ ID NO:4.
[0225] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 5. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:5. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 5. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:5. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 5. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 5. In embodiments, the one or more nucleic acids includes SEQ ID NO:5. In embodiments, the one or more nucleic acids is SEQ ID NO:5.
[0226] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:6. In embodiments, the one or more nucleic acids includes SEQ ID NO:6. In embodiments, the one or more nucleic acids is SEQ ID NO:6.
[0227] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:7. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:7. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:7. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:7. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:7. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:7. In embodiments, the one or more nucleic acids includes SEQ ID NO:7. In embodiments, the one or more nucleic acids is SEQ ID NO:7.
[0228] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 8. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:8. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 8. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:8. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 8. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 8. In embodiments, the one or more nucleic acids includes SEQ ID NO:8. In embodiments, the one or more nucleic acids is SEQ ID NO:8.
[0229] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:9. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:9. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:9. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:9. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:9. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:9. In embodiments, the one or more nucleic acids includes SEQ ID NO:9. In embodiments, the one or more nucleic acids is SEQ ID NO:9.
[0230] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 10. In embodiments, the one or more nucleic acids includes SEQ ID NO: 10. In embodiments, the one or more nucleic acids is SEQ ID NO: 10.
[0231] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 11. In embodiments, the one or more nucleic acids includes SEQ ID NO: 11. In embodiments, the one or more nucleic acids is SEQ ID NO: 11.
[0232] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 12. In embodiments, the one or more nucleic acids includes SEQ ID NO: 12. In embodiments, the one or more nucleic acids is SEQ ID NO: 12.
[0233] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 13. In embodiments, the one or more nucleic acids includes SEQ ID NO: 13. In embodiments, the one or more nucleic acids is SEQ ID NO: 13.
[0234] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 14. In embodiments, the one or more nucleic acids includes SEQ ID NO: 14. In embodiments, the one or more nucleic acids is SEQ ID NO: 14.
[0235] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 15. In embodiments, the one or more nucleic acids includes SEQ ID NO: 15. In embodiments, the one or more nucleic acids is SEQ ID NO: 15.
[0236] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 16. In embodiments, the one or more nucleic acids includes SEQ ID NO: 16. In embodiments, the one or more nucleic acids is SEQ ID NO: 16.
[0237] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 17. In embodiments, the one or more nucleic acids includes SEQ ID NO: 17. In embodiments, the one or more nucleic acids is SEQ ID NO: 17.
[0238] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 18. In embodiments, the one or more nucleic acids includes SEQ ID NO: 18. In embodiments, the one or more nucleic acids is SEQ ID NO: 18.
[0239] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO: 19. In embodiments, the one or more nucleic acids includes SEQ ID NO: 19. In embodiments, the one or more nucleic acids is SEQ ID NO: 19.
[0240] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:20. In embodiments, the one or more nucleic acids includes SEQ ID NO:20. In embodiments, the one or more nucleic acids is SEQ ID NO:20.
[0241] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:21. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:21. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:21. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:21. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:21. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:21. In embodiments, the one or more nucleic acids includes SEQ ID NO:21. In embodiments, the one or more nucleic acids is SEQ ID NO:21.
[0242] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:22. In embodiments, the one or more nucleic acids includes SEQ ID NO:22. In embodiments, the one or more nucleic acids is SEQ ID NO:22.
[0243] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:87. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:87. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:87. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:87. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:87. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:87. In embodiments, the one or more nucleic acids includes SEQ ID NO:87. In embodiments, the one or more nucleic acids is SEQ ID NO:87.
[0244] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:88. In embodiments, the one or more nucleic acids includes SEQ ID NO:88. In embodiments, the one or more nucleic acids is SEQ ID NO:88.
[0245] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:89. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:89. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:89. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:89. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:89. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:89. In embodiments, the one or more nucleic acids includes SEQ ID NO:89. In embodiments, the one or more nucleic acids is SEQ ID NO:89.
[0246] In embodiments, the one or more nucleic acids includes a sequence having at least 80% identity to SEQ ID NO:90. In embodiments, the one or more nucleic acids includes a sequence having at least 85% identity to SEQ ID NO:90. In embodiments, the one or more nucleic acids includes a sequence having at least 90% identity to SEQ ID NO:90. In embodiments, the one or more nucleic acids includes a sequence having at least 95% identity to SEQ ID NO:90. In embodiments, the one or more nucleic acids includes a sequence having at least 98% identity to SEQ ID NO:90. In embodiments, the one or more nucleic acids includes the sequence of SEQ ID NO:90. In embodiments, the one or more nucleic acids includes SEQ ID NO:90. In embodiments, the one or more nucleic acids is SEQ ID NO:90.
[0247] In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 3 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:4. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO:5 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:6. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 7 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 8. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO:9 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 10. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 11 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 12. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 13 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 14. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 15 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 16.
In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 17 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 18. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 19 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:20. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO:21 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:22. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 87 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:88. In embodiments, the one or more nucleic acids include a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO:89 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:90.
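As an illustration of the percent-identity thresholds and first/second nucleic acid pairings recited above, the following is a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identity over all aligned columns; the application may define identity differently (for example, through a particular alignment program), so this is one possible convention rather than the claimed calculation. The function names and the `PRIMER_PAIRS` list are hypothetical helpers introduced for illustration, and the example strings in the usage note are placeholders, not any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Assumption: inputs are pre-aligned, equal in length, and use '-' for gaps.
    Columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# SEQ ID NO pairings listed in the paragraph above: (1, 2) through (21, 22),
# plus (87, 88) and (89, 90). Only the numbers are stored here.
PRIMER_PAIRS = [(n, n + 1) for n in range(1, 22, 2)] + [(87, 88), (89, 90)]
```

For example, `meets_identity("ACGTACGTAC", "ACGTACGTAA", 80.0)` compares ten aligned positions with nine matches, so it satisfies an 80% threshold but not a 95% threshold.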
[0248] For the methods provided herein, in embodiments, the guanitoxin-producing bacteria are cyanobacteria. In embodiments, the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia. In embodiments, the cyanobacteria are Sphaerospermopsis torques-reginae. In embodiments, the cyanobacteria are Chrysosporum ovalisporum. In embodiments, the cyanobacteria are Cuspidothrix. In embodiments, the cyanobacteria are Cylindrospermopsis. In embodiments, the cyanobacteria are Cylindrospermum. In embodiments, the cyanobacteria are Dolichospermum. In embodiments, the cyanobacteria are Microcystis. In embodiments, the cyanobacteria are Oscillatoria. In embodiments, the cyanobacteria are Planktothrix. In embodiments, the cyanobacteria are Phormidium. In embodiments, the cyanobacteria are Anabaena flos-aquae. In embodiments, the cyanobacteria are A. lemmermannii. In embodiments, the cyanobacteria are Raphidiopsis mediterranea. In embodiments, the cyanobacteria are Tychonema. In embodiments, the cyanobacteria are Woronichinia.
[0249] For the method provided herein, including embodiments thereof, the aqueous liquid may be derived from a body of water which may be ingested by, inhaled by, or in contact with a subject. In embodiments, the aqueous liquid is derived from a potable water source. In embodiments, the aqueous liquid is derived from a lake, river, or pond. In embodiments, the aqueous liquid is derived from a lake. In embodiments, the aqueous liquid is derived from a river. In embodiments, the aqueous liquid is derived from a pond.
[0250] In embodiments, the aqueous liquid is derived from a public water system or a private water system. The public water system or private water system may be a potable water source. In embodiments, the aqueous liquid is derived from a public water system. “Public water system” is used according to its commonly known meaning in the art, and refers to water provided to humans for consumption through a constructed conveyance (e.g. pipes, etc.) that has 15 or more service connections, or serves at least 25 people for at least 60 days out of the year. In embodiments, the public water system has a constructed conveyance (e.g. pipes, etc.) that has 15 or more service connections. In embodiments, the public water system serves at least 25 people for at least 60 days out of the year. “Service connection” is used in accordance with its commonly known meaning in the art and refers to the point of connection between the constructed conveyance of the water system and the constructed conveyance used by a subject. In embodiments, the aqueous liquid is a community water system, a non-transient non-community water system, or a transient non-community water system.
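The numerical thresholds in the definition of a public water system above can be expressed as a simple test. The following Python sketch is illustrative only: the `WaterSystem` record and its field names are hypothetical, and the function encodes nothing beyond the two criteria stated in this paragraph (15 or more service connections, or at least 25 people served for at least 60 days out of the year). It is not a regulatory determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaterSystem:
    """Hypothetical record describing a water system (illustrative fields)."""
    service_connections: int
    people_served: int
    days_served_per_year: int


def is_public_water_system(ws: WaterSystem) -> bool:
    """Apply the two criteria given in the definition above."""
    has_enough_connections = ws.service_connections >= 15
    serves_enough_people = ws.people_served >= 25 and ws.days_served_per_year >= 60
    return has_enough_connections or serves_enough_people
```

Under these criteria, a system with 3 service connections that serves 25 people for 60 days of the year qualifies as public, whereas the same system serving those people for only 59 days does not.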
[0251] In embodiments, the aqueous liquid is derived from a private water system. The term “private water system” is used according to its commonly known meaning in the art and refers to water systems that serve no more than 25 people at least 60 days out of the year or have no more than 15 service connections. In embodiments, the private water system serves no more than 25 people for at least 60 days out of the year. In embodiments, the private water system has no more than 15 service connections. In embodiments, the private water system is water from a spring, stream, pond, or shallow well. In embodiments, the private water system is private ground water, a residential well, or a cistern. In embodiments, the private water system is bottled water (commercial or filled individually). In embodiments, the private water system supplies water to an individual residence.
[0252] For the methods provided herein, in embodiments, the aqueous liquid is ingested by, inhaled by, or contacted with a subject. In embodiments, the aqueous liquid is ingested by the subject. In embodiments, the aqueous liquid is ingested by the subject as drinking water (e.g. drinking water from a bottle, a public or private water system, a spring, etc.). In embodiments, the aqueous liquid is unintentionally ingested (e.g. while a subject is swimming). In embodiments, the aqueous liquid is inhaled by the subject. In embodiments, the aqueous liquid is inhaled as aerosolized liquid. In embodiments, the aqueous liquid is contacted with the subject (e.g. the skin or mucous membrane of the subject).
[0253] For the method provided herein, in embodiments, the subject is treated for guanitoxin-induced toxicity when the one or more guanitoxin biosynthetic genes are detected. In embodiments, the treatment includes ameliorating a symptom of guanitoxin-induced toxicity. In embodiments, the symptom of guanitoxin-induced toxicity is tremors and/or seizure. In embodiments, the treatment includes administering an effective amount of a muscle relaxant, benzodiazepine, or barbiturate to the subject. In embodiments, the treatment includes administering an effective amount of atropine to the subject. In embodiments, the treatment includes administering an effective amount of glycopyrrolate to the subject. In embodiments, the treatment includes administering an effective amount of physostigmine to the subject. In embodiments, the treatment includes administering an effective amount of 2-PAM to the subject.
KITS
[0254] Provided herein are, inter alia, kits including components, such as reagents and reaction mixtures, for detecting guanitoxin biosynthetic genes (e.g. GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof) as described herein including embodiments thereof. In embodiments, the kit includes materials and instructions (e.g., for storage and use of kit components). In embodiments, the kit includes reagents capable of detecting the presence of one or more of GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or any combination thereof in an aqueous liquid. In embodiments, the kit includes one or more nucleic acids at least partially complementary to a portion of one or more guanitoxin biosynthetic genes (e.g. GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof) as described herein including embodiments thereof. In embodiments, the one or more nucleic acids is a probe that can hybridize to one or more guanitoxin biosynthetic genes. In embodiments, the one or more nucleic acids is a primer or pairs of primers for amplifying one or more guanitoxin biosynthetic genes. In embodiments, the kits include reagents and reaction mixtures for a PCR method or an isothermal amplification method. In embodiments, the kit includes instructions. In embodiments, the kit includes a label or insert indicating regulatory approval for diagnostic use. Thus, in an aspect is provided a kit for detecting guanitoxin-producing bacteria in an aqueous liquid, the kit including one or more nucleic acids each at least partially complementary to a portion of one or more guanitoxin biosynthetic genes, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0255] In embodiments, the one or more guanitoxin biosynthetic genes are GntA, GntJ, GntC, or a combination thereof. In embodiments, the one or more guanitoxin biosynthetic genes includes GntA. In embodiments, the one or more guanitoxin biosynthetic genes includes GntJ. In embodiments, the one or more guanitoxin biosynthetic genes includes GntC.
[0256] In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence. In embodiments, the portion of the one or more guanitoxin biosynthetic genes includes a coding sequence.
[0257] In embodiments, the one or more nucleic acids each independently includes a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different. In embodiments, the one or more nucleic acids each independently includes a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:4. In embodiments, the one or more nucleic acids includes a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:2. In embodiments, the one or more nucleic acids includes a first nucleic acid including a sequence having at least 80% identity to SEQ ID NO:3 and a second nucleic acid including a sequence having at least 80% identity to SEQ ID NO:4.
[0258] In embodiments, the guanitoxin-producing bacteria are cyanobacteria. In embodiments, the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia. In embodiments, the cyanobacteria are Sphaerospermopsis torques-reginae.
[0259] In embodiments, the aqueous liquid is derived from a lake, river, or pond. In embodiments, the aqueous liquid is derived from a public water system or a private water system. In embodiments, the aqueous liquid is ingested, inhaled, or contacted by a subject.
[0260] In embodiments, the kit includes an enzyme, deoxynucleoside triphosphates (dNTPs), a control DNA, a detectable compound, or a combination thereof. In embodiments, the kit includes an enzyme. In embodiments, the enzyme is a reverse transcriptase. In embodiments, the enzyme is a polymerase. In embodiments, the polymerase is Taq polymerase. In embodiments, the kit includes dNTPs. In embodiments, the kit includes a control DNA. In embodiments, the control DNA includes a guanitoxin biosynthetic gene or a fragment thereof as provided herein including embodiments thereof. In embodiments, the control DNA includes one or more of SEQ ID NO:23-75 or a portion or fragment thereof, as provided herein including embodiments thereof. In embodiments, the kit includes a detectable label. In embodiments, the detectable label is a fluorescent compound. In embodiments, the fluorescent compound binds non-specifically to double-stranded DNA. The detectable label may include any number of DNA detecting probes or compounds useful for detecting DNA (e.g. SYBR Green, TAQMAN™).
[0261] In embodiments, the kit further includes a therapeutic effective for treating guanitoxin-induced toxicity.
NUCLEIC ACID COMPOSITIONS
[0262] The compositions provided herein include nucleic acids at least partially complementary to a portion of a guanitoxin biosynthetic gene (e.g. GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT) as provided herein including embodiments thereof. The nucleic acid at least partially complementary to a portion of a guanitoxin biosynthetic gene is described in detail throughout this application (including in the description above and in the examples section). Thus, in an aspect is provided a composition including one or more nucleic acids each independently comprising a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
EMBODIMENTS
[0263] Embodiment 1. A method of detecting guanitoxin-producing bacteria in an aqueous liquid, the method comprising detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0264] Embodiment 2. The method of embodiment 1, wherein the one or more guanitoxin biosynthetic genes are GntA, GntJ, GntC, or a combination thereof.
[0265] Embodiment 3. The method of embodiment 1 or 2, the method comprising contacting the aqueous liquid with one or more nucleic acids, wherein each of the one or more nucleic acids are at least partially complementary to a portion of the one or more guanitoxin biosynthetic genes.
[0266] Embodiment 4. The method of any one of embodiments 1 to 3, wherein the detecting comprises performing a PCR method, an isothermal amplification method, a sequencing method, or a combination thereof.
[0267] Embodiment 5. The method of embodiment 3 or 4, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence.
[0268] Embodiment 6. The method of embodiment 5, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence.
[0269] Embodiment 7. The method of any one of embodiments 3 to 6, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
[0270] Embodiment 8. The method of embodiment 7, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:4.
[0271] Embodiment 9. The method of embodiment 8, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:2.
[0272] Embodiment 10. The method of embodiment 8, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:3 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:4.
[0273] Embodiment 11. The method of any one of embodiments 1 to 10, wherein the guanitoxin-producing bacteria are cyanobacteria.
[0274] Embodiment 12. The method of embodiment 11, wherein the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia.
[0275] Embodiment 13. The method of any one of embodiments 1 to 12, wherein the aqueous liquid is derived from a lake, river, or pond.
[0276] Embodiment 14. The method of any one of embodiments 1 to 12, wherein the aqueous liquid is derived from a public water system or a private water system.
[0277] Embodiment 15. The method of any one of embodiments 1 to 14, wherein the aqueous liquid is ingested by, inhaled by, or contacted with a subject.
[0278] Embodiment 16. The method of embodiment 15, wherein the subject is treated for guanitoxin-induced toxicity when the one or more guanitoxin biosynthetic genes are detected.
[0279] Embodiment 17. A kit for detecting guanitoxin-producing bacteria in an aqueous liquid, the kit comprising one or more nucleic acids each at least partially complementary to a portion of one or more guanitoxin biosynthetic genes, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
[0280] Embodiment 18. The kit of embodiment 17, wherein the one or more guanitoxin biosynthetic genes are GntA, GntJ, GntC, or a combination thereof.
[0281] Embodiment 19. The kit of embodiment 17 or 18, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence.
[0282] Embodiment 20. The kit of embodiment 19, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence.
[0283] Embodiment 21. The kit of any one of embodiments 17 to 20, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
[0284] Embodiment 22. The kit of embodiment 21, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:4.
[0285] Embodiment 23. The kit of embodiment 22, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:2.
[0286] Embodiment 24. The kit of embodiment 22, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:3 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:4.
[0287] Embodiment 25. The kit of any one of embodiments 17 to 24, wherein the guanitoxin-producing bacteria are cyanobacteria.
[0288] Embodiment 26. The kit of embodiment 25, wherein the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia.
[0289] Embodiment 27. The kit of any one of embodiments 17 to 26, wherein the aqueous liquid is derived from a lake, river, or pond.
[0290] Embodiment 28. The kit of any one of embodiments 17 to 26, wherein the aqueous liquid is derived from a public water system or a private water system.
[0291] Embodiment 29. The kit of any one of embodiments 17 to 27, wherein the aqueous liquid is ingested, inhaled, or contacted by a subject.
[0292] Embodiment 30. The kit of any one of embodiments 17 to 29, further comprising an enzyme, deoxynucleoside triphosphates (dNTPs), a control DNA, a detectable label, or a combination thereof.
[0293] Embodiment 31. The kit of any one of embodiments 17 to 30, further comprising a therapeutic effective for treating guanitoxin-induced toxicity.
EXAMPLES
Example 1: Introduction to Exemplary Studies
[0294] The complete guanitoxin BGC and pathway and diagnostic intermediates are described herein. The inventors have validated intermediates with synthetic and chemoenzymatic standards, and surveyed environmental metatranscriptomic and metagenomic datasets to reveal the global prevalence of guanitoxin biosynthetic capability in both rural and populated watersheds. These discoveries are contemplated to accelerate understanding of the distribution and impact of lethal, yet essentially unmonitored, guanitoxin.
Example 2: Results and Discussion
[0295] Sequencing of the genome of the guanitoxin-producing cyanobacterial strain Sphaerospermopsis torques-reginae ITEP-024 (GenBank accession no. CP080598)26 acquired from a toxic cyanoHAB in the Tapacura Reservoir, Pernambuco, Brazil (FIGS. 7-8)27 is described herein. Bioinformatic analysis of the resulting 5.24 Mbp genome with prediction tools such as antiSMASH18 identified 11 natural product BGCs (FIG. 2A, Table 3). None, however, were consistent with guanitoxin biogenesis involving predicted amino acid and organophosphate biochemistry.
[0296] To identify candidate guanitoxin BGCs, genes associated with the arginine metabolites (S)-γ-hydroxy-L-arginine (2) and L-enduracididine (3) were scrutinized. Both 2 and 3 were previously isolated from guanitoxin-producing Anabaena flos-aquae and converted in vivo to guanitoxin via stable isotope labeling experiments (FIG. 1B).25 Subsequently, 3 was identified in Sphaerospermopsis torques-reginae ITEP-024 extracts, further supporting its intermediacy in guanitoxin biosynthesis (FIGS. 9A-9B). As amino acid 3 and its epimers are incorporated in actinobacterial non-ribosomal peptide antibiotics such as a mannopeptimycin,28 a teixobactin,29 and an enduracidin,30 and are constructed via pyridoxal-5'-phosphate (PLP)-based enzymatic transformations, the Sphaerospermopsis torques-reginae ITEP-024 genome was queried for genes homologous to the known 3-producing enzymes mppP/mppQ/mppR from the mannopeptimycin biosynthetic pathway.31-33 While this query did not locate close homologs, it did identify a candidate BGC encoding three PLP-dependent enzymes embedded within a unique 12.5 kilobase (kb) gene cluster consisting of 10 metabolic enzymes (gntA-J) and a putative transporter (gntT) (FIG. 2B). The identified candidate gnt BGC is unique and not found in any reference cyanobacterial genome currently available in the National Center for Biotechnology Information (NCBI) database, nor in the metagenome-assembled genomes (MAGs) of the JGI Earth Microbiome Project.34 While most of the encoded enzymes did not have high sequence similarity to characterized homologs (FIGS. 10A-10B, Table 9), initial sequence and structural homology analyses identified multiple biosynthetic functions consistent with guanitoxin assembly; these included oxidation (gntA, gntB, gntD), PLP-dependent reactions (gntC, gntE, gntG), S-adenosylmethionine (SAM)-dependent methylations (gntF and gntJ), and phosphate biochemistry (gntI and gntH) (FIG. 2B, FIGS. 10A-10B, Table 9). Therefore, a putative guanitoxin biosynthetic pathway was constructed based on previously isolated chemical intermediates and gnt bioinformatic annotations (FIG. 2C). To experimentally link candidate Gnt enzymes to toxin biosynthesis, the gnt genes were synthesized, and the majority were expressed and purified to homogeneity as Escherichia coli codon-optimized N-terminal His6 fusion proteins (FIG. 11).
[0297] Provided herein is a guanitoxin biosynthetic reaction focused on the presumed N,N-dimethylation of 6 to 7 by the predicted N-methyltransferase GntF. The chemical structures of putative intermediates 6 and 7 are novel molecules yet to be observed in other characterized biosynthetic pathways, so their interconversion would strongly implicate the gnt BGC. Intermediates 6 and 7 were chemically synthesized from (S)-Garner's aldehyde in 6 and 5 steps, respectively; upon incubation of synthetic 6 with GntF in the presence of excess SAM, its efficient conversion to dimethylated 7 was observed by hydrophilic interaction liquid chromatography mass spectrometry (HILIC-MS) analyses (FIGS. 3A-3B). Intermediate 7 was also observed in Sphaerospermopsis torques-reginae ITEP-024 extracts, lending further physiological relevance to its role in guanitoxin biogenesis (FIGS. 9A-9B). With this encouraging result in hand, it was next sought to connect these key biosynthetic intermediates to their primary metabolic precursors and the mature toxin itself.
[0298] Multiple characterized biosynthetic pathways employ PLP-dependent enzymes to explore their dehydrogenation, oxidation, and transamination chemistries on L-arginine.35 However, none of the three PLP-dependent enzymes (GntC, GntE, GntG) nor the soluble α-ketoglutarate ferrous (Fe2+)-dioxygenase GntD showed any clear activity towards this amino acid (FIGS. 12A-13B). Therefore, in vitro enzymology efforts were initiated with known biosynthetic intermediates 2 and 3, which were chemically synthesized following adaptation of previously established synthetic procedures.36,37 Surprisingly, GntC catalyzed the highly diastereoselective PLP-dependent cyclodehydration of 2 into 3, without any additional enzymes or co-substrates (FIGS. 14A-14D). The stereochemical differences between chemically synthesized substrate and product diastereomers were magnified via 1-fluoro-2,4-dinitrophenyl-5-L-alanine amide (L-FDAA, Marfey's reagent) derivatization and ultra-performance liquid chromatography mass spectrometry (UPLC-MS) analysis, making it possible to determine that GntC was highly diastereoselective for the (S)-hydroxy stereoisomer of γ-hydroxy-L-arginine while exclusively generating 3 over its L-allo-enduracididine epimer (FIGS. 14A-14D). While transmembrane protein GntB proved intractable to soluble protein expression, the in vivo expression of a gntB-gntC-containing plasmid in E. coli successfully produced 3 (FIGS. 15-16B), thus corroborating GntB's initiating role in the construction of 2. Cyanobacterial 3 biosynthesis presents a divergent reaction strategy to known actinobacterial cyclic arginine biosyntheses that either rely on multiple PLP-dependent enzymes,31-33 or form 6-membered capreomycidine rings from linear hydroxylated arginine precursors (FIGS. 17A-17D).38-41
[0299] Next, GntD was established as an enduracididine β-hydroxylase, converting 3 to 4 in the presence of ferrous Fe2+, α-ketoglutarate, and L-ascorbate, as previously reported for the Streptomyces enzyme mannopeptimycin biosynthesis protein O (MppO) (FIGS. 18A-18C).42 To connect 4 with diagnostic intermediate 6, a two-enzyme cascade was proposed, with aldolase GntG excising glycine and transaminase GntE converting the resultant aldehyde to a primary amine. The GntE/GntG cascade was validated by performing the reaction in reverse to construct 4 from 6, glycine, α-ketoglutarate, PLP, and the two PLP-dependent enzymes (FIGS. 19A-19B). Applying all four biosynthetic enzymes and their requisite cofactors and co-substrates simultaneously in one pot converted 2 to 6 and the glycine byproduct, with intermediate aldehyde 5 not observed due to its presumed instability and absence of a primary amine for L-FDAA derivatization (FIGS. 4A-4B). Exclusion of individual enzymes halted progression along this four-step pathway in a manner consistent with the biosynthetic proposal (FIGS. 20A-20B).
[0300] After assembling the guanitoxin carbon backbone via six biosynthetic transformations, focus shifted to the enzymatic construction of the N-hydroxylated methyl phosphate that functions as the anticholinesterase pharmacophore (FIG. 5A). In the presence of the unusual heme-containing GntA and excess nicotinamide adenine dinucleotide phosphate (NADPH) under ambient aerobic conditions, the selective mono-oxygenation of tertiary amine 7 to product 8 was observed (FIGS. 21A-21B). This reaction is reminiscent of the Nω-hydroxylation of L-arginine, encoded by the homologous heme-oxygenase dcsA gene, that initiates a specialized pathway to D-cycloserine in Streptomyces bacteria.43 Ultrafiltration of the GntA reaction mixture, followed by the addition of kinase GntI, O-methyltransferase GntJ, and their cofactors and co-substrates, generated phosphorylated intermediate 9 and guanitoxin, with identical HILIC-MS properties to those observed in Sphaerospermopsis torques-reginae ITEP-024 extracts (FIGS. 5B, 9A-9B, 22A-22C, 23A-23D). Subsequently, recombinant human acetylcholinesterase inhibition was investigated with in situ generated 8, 9, and guanitoxin (FIG. 5C). Only the mature neurotoxin possessed inhibitory activity, establishing that the GntJ-catalyzed O-methylation is crucial for toxicity. Overall, the chemical, biochemical, and metabolomic analyses strongly connect guanitoxin biosynthesis to the gnt BGC and provide a genetic handle to investigate this cyanotoxin environmentally.
[0301] After validating the guanitoxin biosynthetic genes in Sphaerospermopsis torques-reginae ITEP-024, their environmental distribution was explored next. Because no gnt BGC homologs were found in publicly available assembled genomes and metagenomes, raw, unassembled environmental sequencing data were queried instead. Specifically, all available (108,465) NCBI Sequence Read Archive (SRA) metagenomic (metaG) and metatranscriptomic (metaT) datasets were searched using the SearchSRA tool. The search identified 13 datasets with at least one read aligning to the gnt BGC with translated-nucleotide to protein sequence identity, despite the fact that the computationally efficient SearchSRA tool is sensitivity-limited to detect only highly abundant (metaG) or highly expressed (metaT) genes. Independent de novo assembly confirmed the complete or nearly complete gnt BGC in six of these SRA datasets (FIGS. 6A-6B), including two metaT samples. In the metaT samples, the gnt BGC was discovered as a polycistronic transcript including all genes except gntJ. This polycistronic structure suggested that even limited transcriptomic detection of the gnt genes might be diagnostic for active guanitoxin biosynthesis. Therefore, to explore the feasibility of metaT data for environmental gnt BGC surveys, all 2,610 freshwater metatranscriptomic SRA datasets were comprehensively queried using a custom full-sensitivity SRA search pipeline, which ultimately identified 78 gnt BGC hits from Lake Erie, Ohio, United States of America (USA); Lake Mendota, Wisconsin, USA; the Amazon River, Brazil; the Columbia River, Oregon, USA; and the Delaware River, Delaware, USA (FIGS. 6A-6B, Table 4). These detections occur in independent sequencing datasets within the same watershed over multiple years, such as in Lake Mendota from 2010-2016, indicating the long-term undetected presence of guanitoxin biosynthetic capacity in publicly accessible freshwater sources.
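To put the two searches described above on a common scale, the reported counts can be converted to hit rates. The following Python sketch uses only the numbers stated in this paragraph (13 of 108,465 datasets in the broad search; 78 hits among 2,610 freshwater metatranscriptomic datasets in the full-sensitivity search). It assumes that each of the 78 hits corresponds to a distinct dataset, which the text does not state explicitly; the function name is a hypothetical helper.

```python
def hit_rate_percent(hits: int, datasets_searched: int) -> float:
    """Percentage of searched datasets in which the gnt BGC was detected."""
    if datasets_searched <= 0:
        raise ValueError("datasets_searched must be positive")
    if not 0 <= hits <= datasets_searched:
        raise ValueError("hits must lie between 0 and datasets_searched")
    return 100.0 * hits / datasets_searched


# Broad, sensitivity-limited search across all metaG and metaT datasets.
broad_rate = hit_rate_percent(13, 108_465)

# Full-sensitivity search restricted to freshwater metaT datasets
# (assumes one hit per dataset, see the note above).
freshwater_rate = hit_rate_percent(78, 2_610)
```

Under that assumption, the freshwater search corresponds to roughly 3% of datasets, compared with roughly 0.01% for the broad search. The two searches differ in both sensitivity and dataset composition, so the rates are not a like-for-like comparison.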
[0302] Surprisingly, the inventors were unable to link the known guanitoxin producer genera as the source taxa of the two environmental metaG-derived gnt BGCs. Guanitoxin producer strains have been rarely isolated in the literature, and the taxonomy of guanitoxin producer genera has been unstable, suggesting the full taxonomic extent of guanitoxin production is not yet known. To explore if other cyanobacterial genera could harbor gnt biosynthetic capacity, two environmental metagenome-derived gnt BGCs were linked to their best-supported taxonomic origins. Via full MAG assembly, the gnt BGCs from the Amazon River and Lake Mendota datasets were connected to the Cuspidothrix and Aphanizomenon cyanobacterial genera, respectively (FIGS. 24-25). While these genera are known to produce other cyanotoxins such as anatoxin-a, microcystin, and cylindrospermopsin,44 they have not yet been reported to produce guanitoxin. However, their genomic potential warrants future environmental surveying for guanitoxin biosynthesis.
[0303] The characterization of how guanitoxin is biochemically assembled from the standard amino acid L-arginine represents the first biosynthetic report of a natural organophosphate neurotoxin. Significantly, the discovery of the architecturally unique gnt BGC enabled identification of environmental hotspots in unaware rural and populated areas for the potent yet unmonitored neurotoxin guanitoxin, suggesting that revised cyanoHAB monitoring protocols are warranted in public watersheds vulnerable to toxic harmful algal/cyanobacterial blooms.
Example 3: Materials and Methods
[0304] Sphaerospermopsis torques-reginae ITEP-024 culture
[0305] A non-axenic culture of Sphaerospermopsis torques-reginae strain ITEP-024, a known producer of guanitoxin ((5S)-5-[(dimethylamino)methyl]-1-{[hydroxy(methoxy)phosphoryl]oxy}-4,5-dihydro-1H-imidazol-2-amine), was a gift from V. R. Werner (Museum of Natural Sciences, Porto Alegre, Brazil), and was maintained in conditions similar to previous descriptions45.
[0306] In brief, ITEP-024 cultures were grown in 50 mL autoclaved ASM-1 medium46, except that the final ASM-1 ZnCl2 and CuCl2 concentrations were 2.5 μM and 0.01 μM, respectively, and the pH was 7.0-7.4. Cultures were grown in 125 mL borosilicate glass Erlenmeyer flasks, sealed with gas-permeable waxed paper. Cultures were either maintained under ambient laboratory light cycle, light intensity, and temperature on the benchtop, or in lighted incubators under previously described conditions45. Cultures were harvested either by centrifugation, or by GF/F (Whatman, Cytiva) glass fiber filtration of trichomes. All culture manipulations were performed in biosafety or chemical fume hoods as appropriate. Cultures were biologically and chemically inactivated by 10% bleach treatment before disposal.
[0307] Transformation into E. coli
[0308] The plasmids were transformed into E. coli DH10B chemically competent cells for storage and BL21(DE3) for expression. Heat-shock transformation proceeded according to the following protocol: 0.5 μL of plasmid was added to the chemically competent cells, which were kept on ice for 30 minutes. The cells were then heated to 42 °C for 45 seconds and placed on ice again for 3 minutes; 900 μL of LB medium was added to the tube and the cells were incubated for 50 minutes at 37 °C with agitation at 200 rpm. After this step, the cells were plated on LB agar plates supplemented with the corresponding antibiotics. The plates were incubated at 37 °C overnight. An inoculum of colonies was grown overnight under the same conditions as the LB agar plates, at 37 °C and 200 rpm, and plasmid was purified following the QIAprep Spin Miniprep Kit (QIAGEN) plasmid DNA purification protocol. After plasmid purification, concentrations were measured by NanoDrop UV-vis spectrophotometry and the plasmids were stored at -20 °C.
[0309] Extraction and LC-MS analyses
[0310] Lyophilized culture was extracted with 5 mL of ethanol/0.1 M acetic acid (20:80 v/v), sonicated for 1 minute on ice, and centrifuged at 5,000 x g for 15 minutes. The supernatant was lyophilized, resuspended in methanol, and syringe-filtered into an autosampler vial. The Hydrophilic Interaction Liquid Chromatography (HILIC) separation was carried out on a SeQuant® ZIC-HILIC column (150 x 2.1 mm, 5 μm, 200 Å; Merck), similar to the method described in previous studies47.
[0311] HILIC Method: Separation was achieved under gradient elution at 0.2 mL/min, where eluent A was 5 mM ammonium formate containing 0.01% (v/v) formic acid and eluent B was acetonitrile/water (90:10 v/v) with 0.01% (v/v) formic acid. Elution started with a linear gradient from 90% B to 20% B until 35 min, followed by an isocratic hold at 20% B until 37.50 min and a final isocratic step at 90% B until 45 min.
[0312] RP Method: For this analysis, a Synergi Polar-RP 4 μm, 250 x 4.6 mm column (Phenomenex) was used at 0.75 mL/min with the following method: 0% B (12 min), 0 to 100% B (5 min), 100% B (3 min), 100 to 0% B (2.5 min), 0% B (2.5 min), wherein A = 0.1% aqueous formic acid, and B = 0.1% formic acid in acetonitrile.
[0313] General LC-MS measurements were performed on a Bruker Elute UHPLC system coupled to a Bruker amaZon SL ESI-Ion Trap mass spectrometer in positive mode. Compounds were separated via reversed-phase chromatography on a Bruker Intensity Solo C18 column (2 μm, 2 x 100 mm) with the eluents water + 0.1% formic acid (Solvent A) and acetonitrile + 0.1% formic acid (Solvent B). The LC method used a flow rate of 0.300 mL/min and the following gradient: 10% to 20% B over 3 minutes, 20% to 45% B over 3 minutes, 45% to 100% B over 2 minutes, hold at 100% B for 2 minutes, 100% to 10% B over 1 minute, hold at 10% B for 2 minutes.
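The general LC-MS gradient described above is a sequence of linear segments, so the solvent composition at any time point follows by interpolation. The following is a minimal sketch in Python, not the instrument method file; the breakpoint table is transcribed from the paragraph above, and the function name percent_b is an illustrative assumption.

```python
# Gradient breakpoints as (time in minutes, %B), transcribed from the
# general LC-MS method above. Times are cumulative segment durations.
GRADIENT = [
    (0.0, 10.0),    # start
    (3.0, 20.0),    # 10% to 20% B over 3 min
    (6.0, 45.0),    # 20% to 45% B over 3 min
    (8.0, 100.0),   # 45% to 100% B over 2 min
    (10.0, 100.0),  # hold at 100% B for 2 min
    (11.0, 10.0),   # 100% to 10% B over 1 min
    (13.0, 10.0),   # hold at 10% B for 2 min
]


def percent_b(t_min, gradient=GRADIENT):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 1.5, 7, 9, 13):
        print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

The total programmed run time under this reading is 13 minutes.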
[0314] Semi-preparative HPLC purification was performed using a Shimadzu Prominence preparative HPLC system with a SPD-20A model UV/Vis detector. Analytical HPLC purification was performed using an Agilent 1260 Infinity system with a G1314B model VWD. The monitored wavelength was 210 nm for all runs.
[0315] Sphaerospermopsis torques-reginae ITEP-024 genome sequencing and assembly
[0316] For Illumina genome sequencing: Illumina data for ITEP-024, previously quality and adaptor trimmed, had been prepared in a previously described project45. In brief, a Nextera XT (Illumina) sequencing library was prepared and sequenced on a MiSeq instrument to a depth of ~26M reads with 300x300 bp paired-end (PE) sequencing. In total, 13.4 Gbp (~2400x coverage) of quality and adaptor trimmed data was used for downstream analyses. The library insert size was ~125-400 bp with a tail up to 800 bp, as determined by Qualimap v2.2.148 analysis of reads aligned to the ITEP-024 genome assembly with bowtie2 v2.4.249. These quality and adaptor trimmed reads are available on NCBI SRA under accession SRR15608978.
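The coverage figure quoted above follows from dividing the total sequenced bases by the genome size. As a hedged illustration, the sketch below inverts that relationship to show the genome size implied by the two stated numbers (13.4 Gbp and ~2400x); the resulting value is an approximation derived here, not a figure stated in this document, and the function names are illustrative.

```python
def mean_coverage(total_bases, genome_size_bp):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a stated yield and a stated mean coverage."""
    return total_bases / coverage


if __name__ == "__main__":
    total = 13.4e9  # 13.4 Gbp of trimmed Illumina data (from the text)
    size = implied_genome_size(total, 2400)  # ~2400x coverage (from the text)
    print(f"implied genome size: {size / 1e6:.2f} Mbp")
```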
[0317] For nanopore genome sequencing: 50 mL of Sphaerospermopsis torques-reginae ITEP-024 was harvested via centrifugation, and the pellet was resuspended in 500 μL of 10 mM Tris pH 8.0. Lysozyme (7 mg) was added, and the suspension was incubated at 37 °C for 30 minutes. Next, 1.5 mL of cetrimonium bromide (CTAB) buffer (3% CTAB, 1.4 M NaCl, 20 mM EDTA, 100 mM Tris pH 8.0, 3% polyvinylpolypyrrolidone (PVPP), 0.2% β-mercaptoethanol) was added, followed by 10 μL of proteinase K at 20 mg/mL and a 2-hour incubation at 50 °C with mixing every 15 minutes. The sample was centrifuged at 15,000 x g for 15 minutes at room temperature, the supernatant was transferred to a new tube, and 400 μL of 5 M potassium acetate pH 8.0 was added; the tube was then kept on ice for 30 minutes to precipitate polysaccharides. The tube was centrifuged at 15,000 x g for 15 minutes at 4 °C. Half the volume of the supernatant was transferred to a new tube and extracted with 1 equivalent of phenol:chloroform:isoamyl alcohol (25:24:1). The tube was centrifuged at 12,000 x g for 5 minutes at 4 °C, and the supernatant and precipitated DNA were transferred to a new tube with 1 volume equivalent of ice-cold isopropanol. The sample was centrifuged at 12,000 x g for 15 minutes at 4 °C. After discarding the supernatant, the DNA pellet was washed 2 times with 0.5 mL of 75% ethanol. The DNA was air dried for 30 minutes at 37 °C and resuspended in 40 μL of 10 mM Tris pH 8.5. The sample was maintained at -20 °C until sequencing. A Nanopore ligation sequencing library (P/N SQK-LSK109) was prepared from this DNA following the manufacturer’s instructions, and the resulting library was sequenced via a Flongle flowcell on an Oxford Nanopore MinION sequencer. The resulting dataset was checked via NanoPlot v1.27.050, and had an N50 of ~5.5 Kbp and a yield of ~0.2 Gbp. A systematic error was noted, with an unexpectedly high (>25%) proportion of palindromic reads. It was speculated that these errors may be due to the Flongle system being a new product from Oxford Nanopore at the time of the experiment.
[0318] Sphaerospermopsis torques-reginae ITEP-024 genome assembly & annotation
[0319] For the Illumina MiSeq dataset, read quality was checked using FastQC v0.10.1 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were additionally quality filtered to trim bases with a Phred score below 25 using Prinseq51. To help reduce the effect of the aforementioned Nanopore Flongle palindromic errors, palindromic sequences from the Nanopore dataset were consensus-corrected and trimmed with Canu52. These quality and palindrome trimmed reads are available on NCBI SRA under accession SRR15608977. A second correction was applied to these Canu-trimmed sequences by mapping the Illumina reads with CoLoRMap53. Two de novo hybrid assemblies were produced with Unicycler54 and MaSuRCA55 using both the Illumina and Nanopore datasets as inputs. The contigs generated by both assemblers were improved using SSPACE56 to merge scaffolds, Pilon57 for polishing, and GapFiller58 for gap filling. In total, 26 iterations of scaffolding, gap-filling, and polishing were performed. Subsequently, an overall assembly was generated by merging both assemblies with Flye using the subassembly feature59. The quality of the assemblies was assessed with Quast 5.0.260, and lineage-expected gene completeness statistics and lineage-incongruent gene contamination statistics were measured with CheckM61 (FIG. 9B). Two ~185 Kbp plasmids and some limited sequences from presumed contaminating heterotrophic bacteria were detected in some of the assemblies. However, these did not have clear specialized metabolism-related genes and could not be clearly linked to ITEP-024, so they were not analyzed further. The post-assembly circularization tool Circlator version62 was used to check for circularization of the Sphaerospermopsis torques-reginae ITEP-024 chromosomal genome. However, the software could not provide clear conclusions about the linearity or circularization of the genome. Since linear contigs produced by genome assemblers can lack the short sequences that would join the contig ends, it is difficult to establish whether the Sphaerospermopsis torques-reginae ITEP-024 genome is linear or circular, or whether it was not circularized due to the absence of sequences at both ends. The determination of a linear or circular chromosomal genome for ITEP-024 was unresolved, but as circular chromosomes are most common in the cyanobacterial phylum, the genome was presumed to be circular. The intermediate working gene annotation was performed with Prokka63, while the final gene annotation on the NCBI-submitted chromosomal assembly (NCBI accession CP080598.1) was produced via the NCBI Prokaryotic Gene Annotation Pipeline (PGAP).
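The Phred-based quality trimming mentioned in this paragraph can be illustrated with a small sketch. It is not Prinseq's implementation: it assumes Phred+33 quality encoding and a simple rule that removes bases from the 3' end until a base with a score of at least 25 is reached. The function names are illustrative.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=25, offset=33):
    """Trim bases from the 3' end until a base with Phred >= min_q is reached.

    Returns the trimmed (sequence, quality) pair. This is an illustrative
    rule only; real trimmers offer windowed and 5'-end options as well.
    """
    scores = phred_scores(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTACGT", "IIIIII#5"))
```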
[0320] Molecular Biology/Biochemical Methods
[0321] gnt cloning
[0322] The enzyme coding sequences were optimized for expression in E. coli and synthesized by GenScript Inc. Synthetic guanitoxin biosynthetic genes were subcloned into the pET28a(+) kanamycin-resistant expression vector containing an N-terminal hexahistidine (His6) tag. The pET28a(+) plasmids containing synthetic gnt genes were resuspended in 100 μL of sterilized ultrapure water and transformed into E. coli DH10B chemically competent cells for plasmid storage and BL21(DE3) for protein expression as previously described.
[0323] pCOLADuet-gntB-gntC vector assembly and transformation
[0324] Using their individual pET28a(+) plasmids as templates, gntB and gntC were amplified by PCR using the primers listed in Table 5 and the following amplification conditions. For primer set gntB-F/R, the following program was used: an initial denaturation at 98 °C (30 s); 30 cycles of 98 °C (10 s), 70 °C (30 s), and 72 °C (30 s); and a final extension at 72 °C (2 min). For primer set gntC-F/R, the following program was used: an initial denaturation at 98 °C (30 s); 30 cycles of 98 °C (10 s), 62 °C (30 s), and 72 °C (30 s); and a final extension at 72 °C (2 min). For primer set pCOLADuet-F/R, the following program was used: an initial denaturation at 98 °C (30 s); 30 cycles of 98 °C (10 s), 61 °C (30 s), and 72 °C (2 min); and a final extension at 72 °C (2 min). PCR-amplified gntB (957 bp) and gntC (1113 bp) were individually and sequentially inserted into multiple cloning sites 1 and 2 of pCOLADuet-1, respectively, following Gibson Assembly Master Mix protocols (New England Biolabs).
[0325] Gnt protein expression
[0326] A general method was followed for each of the guanitoxin pathway enzymes: GntA*, GntC, GntD, GntE, GntF, GntG, GntI, and GntJ. A 20 mL starter culture of LB media containing 50 μg/mL kanamycin was inoculated with E. coli BL21(DE3) containing the appropriate Gnt-containing expression plasmid from glycerol stocks and shaken overnight at 37 °C and 200 rpm. The next day, 10 mL of starter culture was used to inoculate 1 L of TB media containing 50 μg/mL kanamycin in a 2.8 L flask, which was shaken at 37 °C and 200 rpm until the OD600 reached 0.9. The incubation temperature was decreased to 18 °C for one hour, followed by the addition of 100 μM isopropyl-β-D-thiogalactopyranoside (IPTG) to induce protein expression. The cultures were incubated at 18 °C and shaken at 200 rpm for 20 hours. Cultures were harvested by centrifuging at 2500 x g and 4 °C for 30 minutes. Pellets were resuspended in 30 mL lysis buffer (20 mM Tris-HCl pH 8.0, 1 M NaCl, 20 mM imidazole, and 10% glycerol) and stored at -70 °C until purification.
[0327] * In the case of GntA (heme-dependent N-hydroxylase), a slightly modified growth protocol was used to supply the heme required for proper folding of GntA. Once the OD600 had reached 0.4, 7 μM porcine hemin (Chem-Impex) was added to the culture, which was then allowed to continue growing until an OD600 of 0.9 was reached.
[0328] Gnt protein purification
[0329] E. coli cell pellets containing gnt genes were thawed and sonicated on ice (FisherBrand Model 505 Sonic Dismembrator, 3.2 mm microtip, 40% amplitude, 15 s pulse on/45 s pulse off for a total of 7 minutes). The lysate was centrifuged at 16,000 x g for 30 minutes at 4 °C or until the supernatant had clarified. Each protein was purified using an ÄKTA go FPLC system at 4 °C and buffers that had been filtered through a 0.22 μm nitrocellulose membrane. Clarified lysate was loaded onto a 5 mL HisTrap FF Column (GE Healthcare Life Sciences) that had been equilibrated with at least 25 mL of Buffer A (20 mM Tris-HCl pH 8.0, 1 M NaCl, and 20 mM imidazole) at a maximum flow rate of 2 mL/min. After loading, the column was rinsed with Buffer A until UV absorbance had returned to baseline. The column was then washed with 10% Buffer B (20 mM Tris-HCl pH 8.0, 1 M NaCl, and 250 mM imidazole) to remove weakly bound protein, using at least 25 mL of buffer or until UV absorbance returned to baseline. His6-tagged protein was eluted with a linear gradient from 100% Buffer A to 100% Buffer B over 60 mL, while collecting 5 mL fractions. Fractions were assessed for purity through SDS-PAGE (10% or 12% acrylamide, depending on protein size) and were combined if they were at least 90% pure. Protein was concentrated to a volume of 2.5 mL or less using a 10 kDa or 30 kDa cutoff (based on each protein’s size) Amicon Ultra-15 concentrator. Protein was buffer exchanged into GF Buffer (50 mM HEPES pH 8.0 and 300 mM KCl) using a pre-equilibrated PD-10 gravity flow column, or further purified by size exclusion chromatography using a HiLoad 16/60 Superdex 75 column or HiLoad 16/60 Superdex 200 column (GE Healthcare Life Sciences), chosen based on protein size and the possibility of dimers, using a 20 mM HEPES (pH 7.5) and 300 mM KCl buffer. 
Protein concentration was estimated using the Bradford assay based on a Bovine Serum Albumin standard; if necessary, protein was further concentrated after this exchange. Each protein was aliquoted and stored at -70 °C for future use. The following yields of pure protein were obtained: GntA (60 mg/L), GntC (18 mg/L), GntD (35 mg/L), GntE^ (13 mg/L), GntF (225 mg/L), GntG (29 mg/L), GntI (70 mg/L), and GntJ# (180 mg/L).
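Because Buffer A and Buffer B above differ only in imidazole (20 mM versus 250 mM), the imidazole concentration at any point in the wash or elution is a linear blend of the two. The sketch below works through that arithmetic; the function names are illustrative assumptions, and the mapping of elution volume to %B assumes an ideal linear gradient with no system delay volume.

```python
def imidazole_mM(percent_b, a_mM=20.0, b_mM=250.0):
    """Imidazole concentration (mM) for a given %B blend of Buffer A and Buffer B."""
    frac = percent_b / 100.0
    return a_mM * (1.0 - frac) + b_mM * frac


def percent_b_at_volume(volume_mL, gradient_mL=60.0):
    """%B reached after volume_mL of an ideal linear 0-100% B gradient."""
    if not 0.0 <= volume_mL <= gradient_mL:
        raise ValueError("volume outside the gradient")
    return 100.0 * volume_mL / gradient_mL


if __name__ == "__main__":
    print(f"10% B wash: {imidazole_mM(10):.0f} mM imidazole")
    # Imidazole at the end of each 5 mL fraction of the 60 mL gradient.
    for end_mL in range(5, 65, 5):
        conc = imidazole_mM(percent_b_at_volume(end_mL))
        print(f"fraction ending at {end_mL:>2} mL: {conc:.1f} mM")
```

Under these assumptions the 10% Buffer B wash corresponds to 43 mM imidazole.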
[0330] # For GntJ (O-methyltransferase), the hexahistidine (His6) tag was cleaved using 10 units of thrombin per mg of recombinant protein incubated overnight, retaining some uncleaved protein as a control. The extent of His6 tag cleavage was assessed by SDS-PAGE (12% acrylamide), and size exclusion chromatography using a HiLoad 16/60 Superdex 75 column was used to purify GntJ as described previously.
[0331] For GntE (PLP-dependent aminotransferase), the protein eluted in the absence of PLP and upon fractionation began to aggregate in a concentration-dependent manner. PLP (500 μM) and 20% glycerol were added to the combined fractions to stabilize GntE for concentration and buffer exchange. Prior to buffer exchange, any aggregate was removed by centrifuging at 20,000 x g for 10 minutes at 4 °C. After desalting into GF buffer, GntE retained PLP and was not concentrated further. No further aggregation was observed before aliquots were made.
[0332] General procedure for Gnt enzyme assays: Marfey’s Analysis and UPLC-MS conditions
[0333] A 50 μL aliquot of the Gnt reaction mixture (typically 100 μL) was removed and added to 20 μL of a saturated sodium bicarbonate solution. The addition of 100 μL of freshly prepared 1% w/v 1-fluoro-2,4-dinitrophenyl-5-L-alanine amide (L-FDAA, Marfey’s reagent) in acetone began the derivatization reaction, which was incubated at 37 °C for 90 minutes. After incubation, reactions were quenched by the addition of 25 μL of 1 N HCl before centrifuging (15000 rpm for 5 minutes). The clarified supernatant was removed and subjected to RP-UPLC-MS (Bruker Intensity Solo C18(2), 2.1 x 100 mm) at a flow rate of 0.300 mL/min using the following method: 10 - 20% B (3 min), 20 - 45% B (3 min), 45 - 100% B (2 min), 100% B (2 min), 100 - 10% B (1 min), 10% B (2 min), where A = 0.1% aqueous formic acid, and B = 0.1% formic acid in acetonitrile.
[0334] In vivo GntB/GntC enzyme assay
[0335] The pCOLADuet-1 and pET28a plasmids containing no insert (empty pET28a vector), only gntB (pET28a vector), or gntB and gntC in tandem were transformed into E. coli BL21(DE3). Starter cultures (5 mL LB containing 50 mg/mL kanamycin) were inoculated with BL21(DE3) colonies and grown overnight. Cells were centrifuged (3400 rpm, 5 min, 4 °C), decanted, and resuspended in 5 mL M9 minimal media. Resuspended BL21(DE3) cells (1 mL) were inoculated into 50 mL aliquots of M9 minimal media containing 30 mg/mL kanamycin. Cultures were incubated (200 rpm, 37 °C) until reaching an OD600 of 0.7, then were cooled and incubated for one additional hour (200 rpm, 18 °C). IPTG (1 mM) was added to induce GntB and GntB/C protein expression. After 5 days of incubation (200 rpm, 18 °C), 1 mL aliquots of each culture were removed. Aliquots were sonicated (40% amplitude, 10 seconds on, 10 seconds off, for 1.5 minutes) to lyse the cells, and the cell lysates were then centrifuged for 10 minutes (15000 rpm, 4 °C). From the centrifuged cell lysate, 50 μL of the supernatant was aliquoted for Marfey’s derivatization and subsequent UPLC-MS analysis.
[0336] In vitro GNT enzyme assays
[0337] GntC functional assays
[0338] GntC activity assays were conducted in 50 mM K2HPO4 buffer (pH 8.0) using 1 mM substrate, 100 μM PLP, and 50 μM of purified GntC enzyme. Reactions were brought to a total volume of 100 μL with MilliQ water. Assays were incubated at room temperature overnight, then a 50 μL aliquot was removed for Marfey’s derivatization before subsequent LC-MS analysis.
[0339] GntD functional assay
[0340] GntD activity assays were conducted in 50 mM K2HPO4 buffer (pH 8.0) using 1 mM substrate, 100 μM FeSO4, 2.5 mM α-ketoglutarate, 50 μM L-ascorbic acid and 50 μM of purified GntD enzyme. Reactions were brought to a total volume of 100 μL using MilliQ water. Assays were incubated at room temperature for 6 hours, and 50 μL aliquots were removed after 90 minutes and 6 hours, derivatized via Marfey’s reagent and analyzed by LC-MS.
[0341] GntD scale up reactions
[0342] Scaled up reactions for NMR characterization were run using the conditions previously described in 1 mL aliquots and 43 total reactions were set up. Scaled reactions ran for 14 hours at room temperature. The reactions were pooled, quenched with an equivalent amount of HPLC-grade methanol, then centrifuged at 11000 rpm for 15 min. The supernatant was concentrated in vacuo to remove methanol and the remaining aqueous reaction was frozen and lyophilized overnight.
[0343] HPLC Purification of GntD product β-hydroxy-L-enduracididine (4)
[0344] The lyophilized aqueous layer was resuspended in MilliQ water and purified by semi-preparative RP-HPLC (Phenomenex Synergi 4 μm Polar RP 80 Å, 10 x 250 mm) at a flow rate of 3.5 mL/min using the following method: 0.5% B (5 min), 0.5-95% B (1 min), 95% B (4 min), 95%-0.5% B (1 min), 0.5% B (4 min), where A = 0.1% aqueous formic acid, and B = 0.1% formic acid in acetonitrile. The peak containing 4 was manually collected (retention time ~3.90 min), concentrated in vacuo and lyophilized. The sample was then further purified by analytical RP-HPLC (Phenomenex Synergi 4 μm Polar RP 80 Å, 4.6 x 250 mm) at a flow rate of 1.0 mL/min. Compound 4 eluted in 0.5% B (retention time ~2.85 min) and was manually collected, concentrated in vacuo and lyophilized to afford the product as a white solid. 1H NMR (500 MHz, D2O + 0.1% MeOH): δ 4.34 (ddd, J = 9.7, 6.8, 5.5 Hz, 1H), 4.06 (dd, J = 6.7, 3.1 Hz, 1H), 3.90 (d, J = 3.1 Hz, 1H), 3.83 (dd, J = 10.0, 10.0 Hz, 1H), 3.65 (dd, J = 10.3, 5.5 Hz, 1H); 13C NMR (500 MHz, D2O + 0.1% MeOH): δ 171.4, 159.4, 71.5, 56.5, 55.9, 44.9; HRMS (TOF) Calculated for C6H13N4O3 189.0982, found 189.0982 (M+H)+.
[0345] GntCDGE dependence assay
[0346] The one-pot assay to assess whether 2 could be converted to 6 was conducted in 50 mM K2HPO4 buffer (pH 8.0) using 1 mM 2. Cofactors used in this assay included 100 μM PLP, 1 mM αKG, 50 μM L-ascorbic acid, 100 μM FeSO4, and 5 mM L-glutamate, with 40 μM of purified GntC, and 25 μM each of purified GntD, GntG, and GntE. Assays were incubated at room temperature overnight and then 50 μL aliquots were removed for Marfey’s derivatization prior to LC-MS analysis.
[0347] GntE/G functional assay
[0348] GntE/G activity assays were performed in 50 mM K2HPO4 buffer (pH 8.0) using 1 mM substrate 6, 100 μM PLP, 1 mM α-ketoglutarate, 1 mM glycine, 50 μM purified GntG enzyme, and 50 μM purified GntE enzyme. Total reaction volumes were brought to 100 μL using MilliQ water. Assays were incubated at room temperature overnight and then a 50 μL aliquot was removed for Marfey’s derivatization before subsequent LC-MS analysis.
[0349] GntF: N-methyltransferase
[0350] GntF activity assays were performed in 50 mM Tris buffer (pH 7.4) using 0.1 mM substrate 6, 1 mM S-adenosylmethionine (SAM), and 20 μM purified GntF enzyme. Total reaction volumes were brought to 500 μL with MilliQ water and incubated at 27 °C for 18 hours. The reaction was quenched with one volume of acetonitrile on ice and filtered at 14000 x g, 4 °C for 10 minutes using both 3 kDa cutoff filters and 0.2 μm filters. The supernatant was then removed and subjected to LC-MS analysis.
[0351] GntA: N-hydroxylase
[0352] GntA activity assays were performed in 50 mM Tris buffer (pH 7.4) using 0.1 mM substrate 7, 5 mM NADPH, and 20 μM purified GntA enzyme. Total reaction volumes were brought to 500 μL with MilliQ water and incubated at 27 °C for 18 hours. The reaction was quenched with one volume of acetonitrile on ice and filtered at 14000 x g, 4 °C for 10 minutes using 0.2 μm filters. The supernatant was then removed and subjected to LC-MS analysis.
[0353] GntI: kinase
[0354] GntI activity assays were performed using the filtrate of the GntA enzymatic assay, sequentially adding 2 mM ATP, 100 mM NaCl, 2 mM MgCl2 and 20 μM purified GntI enzyme. The reaction volume was brought to 200 μL with 50 mM Tris buffer pH 7.4 and incubated at 37 °C for 30 minutes. The reaction mixture was quenched with one volume of acetonitrile on ice and filtered at 14000 x g, 4 °C for 10 minutes using 0.2 μm filters. The supernatant was then removed and subjected to LC-MS analysis.
[0355] GntJ: O-methyltransferase
[0356] GntJ activity assays were performed using the filtrate of the GntI enzymatic assay, sequentially adding 1 mM of SAM and 20 μM of GntJ enzyme without the His6 tag. The reaction volume was brought to 100 μL with 50 mM Tris buffer pH 7.4 and incubated at 27 °C for 18 hours. The reaction mixture was quenched with one volume of acetonitrile on ice and filtered at 14000 x g, 4 °C for 10 minutes using 0.2 μm filters. The supernatant was then removed and subjected to LC-MS analysis.
Example 4: Guanitoxin biosynthetic gene cluster (BGC) in environmental sequencing
[0357] Initial searches for guanitoxin BGC
[0358] Using the SearchSRA tool49, 64-68, 108,465 SRA metagenomic and metatranscriptomic datasets were searched for the guanitoxin biosynthetic gene cluster using a translated nucleotide (diamond) search with the GNT A-J peptide sequences. Hits within the ‘results.zip’ file provided by SearchSRA were filtered to those reads with a >= 90% protein sequence identity using a custom script (https://github.com/photocyte/gnt_paper/searchSRA_workflow).
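The identity filter described above can be sketched as follows. This is not the custom script referenced in the text: it assumes the hits are available as BLAST-style tabular text (query ID, subject ID, and percent identity in the first three tab-separated columns, as in DIAMOND's default tabular output), and the read and dataset identifiers in the example are hypothetical.

```python
import csv
import io


def filter_hits(tsv_text, min_identity=90.0):
    """Keep hits whose percent identity (third column) is >= min_identity.

    Assumes BLAST-style tabular rows: query ID, subject ID, percent
    identity, followed by any further columns, which are ignored here.
    """
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        identity = float(row[2])
        if identity >= min_identity:
            kept.append((row[0], row[1], identity))
    return kept


if __name__ == "__main__":
    # Hypothetical read IDs; only the column layout matters.
    example = (
        "read_0001\tGntA\t97.5\t33\n"
        "read_0002\tGntC\t88.0\t33\n"
        "read_0003\tGntJ\t90.0\t33\n"
    )
    for hit in filter_hits(example):
        print(hit)
```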
[0359] Full sensitivity search of metatranscriptomic data for the guanitoxin BGC
[0360] All freshwater Illumina sequencing metatranscriptomic samples on SRA were selected via the following search on the NCBI web interface: “(lake[All Fields] OR pond[All Fields] OR reservoir[All Fields] OR river[All Fields] OR stream[All Fields] OR bay[All Fields] or bog[All Fields] or cyano[All Fields]) AND "ecological metagenomes" [Organism] NOT "marine metagenome" [Organism] NOT "seawater metagenome" [Organism] NOT "marine sediment metagenome" [Organism] AND "biomol rna" [Properties] AND "platform illumina" [Properties]”. These search terms were empirically derived, as there was no single metadata field which applied to all freshwater metatranscriptomic samples. Metadata for these datasets was exported using the NCBI SRA Run Selector tool in CSV format. A brief script was used to filter this CSV file to a list of SRA target accession ids. A custom Nextflow69 workflow (https://github.com/photocyte/gnt_paper/metaT_search_workflow) was used to download and search these datasets. The workflow was run with parameters “--sra_ids_file target_SRAs.txt --fasta ./target_genes/gnt_cds.fna -resume -with-trace -with-report”. In brief, SRA reads were downloaded using fasterq-dump. Datasets resulting in .fastq files with a filesize of <160 KB were discarded and not analyzed further. Next, reads were aligned to the gnt BGC genes (gntA-J) coding nucleotide (CDS) regions, using bowtie249 with parameters “--very-sensitive-local --no-unal”. Mapped reads were summarized and processed from the resulting .bam files using a custom Jupyter70 notebook (https://github.com/photocyte/gnt_paper/metaT_search_workflow/jupyter). In brief, reads were parsed into and manipulated using pandas71,72 dataframes, while extensive metadata from the SRA database was re-added using pysradb73. The results of this analysis are shown in Table 3 and FIG. 9B.
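The step that reduces the exported metadata CSV to a list of accession ids can be illustrated with the sketch below. It is not the script used in the work described; it assumes the accession sits in a column named "Run", and the other column name and values in the example are hypothetical.

```python
import csv
import io


def accession_ids(csv_text, column="Run"):
    """Return unique, non-empty accession ids from the given CSV column, in file order.

    The column name "Run" is an assumption about the export format.
    """
    seen = set()
    ids = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        accession = (row.get(column) or "").strip()
        if accession and accession not in seen:
            seen.add(accession)
            ids.append(accession)
    return ids


if __name__ == "__main__":
    # Two accessions taken from this document; the second column is hypothetical.
    example = (
        "Run,Assay Type\n"
        "SRR5249014,RNA-Seq\n"
        "SRR1601415,RNA-Seq\n"
        "SRR5249014,RNA-Seq\n"
    )
    # One accession per line, the shape expected for a target list file.
    print("\n".join(accession_ids(example)))
```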
[0361] Metatranscriptomic de novo assembly
[0362] De novo assembly of gnt BGCs from the identified metatranscriptomic SRA data was performed using a custom Nextflow69 workflow. The workflow was run with parameters “--targets SRR5249014;SRR5249015;SRR1601415;SRR1601417;SRR1601416;SRR5834679;SRR5834616;SRR5194978 --peps gnt_peps.faa --nucl gnt_cds.fna -with-trace -resume”. In brief, the workflow downloads SRA reads in .fastq format from each of the SRA datasets which had reads mapping to all 8 gnt CDS sequences (see Full sensitivity search of metatranscriptomic data for the guanitoxin BGC methods; Table 3) using fasterq-dump (NCBI sra-tools v2.10.8) via a bioconda environment74. De novo transcriptome assembly was performed on the downloaded .fastq files using rnaSPAdes v3.14.175 via a bioconda/quay.io Singularity container76. The resulting de novo transcriptome assemblies were filtered to just the gnt BGC contigs using similarity searches with the gnt CDSs and GNT peptides (externally provided to the workflow) via tblastn/BLAST+77 via a Singularity container. Lastly, Prokka63 via a Singularity container was used to annotate the selected gnt BGCs. Externally provided GNT peptides were supplied to Prokka to propagate the standardized gnt BGC naming scheme.
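Selecting the datasets in which every target CDS received at least one mapped read is a simple set-containment check. The sketch below shows one way to do it, assuming the mapping results have already been reduced to (dataset, gene) pairs; the function name, the shortened gene list, and the example pairs are hypothetical and chosen only for brevity.

```python
from collections import defaultdict


def datasets_with_all_genes(mapped_pairs, required_genes):
    """Return sorted dataset ids that have a mapped read for every required gene.

    mapped_pairs is an iterable of (dataset_id, gene_name) tuples, one per
    mapped read or per observed dataset/gene combination.
    """
    genes_seen = defaultdict(set)
    for dataset_id, gene in mapped_pairs:
        genes_seen[dataset_id].add(gene)
    required = set(required_genes)
    return sorted(d for d, genes in genes_seen.items() if required <= genes)


if __name__ == "__main__":
    # Hypothetical mapping summary using a shortened gene list.
    genes = ["gntA", "gntB", "gntC"]
    pairs = [
        ("SRR5249014", "gntA"), ("SRR5249014", "gntB"), ("SRR5249014", "gntC"),
        ("SRR1601415", "gntA"), ("SRR1601415", "gntC"),
    ]
    print(datasets_with_all_genes(pairs, genes))
```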
[0363] Metagenome Assembled Genomes & taxonomic identification of GNT BGC source
[0364] For metagenomic datasets, the read adaptors were trimmed and the reads were quality filtered with Cutadapt78. The metagenome sequences were assembled with metaSPAdes79 and the resulting scaffolds were clustered into bins using automated binning by MetaBAT80. The Metagenome Assembled Genomes (MAGs) were filtered to those with sequence completeness above 70% and contamination below 10%, as measured by CheckM61. The quality of the assemblies was assessed with Quast 5.0.260. GNT gene clusters were screened with BLAST+77 using an in-house script.
[0365] Taxonomic classification and phylogenomic analyses were performed with GTDB-Tk81, using the Genome Taxonomy Database (GTDB)82. The pipeline generated the tree through the identification and alignment of 120 bacterial single-copy conserved marker genes, then inferred the phylogeny of the concatenated sequences using FastTree83 with the WAG+GAMMA models and maximum likelihood algorithm. Trees were drawn and annotated with iTOL v484 and Inkscape (https://inkscape.org/). Average nucleotide identities between the MAGs and their closest related genomes, extracted from the NCBI Genbank Database85, were calculated with OrthoANI v1.486.
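The MAG quality thresholds stated above (completeness above 70%, contamination below 10%) amount to a two-condition filter. A minimal sketch follows, assuming the completeness and contamination percentages have already been parsed into a dictionary; the bin names and values are hypothetical.

```python
def passes_quality(completeness, contamination, min_completeness=70.0, max_contamination=10.0):
    """True if a bin is strictly above the completeness and strictly below the contamination threshold."""
    return completeness > min_completeness and contamination < max_contamination


def filter_mags(bins):
    """Return the sorted names of bins that pass both thresholds.

    bins maps a bin name to a (completeness %, contamination %) tuple.
    """
    return sorted(name for name, (comp, cont) in bins.items() if passes_quality(comp, cont))


if __name__ == "__main__":
    # Hypothetical bins and statistics for illustration only.
    example_bins = {
        "bin.1": (95.2, 1.3),
        "bin.2": (64.0, 0.5),   # fails completeness
        "bin.3": (88.7, 12.4),  # fails contamination
    }
    print(filter_mags(example_bins))
```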
Example 5: Chemical Synthesis
[0366] Synthesis of primary amine intermediate 6
[0367] Acetic acid (0.1 mL) was added to a solution of (S)-Garner’s aldehyde (0.50 mL, 2.31 mmol) and dibenzylamine (0.47 mL, 2.43 mmol) in methanol (10 mL), and the mixture was stirred for 15 minutes at room temperature. To this solution, sodium cyanoborohydride (0.145 g, 2.31 mmol) was added over a few minutes and the mixture was stirred at room temperature under an argon atmosphere for 20 h. The reaction mixture was cooled to 0 °C and an aqueous solution of saturated sodium bicarbonate (25 mL) was added, followed by ethyl acetate (50 mL). The layers were separated and the aqueous layer was further extracted with ethyl acetate (2 x 50 mL). Pooled organic layers were washed with brine (50 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a gradient of 9:1 to 4:1 hexanes:ethyl acetate + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a clear light yellow oil (0.514 g, 54%). Rf 0.9 (2:1 hexane:ethyl acetate); IR (CH2Cl2 cast) 3062, 2977, 2935, 2877, 2802, 1695, 1454, 1386, 1253, 1170, 1087, 1027 cm-1; 1H-NMR (CD3OD; 500 MHz, mixture of 2 rotamers) δ 7.36 - 7.28 (m, 8H), 7.26 - 7.20 (m, 2H), 4.04 - 3.99 (m, 0.4H), 3.89 - 3.84 (m, 0.6H), 3.84 - 3.69 (m, 4H), 3.40 (d, J = 13.5 Hz, 1H), 3.32 (d, J = 11.2 Hz, 1H), 2.56 (d, J = 9.8 Hz, 1H), 2.49 (d, J = 12.0 Hz, 1H), 1.50 - 1.46 (m, 10H), 1.40 - 1.36 (m, 5H), 1.31 (s, 1H); 13C-NMR (CD3OD; 125 MHz, mixture of 2 rotamers) δ 153.4, 140.8, 130.2, 130.0, 129.3, 129.3, 128.2, 128.1, 94.8, 81.6, 81.2, 79.5, 67.4, 67.1, 60.3, 59.9, 57.4, 57.1, 56.6, 28.8, 28.7, 28.0, 27.1, 24.6, 23.2; HRMS (ESI) Calculated for C25H35N2O3 411.2642, found 411.2637 (M+H)+.
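The isolated yield quoted for this reductive amination can be checked from the stated amounts. The sketch below is a worked calculation under stated assumptions: the neutral product formula C25H34N2O3 is inferred here from the reported (M+H)+ formula C25H35N2O3, the limiting reagent is taken to be the 2.31 mmol of aldehyde, and standard average atomic masses are used.

```python
# Standard average atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Molar mass (g/mol) of a formula given as an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def percent_yield(isolated_g, limiting_mmol, product_formula):
    """Percent yield from isolated mass and the mmol of limiting reagent (1:1 stoichiometry)."""
    theoretical_g = limiting_mmol * 1e-3 * molar_mass(product_formula)
    return 100.0 * isolated_g / theoretical_g


if __name__ == "__main__":
    # Neutral product formula inferred from the reported (M+H)+ ion (assumption).
    product = {"C": 25, "H": 34, "N": 2, "O": 3}
    print(f"molar mass: {molar_mass(product):.2f} g/mol")
    print(f"yield: {percent_yield(0.514, 2.31, product):.1f}%")
```

Under these assumptions the calculation reproduces the reported 54% yield to the nearest percent.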
[0368] FIG. 26 shows the intermediates involved in synthesis of primary amine intermediate 6. A solution of SI-1 (FIG. 26) (0.328 g, 0.80 mmol) in 4 N aqueous HC1 (10 mL) and methanol (1 mL) was stirred at room temperature for 2 h. The reaction mixture was concentrated in vacuo, using water and toluene co-evaporations to remove additional HC1 and water respectively. The crude material was resuspended in toluene (15 mL) and had triethylamine (0.56 mL, 3.99 mmol) and A,A’-di-Boc-U/-pyrazole-l-carboxamidine (0.273 g, 0.88 mmol) sequentially added. The reaction mixture was heated to 55 °C and stirred for 12 h, then was cooled to room temperature, diluted with ethyl acetate (50 mL) and washed with water (2 x 25 mL). The organic layer was dried over magnesium sulfate, filtered, and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a gradient of 9: 1 to 4: 1 hexanes:ethyl acetate + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a sticky white foam (0.283 g, 69%). Rf 0.65 (2: 1 hexane:ethyl acetate); IR (CH2Q2 cast) 3416, 2974, 1725, 1638, 1416, 1357, 1147 cm'1; 'H-NMR (CD3OD; 500 MHz) 6 7.90 (s, 1H), 7.38 (d, J= 7.1 Hz, 4H), 7.27 (t, J= 7.5 Hz, 4H), 7.20 (t, J= 13 Hz, 2H), 5.49 (s, 1H), 4.14 (dq, J = 8.7, 4.3 Hz, 1H), 3.77 (d, J= 13.6 Hz, 2H), 3.58 (dd, J= 11.5, 3.6 Hz, 1H), 3.47 (dd, J= 11.5, 4.5 Hz, 1H), 3.42 (d, J= 13.5 Hz, 2H), 2.73 (dd, J= 13.0, 10.3 Hz, 1H), 2.51 (dd, J = 13.1, 4.8 Hz, 1H), 1.61 (s, 9H), 1.42 (s, 9H); °C-NMR (CD3OD; 125 MHz) 8 164.2, 157.6, 154.0, 140.5, 130.1, 129.3, 128.1, 84.5, 80.3, 79.5, 64.2, 59.8, 55.7, 52.5, 28.5, 28.3; HRMS (ESI) Calculated for C28H41N4O5 513.3071, found 513.3068 (M+H)+.
[0369] To a 0 °C solution of SI-2 (FIG. 26) (0.091 g, 0.18 mmol) in dry CH2Cl2 (10 mL) was sequentially added DIPEA (62 μL, 0.36 mmol) and methanesulfonyl chloride (15 μL, 0.20 mmol). The reaction mixture was slowly warmed to room temperature over 21 h, then quenched by the addition of saturated NH4Cl (20 mL) and diluted with CH2Cl2 (10 mL). The layers were separated and the organic layer was further washed with water (20 mL), and brine (20 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a stepwise gradient of 99:1 to 49:1 chloroform:methanol + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a white solid (0.083 g, 95%). Rf 0.35 (49:1 chloroform:methanol + 1 drop triethylamine); IR (CH2Cl2 cast) 3412, 2977, 2809, 1754, 1702, 1649, 1604, 1537, 1446, 1373, 1311, 1144 cm⁻¹; 1H-NMR (CD3OD; 500 MHz) δ 7.30 - 7.23 (m, 8H), 7.20 - 7.15 (m, 2H), 3.88 (dd, J = 9.8, 5.5 Hz, 1H), 3.59 - 3.48 (m, 5H), 3.26 - 3.21 (m, 1H), 2.52 (dd, J = 13.0, 5.7 Hz, 1H), 2.41 (dd, J = 12.7, 7.1 Hz, 1H), 1.45 (s, 9H), 1.41 (s, 9H); 13C-NMR (CD3OD; 125 MHz) δ 165.3, 152.1, 151.8, 140.6, 130.1, 130.0, 129.4, 128.2, 84.1, 81.2, 60.5, 60.2, 58.5, 53.2, 46.3, 28.7, 28.4, 28.3, 28.2; HRMS (ESI) Calculated for C28H39N4O4 495.2966, found 495.2958 (M+H)+.
[0370] A solution of SI-3 (FIG. 26) (0.045 g, 0.091 mmol) in 1 N aqueous HCl (10 mL) was stirred at room temperature for 16 h, then concentrated in vacuo, using water and methanol co-evaporations to remove residual solvents. The crude reaction mixture was resuspended in methanol (10 mL) and had 20% Pd(OH)2 (15 mg, wet) and 10% Pd/C (30 mg) sequentially added. The reaction mixture was bubbled with hydrogen gas using a balloon (1 atm) and monitored by LCMS (C18 RP-HPLC) for consumption of the di-benzylated starting material (consumed after 30 minutes) and the mono-benzylated intermediate (consumed after 22 h overnight incubation). The reaction mixture was filtered through a pad of Celite, rinsed with methanol (30 mL), then 0.1% aqueous acetic acid (30 mL) and concentrated in vacuo. 1 N aqueous HCl (5 mL) was added to the filtrate to obtain the HCl salt, then concentrated in vacuo and lyophilized. The desired product was obtained as a light yellow solid (0.017 g, 99%). 1H-NMR (D2O + 0.1% CH3OH; 500 MHz): δ 4.42 (dddd, J = 10.9, 5.7, 5.7, 5.7 Hz, 1H), 3.94 (t, J = 10.2 Hz, 1H), 3.54 (dd, J = 10.5, 6.0 Hz, 1H), 3.31 - 3.19 (m, 2H); 13C-NMR (D2O + 0.1% CH3OH; 500 MHz): δ 159.8, 52.3, 46.0, 42.1; HRMS (ESI) Calculated for C4H11N4 115.0978, found 115.0976 (M+H)+.
[0371] Synthesis of dimethylamine intermediate 7
[0372] A solution of (S)-Garner’s aldehyde (0.560 g, 2.44 mmol) and dimethylamine (1.28 mL, [2.0 M solution in THF], 2.56 mmol) in methanol (10 mL) had acetic acid (0.1 mL) added and was stirred for 5 minutes at room temperature. To this solution, sodium cyanoborohydride (0.169 g, 2.69 mmol) was added over a few minutes and stirred at room temperature under an argon atmosphere for 24 h. The reaction mixture was cooled to 0 °C and an aqueous solution of saturated sodium bicarbonate (25 mL) was added, followed by ethyl acetate (50 mL). The layers were separated and the aqueous layer was further extracted with ethyl acetate (2 x 50 mL). Pooled organic layers were washed with brine (50 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a gradient of 50:1 to 10:1 ethyl acetate:methanol + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a clear colorless oil (0.313 g, 50%). Rf 0.18 (10:1 ethyl acetate:methanol + 0.1% triethylamine); IR (CH2Cl2 cast) 3433, 2977, 2825, 2773, 1697, 1641, 1461, 1386, 1254, 1173, 1074 cm⁻¹; 1H-NMR (CD3OD; 500 MHz, mixture of 2 rotamers) δ 4.01 - 3.89 (m, 3H), 2.53 - 2.44 (m, 1H), 2.42 - 2.27 (m, 7H), 1.52 (s, 3H), 1.50 - 1.44 (m, 12H); 13C-NMR (CD3OD; 125 MHz, mixture of 2 rotamers) δ 153.2, 94.8, 94.6, 81.8, 81.3, 79.5, 67.7, 67.3, 62.8, 62.0, 56.8, 56.6, 46.3, 46.1, 28.8, 28.7, 28.1, 27.3, 24.5, 23.2; HRMS (ESI) Calculated for C13H27N2O3 259.2016, found 259.2014 (M+H)+.
[0373] FIG. 27 shows the intermediates involved in synthesis of dimethylamine intermediate 7. A solution of SI-4 (FIG. 27) (0.245 g, 0.95 mmol) in 4 N aqueous HCl (10 mL) and methanol (1 mL) was stirred at room temperature for 2.5 h. The reaction mixture was concentrated in vacuo, using water and toluene co-evaporations to remove additional HCl and water respectively. The crude material was resuspended in toluene (15 mL) and had triethylamine (0.66 mL, 4.74 mmol) and N,N’-di-Boc-1H-pyrazole-1-carboxamidine (0.324 g, 1.04 mmol) sequentially added. The reaction mixture was heated to 55 °C and stirred for 17 h, then was cooled to room temperature, diluted with ethyl acetate (50 mL) and washed with water (2 x 25 mL). The organic layer was dried over magnesium sulfate, filtered, and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a stepwise gradient of 50:1 to 33:1 to 20:1 to 9:1 ethyl acetate:methanol + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a light yellow oil (0.175 g, 51%). Rf 0.15 (9:1 ethyl acetate:methanol + 0.1% triethylamine); IR (CH2Cl2 cast) 3324, 3139, 2978, 1726, 1639, 1565, 1462, 1415, 1320, 1257, 1161, 1053 cm⁻¹; 1H-NMR (CD3OD; 500 MHz) δ 7.90 (s, 1H), 4.26 - 4.20 (m, 1H), 3.62 (d, J = 4.1 Hz, 2H), 2.58 (dd, J = 12.8, 9.0 Hz, 1H), 2.46 (dd, J = 12.8, 5.3 Hz, 1H), 2.31 (s, 6H), 1.53 (s, 9H), 1.46 (s, 9H); 13C-NMR (CD3OD; 125 MHz) δ 164.5, 157.7, 154.1, 84.4, 80.3, 63.5, 61.3, 51.6, 46.0, 28.5, 28.2; HRMS (ESI) Calculated for C16H33N4O5 361.2445, found 361.2440 (M+H)+.
[0374] To a 0 °C solution of SI-5 (FIG. 27) (0.122 g, 0.34 mmol) in dry CH2Cl2 (10 mL) was sequentially added DIPEA (118 μL, 0.68 mmol) and methanesulfonyl chloride (29 μL, 0.37 mmol). The reaction mixture was slowly warmed to room temperature over 18 h, then quenched by the addition of saturated NH4Cl (20 mL) and diluted with CH2Cl2 (10 mL). The layers were separated and the organic layer was further washed with water (20 mL), and brine (20 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and eluted over a gradient of 20:1 to 9:1 chloroform:methanol + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a white solid (0.047 g, 41%). Rf 0.4 (9:1 chloroform:methanol + 0.1% triethylamine); IR (CH2Cl2 cast) 3411, 2980, 1643, 1600, 1512, 1474, 1340, 1257, 1164, 1106 cm⁻¹; 1H-NMR (CD3OD; 500 MHz) δ 4.10 (dddd, J = 9.4, 6.7, 6.7, 6.7 Hz, 1H), 3.92 (dd, J = 10.7, 9.4 Hz, 1H), 3.55 (dd, J = 10.6, 6.4 Hz, 1H), 2.60 (dd, J = 12.4, 6.4 Hz, 1H), 2.49 (dd, J = 12.4, 7.1 Hz, 1H), 2.37 (s, 6H), 1.53 (s, 9H), 1.49 (s, 9H); 13C-NMR (CD3OD; 125 MHz) δ 154.7, 152.8, 152.5, 84.5, 81.8, 64.7, 50.0, 49.8, 45.9, 28.4, 28.3; HRMS (ESI) Calculated for C22H45N8O4 485.3558, found 485.3557 (2M-2Boc+H)+.
[0375] A solution of SI-6 (FIG. 27) (0.032 g, 0.093 mmol) in 1 N aqueous HCl (5 mL) was stirred at room temperature for 14 h, then concentrated in vacuo and lyophilized. The desired product was obtained as a light yellow solid (0.019 g, 95%). 1H-NMR (D2O + 0.1% CH3OH; 500 MHz): δ 4.53 (dddd, J = 9.6, 9.6, 5.3, 5.3 Hz, 1H), 3.96 (dd, J = 10.2, 10.2 Hz, 1H), 3.52 - 3.42 (m, 2H), 3.36 (dd, J = 13.5, 4.8 Hz, 1H), 2.95 (s, 6H); 13C-NMR (D2O + 0.1% CH3OH; 500 MHz): δ 159.9, 60.4, 50.3, 46.9, 44.5, 42.8; HRMS (ESI) Calculated for C6H15N4 143.1291, found 143.1291 (M+H)+.
[0376] Synthesis of γ-hydroxy-L-arginine diastereomers SI-7 and SI-8
[0377] FIG. 28 shows the synthesis of γ-hydroxy-L-arginine diastereomers SI-8. This procedure was adapted from a literature reference (88). Briefly, a solution of Boc-L-Asp-OtBu (2.008 g, 6.94 mmol) and 1,1’-carbonyldiimidazole (1.490 g, 6.95 mmol) in nitromethane (40 mL) was stirred at room temperature under Ar gas for 45 minutes. Potassium tert-butoxide (1.558 g, 13.88 mmol) was added at once and the resulting solution was stirred for an additional 4.5 hours at room temperature. The reaction mixture was quenched by the addition of 50% aqueous glacial acetic acid (50 mL), then extracted with ethyl acetate (3 x 60 mL). Pooled organic layers were washed with water (50 mL), saturated aqueous sodium bicarbonate (50 mL), water (50 mL), then brine (50 mL). The organic layer was dried over magnesium sulfate, filtered, and concentrated in vacuo. The crude material was resuspended in methanol (20 mL) then stirred and cooled to 0 °C. Sodium borohydride (0.262 g, 6.94 mmol) was added portion-wise to this solution over 2 minutes at 0 °C, stirred at this temperature for 45 minutes, then slowly warmed to room temperature over 16 hours. The reaction mixture was quenched by the addition of 1 N aqueous HCl until a pH of 3, concentrated in vacuo, then resuspended in water (25 mL) and extracted with EtOAc (3 x 25 mL). Pooled organic layers were washed with brine (25 mL), dried over magnesium sulfate, filtered, and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography and the two desired diastereomers were eluted over a slow gradient of 50:1 to 50:3 dichloromethane:diethyl ether. If additional silica flash chromatography was needed to further purify the two diastereomers, an isocratic 18:1:1 toluene:tetrahydrofuran:ethyl acetate eluant system was used. Pooled fractions were concentrated in vacuo, yielding both desired products as flaky white solids.
[0378] SI-7 (FIG. 29) (S)-diastereomer (0.628 g, 27%). Rf 0.65 (9:1 CH2Cl2:Et2O); IR (CH2Cl2 cast) 3412, 2980, 2932, 1695, 1558, 1506, 1369, 1253, 1155 cm⁻¹; 1H NMR (500 MHz, CDCl3) δ 5.42 (d, J = 7.5 Hz, 1H), 4.72 (d, J = 4.3 Hz, 1H), 4.50 (dd, J = 12.0, 8.3 Hz, 1H), 4.45 - 4.30 (m, 3H), 1.95 (ddd, J = 13.9, 11.0, 3.1 Hz, 1H), 1.57 - 1.50 (m, 1H), 1.47 (s, 9H), 1.46 (s, 9H); 13C NMR (126 MHz, CDCl3) δ 171.1, 157.3, 83.2, 81.4, 80.0, 65.2, 50.7, 38.7, 28.4, 28.1; HRMS (ESI) Calculated for C14H27N2O7 335.1813, found 335.1831 (M+H)+.
[0379] SI-8 (FIG. 28) (R)-diastereomer (0.579 g, 25%). Rf 0.5 (9:1 CH2Cl2:Et2O); IR (CH2Cl2 cast) 3419, 2979, 2918, 1695, 1557, 1384, 1369, 1252, 1155 cm⁻¹; 1H NMR (500 MHz, CDCl3) δ 5.42 (s, 1H), 4.54 (dt, J = 9.5, 4.8 Hz, 1H), 4.45 (d, J = 5.9 Hz, 2H), 4.27 (d, J = 6.3 Hz, 1H), 3.36 (s, 1H), 2.08 (d, J = 14.2 Hz, 1H), 1.90 (ddd, J = 14.6, 9.0, 6.0 Hz, 1H), 1.48 (s, 9H), 1.45 (s, 9H); 13C NMR (126 MHz, CDCl3) δ 171.0, 155.9, 83.1, 80.8, 80.3, 66.3, 51.5, 37.1, 28.4, 28.1; HRMS (ESI) Calculated for C14H27N2O7 335.1813, found 335.1828 (M+H)+.
[0380] Synthesis of (S)-γ-hydroxy-L-arginine 2
[0381] FIG. 29 shows the intermediates for synthesis of (S)-γ-hydroxy-L-arginine 2. This procedure was adapted from a literature reference (88). A solution of SI-7 (FIG. 29) (0.454 g, 1.36 mmol) in MeOH (15 mL) and acetic acid (0.078 mL, 1.36 mmol) had 10% Pd/C (0.145 g, 0.136 mmol) added to it and was sparged with Ar gas, then left stirring under a H2 atmosphere for 15 hours. The reaction mixture was filtered through a pad of Celite and concentrated in vacuo. The crude reaction mixture was resuspended in toluene (20 mL) and had N,N’-di-Boc-1H-pyrazole-1-carboxamidine (0.464 g, 1.49 mmol) and triethylamine (0.95 mL, 6.79 mmol) added sequentially. The reaction mixture was heated to 55 °C and stirred for 17 hours, then quenched by the addition of a saturated aqueous NH4Cl solution (25 mL). Organic components were extracted with EtOAc (3 x 25 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography using an eluent of 4:1 hexanes:ethyl acetate + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a clear light yellow oil (0.374 g, 50%). Rf 0.6 (2:1 hexane:ethyl acetate); IR (CH2Cl2 cast) 3337, 2980, 2932, 1728, 1645, 1621, 1368, 1155, 1055, 1027 cm⁻¹; 1H NMR (500 MHz, CDCl3) δ 11.44 (s, 1H), 8.70 (t, J = 5.3 Hz, 1H), 5.46 (d, J = 8.0 Hz, 1H), 4.99 - 4.77 (m, 1H), 4.46 - 4.26 (m, 1H), 3.84 - 3.73 (m, 1H), 3.73 - 3.64 (m, 1H), 3.28 (ddd, J = 13.0, 7.7, 4.1 Hz, 1H), 1.88 (ddd, J = 13.9, 10.7, 3.4 Hz, 1H), 1.59 - 1.53 (m, 1H), 1.49 (s, 9H), 1.48 (s, 9H), 1.46 (s, 9H), 1.44 (s, 9H); 13C NMR (126 MHz, CDCl3) δ 171.8, 163.4, 156.9, 156.8, 153.1, 83.3, 82.5, 80.5, 79.5, 77.4, 67.1, 51.3, 46.5, 39.0, 28.4, 28.4, 28.2, 28.1; HRMS (ESI) Calculated for C25H47N4O9 547.3338, found 547.3329 (M+H)+.
[0382] A solution of SI-9 (FIG. 29) (0.037 g, 0.068 mmol) in 1 N aqueous HCl (5 mL) was stirred for 3 hours then concentrated in vacuo, using additional water washes to remove excess HCl. The crude reaction mixture was resuspended in water, and an aqueous solution of sodium hydroxide (0.014 g, 0.34 mmol) was added and stirred for 2 hours. The reaction mixture was neutralized by the addition of 1 N HCl until pH 7.0, then lyophilized. The desired product was obtained as a white solid alongside sodium chloride and was quantified using 1H NMR with a MeOH internal standard (0.012 g, 93%). 1H NMR (500 MHz, D2O + 0.1% MeOH) δ 4.19 (dd, J = 5.9, 5.9 Hz, 1H), 3.99 (dddd, J = 8.2, 8.2, 4.1, 4.0 Hz, 1H), 3.34 (dd, J = 14.7, 3.8 Hz, 1H), 3.23 (dd, J = 14.6, 7.3 Hz, 1H), 2.15 - 2.04 (m, 2H); 13C-NMR (126 MHz, D2O + 0.1% MeOH) δ 172.2, 157.4, 66.6, 51.2, 46.9, 33.0; HRMS (ESI) Calculated for C6H15N4O3 191.1139, found 191.1136 (M+H)+.
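Because the product is obtained together with sodium chloride, its amount is determined by 1H NMR against the methanol internal standard rather than by weighing. The text does not give the integrals used, so the following is only a sketch of the general internal-standard relation (moles scale with integral per contributing proton). The function name and all numerical inputs are hypothetical and are not data from this document.

```python
def qnmr_mmol(analyte_integral, analyte_protons,
              standard_integral, standard_protons, standard_mmol):
    """Amount of analyte (mmol) from 1H NMR integrals against an internal standard.

    Each integral is divided by the number of protons that give rise to the
    signal, so the ratio of per-proton integrals equals the mole ratio.
    """
    analyte_per_proton = analyte_integral / analyte_protons
    standard_per_proton = standard_integral / standard_protons
    return standard_mmol * analyte_per_proton / standard_per_proton


# Hypothetical example: a one-proton analyte signal integrating to 1.00 and the
# methanol CH3 singlet (3 protons) integrating to 1.50, with 0.025 mmol of
# methanol in the NMR sample.
amount = qnmr_mmol(1.00, 1, 1.50, 3, 0.025)
print(f"{amount:.3f} mmol")
```

Dividing the amount found in this way by the amount of starting material gives the yield reported alongside the mass.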
[0383] Synthesis of L-enduracididine (3)
[0384] FIG. 30 shows the intermediates involved in the synthesis of L-enduracididine (3). To a 0 °C solution of SI-9 (FIG. 29) (0.246 g, 0.45 mmol) in dry CH2Cl2 (10 mL) was sequentially added DIPEA (0.235 mL, 1.35 mmol), then methanesulfonyl chloride (0.038 mL, 0.50 mmol). The resulting solution was stirred at 0 °C then slowly warmed to room temperature over 17 hours. The reaction mixture was diluted with additional CH2Cl2 (10 mL), quenched by the addition of saturated aqueous NH4Cl (20 mL), then the layers were separated. Organic materials were extracted using subsequent washes with CH2Cl2 (2 x 25 mL), then pooled organic layers were washed with brine (25 mL), dried with magnesium sulfate, filtered, and concentrated in vacuo. The crude reaction mixture was purified using silica flash chromatography over a gradient of 2:1 to 1:1 to 1:2 hexanes:EtOAc + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a sticky clear colorless oil (0.083 g, 35%). Rf 0.2 (1:1 hexane:ethyl acetate + 1 drop triethylamine); IR (CH2Cl2 cast) 3349, 2979, 2933, 1756, 1713, 1653, 1606, 1532, 1384, 1370, 1142 cm⁻¹; 1H NMR (500 MHz, CD3OD) δ 4.29 (t, J = 9.4 Hz, 1H), 4.05 (d, J = 8.5 Hz, 1H), 3.81 (dd, J = 12.6, 9.1 Hz, 1H), 3.54 (dd, J = 12.7, 3.1 Hz, 1H), 2.12 (ddd, J = 11.3, 9.4, 3.6 Hz, 1H), 1.95 (dd, J = 12.3, 12.0 Hz, 1H), 1.56 (s, 9H), 1.49 (s, 9H), 1.47 (s, 9H), 1.44 (s, 9H); 13C-NMR (126 MHz, CD3OD) δ 172.6, 157.8, 152.3, 152.2, 152.2, 84.9, 83.1, 81.6, 80.6, 55.6, 52.3, 49.9, 37.0, 28.8, 28.4, 28.4, 28.2; HRMS (ESI) Calculated for C25H45N4O8 529.3232, found 529.3261 (M+H)+.
[0385] A solution of SI-10 (FIG. 30) (0.067 g, 0.13 mmol) in 1 N HCl (10 mL) was stirred for 3 hours and then concentrated in vacuo, using additional water washes to remove excess HCl, then lyophilized. The desired product was obtained as a white solid (0.032 g, quantitative). 1H NMR (500 MHz, D2O + 0.1% MeOH) δ 4.34 (dddd, J = 6.8, 6.8, 6.1, 6.1 Hz, 1H), 4.08 (dd, J = 92, 5.6 Hz, 1H), 3.88 (dd, J = 9.7, 9.7 Hz, 1H), 3.42 (dd, J = 10.1, 6.0 Hz, 1H), 2.26 (ddd, J = 14.8, 8.8, 5.8 Hz, 1H), 2.18 (ddd, J = 13.9, 6.7, 6.7 Hz, 1H); 13C-NMR (126 MHz, D2O + 0.1% MeOH) δ 171.4, 159.6, 52.0, 50.0, 48.3, 35.5; HRMS (ESI) Calculated for C6H13N4O2 173.1033, found 173.1030 (M+H)+.
[0386] Synthesis of (R)-γ-hydroxy-L-arginine (SI-12)
[0387] This procedure was adapted from a literature reference (88). A solution of SI-8 (0.200 g, 0.60 mmol) in MeOH (10 mL) and acetic acid (0.034 mL, 0.60 mmol) had 10% Pd/C (0.064 g, 0.060 mmol) added to it and was sparged with Ar gas, then left stirring under a H2 atmosphere for 15 hours. The reaction mixture was filtered through a pad of Celite and concentrated in vacuo. The crude reaction mixture was resuspended in toluene (20 mL) and had N,N’-di-Boc-1H-pyrazole-1-carboxamidine (0.204 g, 0.66 mmol) and triethylamine (0.42 mL, 2.99 mmol) added sequentially. The reaction mixture was heated to 55 °C and stirred for 17 hours, then quenched by the addition of a saturated aqueous NH4Cl solution (25 mL). Organic components were extracted with EtOAc (3 x 25 mL), dried over magnesium sulfate, filtered and concentrated in vacuo. The crude reaction mixture was purified by silica flash chromatography using an eluent of 4:1 hexanes:ethyl acetate + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a clear light yellow oil (0.142 g, 43%). Rf 0.6 (2:1 hexane:ethyl acetate); IR (CH2Cl2 cast) 3339, 2980, 2920, 1726, 1645, 1621, 1368, 1157, 1054, 1027 cm⁻¹; 1H NMR (500 MHz, CDCl3) δ 11.42 (s, 1H), 8.69 (s, 1H), 5.48 - 5.38 (m, 1H), 5.22 (s, 1H), 4.21 (d, J = 7.9 Hz, 1H), 3.92 (d, J = 9.0 Hz, 1H), 3.59 (dd, J = 15.5, 5.9 Hz, 1H), 3.43 (dd, J = 13.5, 6.7 Hz, 1H), 1.99 - 1.92 (m, 1H), 1.86 (ddd, J = 14.4, 8.7, 6.4 Hz, 1H), 1.49 (s, 9H), 1.46 (s, 18H), 1.44 (s, 9H); 13C NMR (126 MHz, CDCl3) δ 171.7, 162.9, 157.7, 155.8, 153.1, 83.7, 82.2, 80.0, 79.7, 69.2, 52.1, 47.5, 37.6, 28.5, 28.3, 28.2, 28.1; HRMS (ESI) Calculated for C25H47N4O9 547.3338, found 547.3328 (M+H)+.
[0388] FIG. 31 shows the intermediates involved in the synthesis of (R)-γ-hydroxy-L-arginine (SI-12). A solution of SI-11 (FIG. 31) (0.034 g, 0.062 mmol) in 1 N aqueous HCl (5 mL) was stirred for 3 hours then concentrated in vacuo, using additional water washes to remove excess HCl. The crude reaction mixture was resuspended in water, and an aqueous solution of sodium hydroxide (0.012 g, 0.31 mmol) was added and stirred for 2 hours. The reaction mixture was neutralized by the addition of 1 N HCl until pH 7.0, then lyophilized. The desired product was obtained as a white solid alongside sodium chloride and was quantified using 1H NMR with a MeOH internal standard (0.010 g, 88%). 1H NMR (500 MHz, D2O + 0.1% MeOH) δ 4.08 (ddd, J = 7.4, 7.2, 3.6 Hz, 1H), 3.83 (dd, J = 8.1, 5.3 Hz, 1H), 3.33 (dd, J = 14.9, 3.1 Hz, 1H), 3.21 (dd, J = 14.6, 7.4 Hz, 1H), 2.10 (ddd, J = 14.9, 5.3, 2.9 Hz, 1H), 1.83 (dt, J = 14.8, 9.3 Hz, 1H); 13C-NMR (126 MHz, D2O + 0.1% MeOH) δ 174.3, 157.3, 68.4, 53.5, 47.0, 34.1; HRMS (ESI) Calculated for C6H15N4O3 191.1139, found 191.1135 (M+H)+.
[0389] Synthesis of L-allo-enduracididine (SI-14)
[0390] To a 0 °C solution of SI-11 (FIG. 31) (0.124 g, 0.23 mmol) in dry CH2Cl2 (10 mL) was sequentially added DIPEA (0.12 mL, 0.68 mmol), then methanesulfonyl chloride (0.019 mL, 0.25 mmol). The resulting solution was stirred at 0 °C for 2 hours, then slowly warmed to room temperature over 17 hours. The reaction mixture was diluted with additional CH2Cl2 (10 mL), quenched by the addition of saturated aqueous NH4Cl (20 mL), then the layers were separated. Organic materials were extracted using subsequent washes with CH2Cl2 (2 x 25 mL), then pooled organic layers were washed with brine (25 mL), dried with magnesium sulfate, filtered, and concentrated in vacuo. The crude reaction mixture was purified using silica flash chromatography over a gradient of 4:1 to 1:1 hexanes:EtOAc + 0.1% triethylamine. Pooled fractions were concentrated in vacuo, yielding the desired product as a sticky clear colorless oil (0.094 g, 78%). Rf 0.2 (1:1 hexane:ethyl acetate + 1 drop triethylamine); IR (CH2Cl2 cast) 3366, 2979, 2929, 1747, 1714, 1606, 1539, 1384, 1369, 1311, 1252, 1149 cm⁻¹; 1H NMR (500 MHz, CD3OD) δ 4.30 (dddd, J = 12.2, 8.3, 3.3, 3.1 Hz, 1H), 4.09 (dd, J = 8.9, 5.4 Hz, 1H), 3.84 (dd, J = 12.8, 9.2 Hz, 1H), 3.55 (dd, J = 12.8, 3.5 Hz, 1H), 2.23 (ddd, J = 13.9, 5.4, 3.0 Hz, 1H), 1.90 (ddd, J = 13.9, 8.8, 8.8 Hz, 1H), 1.56 (s, 9H), 1.49 (s, 9H), 1.48 (s, 9H), 1.45 (s, 9H); 13C-NMR (126 MHz, CD3OD) δ 172.7, 157.8, 152.4, 152.4, 152.4, 85.0, 83.1, 81.8, 80.6, 56.4, 53.3, 37.2, 28.8, 28.4, 28.4, 28.3; HRMS (ESI) Calculated for C25H45N4O8 529.3232, found 529.3266 (M+H)+.
[0391] FIG. 32 shows the intermediates involved in synthesis of L-allo-enduracididine (SI-14). A solution of SI-13 (FIG. 32) (0.069 g, 0.13 mmol) in 1 N HCl (10 mL) was stirred for 3 hours and then concentrated in vacuo, using additional water washes to remove excess HCl, then lyophilized. The desired product was obtained as a white solid (0.032 g, quantitative). 1H NMR (500 MHz, D2O + 0.1% MeOH) δ 4.31 (ddd, J = 13.4, 6.6 Hz, 1H), 4.09 (dd, J = 7.0 Hz, 1H), 3.89 (dd, J = 9.7 Hz, 1H), 3.44 (dd, J = 10.0, 6.4 Hz, 1H), 2.35 (ddd, J = 14.5, 7.2 Hz, 1H), 2.14 (ddd, J = 14.0, 6.5 Hz, 1H); 13C-NMR (126 MHz, D2O + 0.1% MeOH) δ 171.5, 159.5, 52.2, 50.2, 48.0, 35.2; HRMS (ESI) Calculated for C6H13N4O2 173.1033, found 173.1038 (M+H)+.
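The "Calculated" HRMS values in the syntheses above can be reproduced from the ion formulas given. The sketch below is an illustration, not part of the original disclosure. It assumes that each listed formula is that of the singly charged cation (already including the added proton), so the m/z is the sum of monoisotopic atomic masses minus one electron mass. The function names and the rounded mass constants are assumptions.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON_MASS = 0.00054858


def cation_mz(ion_formula):
    """m/z of a singly charged cation given its full ion formula, e.g. 'C25H35N2O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", ion_formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total - ELECTRON_MASS


def ppm_error(found, calculated):
    """Signed mass error in parts per million."""
    return (found - calculated) / calculated * 1e6


calculated = cation_mz("C25H35N2O3")
print(f"calculated m/z = {calculated:.4f}")
print(f"error vs. found 411.2637 = {ppm_error(411.2637, calculated):.1f} ppm")
```

Under these assumptions the calculated value for C25H35N2O3 rounds to 411.2642, matching the reported figure, and the same function reproduces the calculated values listed for the other intermediates, such as 191.1139 for C6H15N4O3.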
Example 5: Primer Design
[0392] Primers targeting conserved regions of guanitoxin biosynthetic genes were designed as follows:
[0393] Allelic coding sequences (CDSs) for guanitoxin biosynthetic genes (3-5 alleles per gene) were harvested from assemblies of the Gnt biosynthetic gene cluster (BGC) from public environmental freshwater metagenomic and metatranscriptomic sequencing datasets. These datasets were selected because they had initial hits for the Gnt BGC via the SearchSRA tool (https://www.searchsra.org/). The datasets included (alongside the original Gnt CDS sequences from ITEP-024) are shown in Table 10.
[0394] These assemblies were created from the metagenomic and metatranscriptomic datasets with SPAdes or rnaSPAdes, respectively (https://github.com/ablab/spades). Genes were annotated within these assemblies with Prokka (https://github.com/tseemann/prokka), and the Gnt genes were identified within these annotations using blastp or tblastn via SequenceServer (https://sequenceserver.com). The multiple CDSs per gene were input into Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) under the “Primers common for a group of sequences” mode, with default parameters except for the primer melting temperatures (Tm), which were set to Min = 55 °C, Max = 65 °C, and Max Tm difference = 5 °C. If Primer-BLAST was unable to find a suitable primer pair, the allele with the least sequence identity to the canonical Gnt gene from Sphaerospermopsis torques-reginae ITEP-024 was removed, and Primer-BLAST was rerun successfully.
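The melting-temperature constraints and the allele-removal fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the Primer-BLAST implementation. The callable `find_primers` is a hypothetical stand-in for a Primer-BLAST run on a group of sequences, the allele names, identities, and primer strings are placeholders, and repeating the removal step more than once is a generalization of the single removal described in the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Tm window and maximum pair difference in degrees C, as stated in the text.
TM_MIN, TM_MAX, TM_MAX_DIFF = 55.0, 65.0, 5.0

PrimerPair = Tuple[str, str]
Finder = Callable[[List[str]], Optional[PrimerPair]]


def tm_pair_ok(tm_forward: float, tm_reverse: float) -> bool:
    """True if both Tm values lie in the window and differ by at most 5 degrees C."""
    in_window = all(TM_MIN <= tm <= TM_MAX for tm in (tm_forward, tm_reverse))
    return in_window and abs(tm_forward - tm_reverse) <= TM_MAX_DIFF


def design_with_allele_removal(identity_to_reference: Dict[str, float],
                               find_primers: Finder):
    """Try all alleles first; on failure drop the least identical allele and retry."""
    # Order alleles from most to least identical to the canonical ITEP-024 gene.
    alleles = sorted(identity_to_reference, key=identity_to_reference.get, reverse=True)
    while alleles:
        pair = find_primers(alleles)
        if pair is not None:
            return pair, alleles
        alleles = alleles[:-1]  # remove the allele with the lowest identity
    return None, []


# Placeholder inputs: percent identity of each allele to the canonical gene.
identity = {"ITEP-024": 100.0, "allele_A": 97.5, "allele_B": 88.0}


def stub_finder(alleles: List[str]) -> Optional[PrimerPair]:
    """Hypothetical finder that only succeeds once allele_B has been removed."""
    return None if "allele_B" in alleles else ("FORWARD_PRIMER", "REVERSE_PRIMER")


pair, used = design_with_allele_removal(identity, stub_finder)
print(pair, used)
```

In practice the finder would wrap a Primer-BLAST submission, and the identities would come from alignments of each allele against the ITEP-024 reference gene.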
TABLES
[0395] Table 1. 1H NMR table for compounds 1 - 4 and 6 - 8
Figure imgf000115_0002
* from (45); spectra run in D2O + 1% acetic acid-d4; all other spectra were collected in D2O + 0.1% CH3OH and referenced to methanol (δ 3.34)
[0396] Table 2. 13C NMR table for compounds 1 - 4 and 6 - 8
Figure imgf000115_0003
* from (45); spectra run in D2O + 1% acetic acid-d4; all other spectra were collected in D2O + 0.1% CH3OH and referenced to methanol (δ 49.0)
[0397] Table 3. antiSMASH annotation of the Sphaerospermopsis torques-reginae ITEP-024 genome. In total, 11 candidate BGC clusters were detected by antiSMASH. Notably, one location overlapped with the previously characterized anabaenopeptin and spumigin BGCs (45). T1-PKS = type-1 PKS, hglE-KS = heterocyte glycolipid synthase-like PKS.
Figure imgf000116_0001
[0398] Table 4. Sequence Read Archive (SRA) metagenomic and metatranscriptomic datasets with reads that match the gnt BGC. AS = alignment score; geo_loc_name, env_biome, env_feature, lat_lon, and collection_date represent data shown in the NCBI SRA metadata fields of identical name.
Figure imgf000117_0001
Figure imgf000118_0001
Figure imgf000119_0001
Figure imgf000120_0001
Figure imgf000121_0001
Figure imgf000122_0001
Figure imgf000123_0001
Figure imgf000124_0001
[0399] Table 5. Primer sequences
Figure imgf000124_0002
[0400] Table 6. Primer Sequences
Figure imgf000124_0003
Figure imgf000125_0001
Figure imgf000126_0001
[0401] Table 7. Nucleic acid sequences of guanitoxin biosynthetic genes
Figure imgf000126_0002
Figure imgf000127_0001
Figure imgf000128_0001
Figure imgf000129_0001
Figure imgf000130_0001
Figure imgf000131_0001
Figure imgf000132_0001
Figure imgf000133_0001
Figure imgf000134_0001
Figure imgf000135_0001
Figure imgf000136_0001
Figure imgf000137_0001
Figure imgf000138_0001
Figure imgf000139_0001
Figure imgf000140_0001
Figure imgf000141_0001
Figure imgf000142_0001
Figure imgf000143_0001
Figure imgf000144_0001
Figure imgf000145_0001
[0402] Table 8. Protein sequences of enzymes in the guanitoxin biosynthetic pathway
Figure imgf000145_0002
Figure imgf000146_0001
Figure imgf000147_0001
[0403] Table 9. Proposed functions and similarity for proteins in the Gnt biosynthetic gene cluster (BGC). Gnt protein sequences were compared by Basic Local Alignment Search Tool (BLAST) against publicly available data.
Figure imgf000148_0003
[0404] Table 10. Metagenomic and metatranscriptomic sequencing datasets for Gnt biosynthetic gene cluster (BGC)
Description
Figure imgf000148_0001
Dataset identifier
Figure imgf000148_0002
Figure imgf000149_0001
REFERENCES
[0405] (1) Ho, J. C.; Michalak, A. M.; Pahlevan, N. Widespread global increase in intense lake phytoplankton blooms since the 1980s. Nature 2019, 574 (7780), 667-670.
[0406] (2) Huisman, J.; Codd, G. A.; Paerl, H. W.; Ibelings, B. W.; Verspagen, J. M. H.; Visser, P. M. Cyanobacterial blooms. Nat. Rev. Microbiol. 2018, 16 (8), 471-483.
[0407] (3) Plaas, H. E.; Paerl, H. W. Toxic cyanobacteria: A growing threat to water and air quality. Environ. Sci. Technol. 2021, 55 (1), 44-64.
[0408] (4) Jochimsen, E. M.; Cookson, S. T. Liver failure and death after exposure to microcystins at a hemodialysis center in Brazil. N. Engl. J. Med. 1998, 338, 873-878.
[0409] (5) Mishra, D. R.; Kumar, A.; Ramaswamy, L.; Boddula, V. K.; Das, M. C.; Page, B. P.; Weber, S. J. CyanoTRACKER: A cloud-based integrated multi-platform architecture for global observation of cyanobacterial harmful algal blooms. Harmful Algae 2020, 96, 101828.
[0410] (6) Rousso, B. Z.; Bertone, E.; Stewart, R.; Hamilton, D. P. A systematic literature review of forecasting and predictive models for cyanobacteria blooms in freshwater lakes. Water Res. 2020, 182, 115959.
[0411] (7) Seegers, B. N.; Werdell, P. J.; Vandermeulen, R. A.; Salls, W.; Stumpf, R. P.; Schaeffer, B. A.; Owens, T. J.; Bailey, S. W.; Scott, J. P.; Loftin, K. A. Satellites for long-term monitoring of inland U.S. lakes: The MERIS time series and application for chlorophyll-a. Remote Sens. Environ. 2021, 266, 112685.
[0412] (8) Baker, L.; Sendall, B. C.; Gasser, R. B.; Menjivar, T.; Neilan, B. A.; Jex, A. R. Rapid, multiplex-tandem PCR assay for automated detection and differentiation of toxigenic cyanobacterial blooms. Mol. Cell. Probes 2013, 27 (5-6), 208-214.
[0413] (9) Tillett, D.; Dittmann, E.; Erhard, M.; von Dohren, H.; Borner, T.; Neilan, B. A. Structural organization of microcystin biosynthesis in Microcystis aeruginosa PCC7806: An integrated peptide polyketide synthetase system. Chem. Biol. 2000, 7 (10), 753-764.
[0414] (10) Mihali, T. K.; Kellmann, R.; Muenchhoff, J.; Barrow, K. D.; Neilan, B. A. Characterization of the gene cluster responsible for cylindrospermopsin biosynthesis. Appl. Environ. Microbiol. 2008, 74 (3), 716-722.
[0415] (11) Kellmann, R.; Mihali, T. K.; Jeon, Y. J.; Pickford, R.; Pomati, F.; Neilan, B. A. Biosynthetic intermediate analysis and functional homology reveal a saxitoxin gene cluster in cyanobacteria. Appl. Environ. Microbiol. 2008, 74 (13), 4044-4053.
[0416] (12) Mejean, A.; Mann, S.; Maldiney, T.; Vassiliadis, G.; Lequin, O.; Ploux, O. Evidence that biosynthesis of the neurotoxic alkaloids anatoxin-a and homoanatoxin-a in the cyanobacterium Oscillatoria PCC 6506 occurs on a modular polyketide synthase initiated by L-proline. J. Am. Chem. Soc. 2009, 131 (22), 7512-7513.
[0417] (13) Fiore, M. F.; de Lima, S. T.; Carmichael, W. W.; McKinnie, S. M. K.; Chekan, J. R.; Moore, B. S. Guanitoxin, re-naming a cyanobacterial organophosphate toxin. Harmful Algae 2020, 92, 101737.
[0418] (14) Hyde, E. G.; Carmichael, W. W. Anatoxin-a(s), a naturally occurring organophosphate, is an irreversible active site-directed inhibitor of acetylcholinesterase (EC 3.1.1.7). J. Biochem. Toxicol. 1991, 6 (3), 195-201.
[0419] (15) Carmichael, W. W. Health effects of toxin-producing cyanobacteria: "The CyanoHABs." Hum. Ecol. Risk Assess. Int. J. 2001, 7 (5), 1393-1407.
[0420] (16) Henriksen, P.; Carmichael, W. W.; An, J.; Moestrup, Ø. Detection of an anatoxin-a(s)-like anticholinesterase in natural blooms and cultures of cyanobacteria/blue-green algae from Danish lakes and in the stomach contents of poisoned birds. Toxicon 1997, 35 (6), 901-913.
[0421] (17) Chatziefthimiou, A. D.; Richer, R.; Rowles, H.; Powell, J. T.; Metcalf, J. S. Cyanotoxins as a potential cause of dog poisonings in desert environments. Vet. Rec. 2014, 174 (19), 484-485.
[0422] (18) Blin, K.; Shaw, S.; Kloosterman, A. M.; Charlop-Powers, Z.; van Wezel, G. P.; Medema, M. H.; Weber, T. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021, 49 (W1), W29-W35.
[0423] (19) Krzywinski, M.; Schein, J.; Birol, I.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S. J.; Marra, M. A. Circos: An information aesthetic for comparative genomics. Genome Res. 2009, 19 (9), 1639-1645.
[0424] (20) Gilchrist, C. L. M.; Chooi, Y.-H. Clinker & clustermap.js: Automatic generation of gene cluster comparison figures. Bioinformatics 2021, 37 (16), 2473-2475.
[0425] (21) Carmichael, W. W.; Gorham, P. R. Anatoxins from clones of Anabaena flos-aquae isolated from lakes of western Canada. Int. Ver. Fur Theor. Angew. Limnol. 1978, 21 (1), 285-295.
[0426] (22) Matsunaga, S.; Moore, R. E.; Niemczura, W. P.; Carmichael, W. W. Anatoxin-a(s), a potent anticholinesterase from Anabaena flos-aquae. J. Am. Chem. Soc. 1989, 111 (20), 8021-8023.
[0427] (23) Fernandes, K.; Dorr, F. A.; Pinto, E. Stability analyses by HPLC-MS of guanitoxin isolated from Sphaerospermopsis torques-reginae. J. Braz. Chem. Soc. 2021, 32, 1559-1567.
[0428] (24) Moore, B. S.; Ohtani, I.; de Koning, C. B.; Moore, R. E.; Carmichael, W. W. Biosynthesis of anatoxin-a(s). Origin of the carbons. Tetrahedron Lett. 1992, 33 (44), 6595-6598.
[0429] (25) Hemscheidt, T.; Burgoyne, D. L.; Moore, R. E. Biosynthesis of anatoxin-a(s). (2S,4S)-4-Hydroxyarginine as an intermediate. J. Chem. Soc. Chem. Commun. 1995, 2, 205-206.
[0430] (26) Lima, S. T.; Alvarenga, D. O.; Etchegaray, A.; Fewer, D. P.; Jokela, J.; Varani, A. M.; Sanz, M.; Dorr, F. A.; Pinto, E.; Sivonen, K.; Fiore, M. F. Genetic organization of anabaenopeptin and spumigin biosynthetic gene clusters in the cyanobacterium Sphaerospermopsis torques-reginae ITEP-024. ACS Chem. Biol. 2017, 12 (3), 769-778.
[0431] (27) Molica, R. J. R.; Oliveira, E. J. A.; Carvalho, P. V. V. C.; Costa, A. N. S. F.; Cunha, M. C. C.; Melo, G. L.; Azevedo, S. M. F. O. Occurrence of saxitoxins and an anatoxin-a(s)-like anticholinesterase in a Brazilian drinking water supply. Harmful Algae 2005, 4 (4), 743-753.
[0432] (28) He, H.; Williamson, R. T.; Shen, B.; Graziani, E. I.; Yang, H. Y.; Sakya, S. M.; Petersen, P. J.; Carter, G. T. Mannopeptimycins, novel antibacterial glycopeptides from Streptomyces hygroscopicus, LL-AC98. J. Am. Chem. Soc. 2002, 124 (33), 9729-9736.
[0433] (29) Ling, L. L.; Schneider, T.; Peoples, A. J.; Spoering, A. L.; Engels, I.; Conlon, B. P.; Mueller, A.; Schaberle, T. F.; Hughes, D. E.; Epstein, S.; Jones, M.; Lazarides, L.; Steadman, V. A.; Cohen, D. R.; Felix, C. R.; Fetterman, K. A.; Millett, W. P.; Nitti, A. G.; Zullo, A. M.; Chen, C.; Lewis, K. A new antibiotic kills pathogens without detectable resistance. Nature 2015, 517 (7535), 455-459.
[0434] (30) Asai, M.; Muroi, M.; Sugita, N.; Kawashima, H.; Mizuno, K.; Miyake, A. Enduracidin, a new antibiotic. II Isolation and characterization. J. Antibiot. (Tokyo) 1968, 21 (2), 138-146.
[0435] (31) Han, L.; Schwabacher, A. W.; Moran, G. R.; Silvaggi, N. R. Streptomyces wadayamensis MppP is a pyridoxal 5'-phosphate-dependent L-arginine α-deaminase, γ-hydroxylase in the enduracididine biosynthetic pathway. Biochemistry 2015, 54 (47), 7029-7040.
[0436] (32) Han, L.; Vuksanovic, N.; Oehm, S. A.; Fenske, T. G.; Schwabacher, A. W.; Silvaggi, N. R. Streptomyces wadayamensis MppP is a PLP-dependent oxidase, not an oxygenase. Biochemistry 2018, 57 (23), 3252-3264.
[0437] (33) Burroughs, A. M.; Hoppe, R. W.; Goebel, N. C.; Sayyed, B. H.; Voegtline, T. J.; Schwabacher, A. W.; Zabriskie, T. M.; Silvaggi, N. R. Structural and functional characterization of MppR, an enduracididine biosynthetic enzyme from Streptomyces hygroscopicus: Functional diversity in the acetoacetate decarboxylase-like superfamily. Biochemistry 2013, 52 (26), 4492-4506.
[0438] (34) Nayfach, S.; Roux, S.; Seshadri, R.; Udwary, D.; Varghese, N.; Schulz, F.; Wu, D.; Paez-Espino, D.; Chen, I.-M.; Huntemann, M.; Palaniappan, K.; Ladau, J.; Mukherjee, S.; Reddy, T. B. K.; Nielsen, T.; Kirton, E.; Faria, J. P.; Edirisinghe, J. N.; Henry, C. S.; Jungbluth, S. P.; Chivian, D.; Dehal, P.; Wood-Charlson, E. M.; Arkin, A. P.; Tringe, S. G.; Visel, A.; IMG/M Data Consortium; Woyke, T.; Mouncey, N. J.; Ivanova, N. N.; Kyrpides, N. C.; Eloe-Fadrosh, E. A. A genomic catalog of Earth's microbiomes. Nat. Biotechnol. 2021, 39 (4), 499-509.
[0439] (35) Du, Y.-L.; Ryan, K. S. Pyridoxal phosphate-dependent reactions in the biosynthesis of natural products. Nat. Prod. Rep. 2019, 36 (3), 430-457.
[0440] (36) Rudolph, J.; Hannig, F.; Theis, H.; Wischnat, R. Highly efficient chiral-pool synthesis of (2S,4R)-4-hydroxyornithine. Org. Lett. 2001, 3 (20), 3153-3155.
[0441] (37) Giltrap, A. M.; Dowman, L. J.; Nagalingam, G.; Ochoa, J. L.; Linington, R. G.; Britton, W. J.; Payne, R. J. Total synthesis of teixobactin. Org. Lett. 2016, 18 (11), 2788-2791.
[0442] (38) Yin, X.; Zabriskie, T. M. VioC is a non-heme iron, α-ketoglutarate-dependent oxygenase that catalyzes the formation of 3S-hydroxy-L-arginine during viomycin biosynthesis. ChemBioChem 2004, 5 (9), 1274-1277.
[0443] (39) Ju, J.; Ozanick, S. G.; Shen, B.; Thomas, M. G. Conversion of (2S)-arginine to (2S,3R)-capreomycidine by VioC and VioD from the viomycin biosynthetic pathway of Streptomyces sp. strain ATCC11861. ChemBioChem 2004, 5 (9), 1281-1285.
[0444] (40) Yin, X.; McPhail, K. L.; Kim, K.; Zabriskie, T. M. Formation of the nonproteinogenic amino acid 2S,3R-capreomycidine by VioD from the viomycin biosynthesis pathway. ChemBioChem 2004, 5 (9), 1278-1281.
[0445] (41) Chang, C.-Y.; Lyu, S.-Y.; Liu, Y.-C.; Hsu, N.-S.; Wu, C.-C.; Tang, C.-F.; Lin, K.-H.; Ho, J.-Y.; Wu, C.-J.; Tsai, M.-D.; Li, T.-L. Biosynthesis of streptolidine involved two unexpected intermediates produced by a dihydroxylase and a cyclase through unusual mechanisms. Angew. Chem. Int. Ed. 2014, 53 (7), 1943-1948.
[0446] (42) Haltli, B.; Tan, Y.; Magarvey, N. A.; Wagenaar, M.; Yin, X.; Greenstein, M.; Hucul, J. A.; Zabriskie, T. M. Investigating β-hydroxyenduracididine formation in the biosynthesis of the mannopeptimycins. Chem. Biol. 2005, 12 (11), 1163-1168.
[0447] (43) Kumagai, T.; Takagi, K.; Koyama, Y.; Matoba, Y.; Oda, K.; Noda, M.; Sugiyama, M. Heme protein and hydroxyarginase necessary for biosynthesis of D-cycloserine. Antimicrob. Agents Chemother. 2012, 56 (7), 3682-3689.
[0448] (44) ITRC (Interstate Technology & Regulatory Council), Strategies for Preventing and Managing Harmful Cyanobacterial Blooms (Interstate Technology & Regulatory Council, HCB Team, Washington D.C., 2021; www.itrcweb.org).
[0449] (45) S. T. Lima, D. O. Alvarenga, A. Etchegaray, D. P. Fewer, J. Jokela, A. M. Varani, M. Sanz, F. A. Dorr, E. Pinto, K. Sivonen, M. F. Fiore, Genetic organization of anabaenopeptin and spumigin biosynthetic gene clusters in the cyanobacterium Sphaerospermopsis torques-reginae ITEP-024. ACS Chem. Biol. 12, 769-778 (2017).
[0450] (46) P. R. Gorham, J. McLachlan, U. T. Hammer, W. K. Kim, Isolation and culture of toxic strains of Anabaena flos-aquae (Lyngb.) de Breb. Int. Ver. Fur Theor. Angew. Limnol. Verhandlungen. 15, 796-804 (1964).
[0451] (47) F. A. Dorr, V. Rodriguez, R. Molica, P. Henriksen, B. Krock, E. Pinto, Methods for detection of anatoxin-a(s) by liquid chromatography coupled to electrospray ionization-tandem mass spectrometry. Toxicon. 55, 92-99 (2010).
[0452] (48) K. Okonechnikov, A. Conesa, F. Garcia-Alcalde, Qualimap 2: advanced multisample quality control for high-throughput sequencing data. Bioinformatics. 32, 292-294 (2016).
[0453] (49) B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat Methods. 9, 357-359 (2012).
[0454] (50) W. De Coster, S. D’Hert, D. T. Schultz, M. Cruts, C. Van Broeckhoven, NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 34, 2666-2669 (2018).
[0455] (51) R. Schmieder, R. Edwards, Quality control and preprocessing of metagenomic datasets. Bioinformatics. 27, 863-864 (2011).
[0456] (52) S. Koren, B. P. Walenz, K. Berlin, J. R. Miller, N. H. Bergman, A. M. Phillippy, Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722-736 (2017).
[0457] (53) E. Haghshenas, F. Hach, S. C. Sahinalp, C. Chauve, CoLoRMap: Correcting long reads by mapping short reads. Bioinforma. Oxf. Engl. 32, i545-i551 (2016).
[0458] (54) R. R. Wick, L. M. Judd, C. L. Gorrie, K. E. Holt, Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLOS Comput. Biol. 13, e1005595 (2017).
[0459] (55) A. V. Zimin, D. Puiu, M.-C. Luo, T. Zhu, S. Koren, G. Marcais, J. A. Yorke, J. Dvorak, S. L. Salzberg, Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res., 27, 787-792 (2017).
[0460] (56) M. Boetzer, C. V. Henkel, H. J. Jansen, D. Butler, W. Pirovano, Scaffolding pre-assembled contigs using SSPACE. Bioinforma. Oxf. Engl. 27, 578-579 (2011).
[0461] (57) B. J. Walker, T. Abeel, T. Shea, M. Priest, A. Abouelliel, S. Sakthikumar, C. A. Cuomo, Q. Zeng, J. Wortman, S. K. Young, A. M. Earl, Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLOS ONE. 9, e112963 (2014).
[0462] (58) M. Boetzer, W. Pirovano, Toward almost closed genomes with GapFiller. Genome Biol. 13, R56 (2012).
[0463] (59) M. Kolmogorov, J. Yuan, Y. Lin, P. A. Pevzner, Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540-546 (2019).
[0464] (60) A. Gurevich, V. Saveliev, N. Vyahhi, G. Tesler, QUAST: quality assessment tool for genome assemblies. Bioinformatics. 29, 1072-1075 (2013).
[0465] (61) D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, G. W. Tyson, CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043-1055 (2015).
[0466] (62) M. Hunt, N. D. Silva, T. D. Otto, J. Parkhill, J. A. Keane, S. R. Harris, Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294 (2015).
[0467] (63) T. Seemann, Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30, 2068-2069 (2014).
[0468] (64) K. Levi, M. Rynge, E. Abeysinghe, R. A. Edwards, in Proceedings of the Practice and Experience on Advanced Research Computing (ACM, Pittsburgh PA USA, 2018; https://dl.acm.org/doi/10.1145/3219104.3229278), pp. 1-7.
[0469] (65) P. J. Torres, R. A. Edwards, K. A. McNair, PARTIE: a partition engine to separate metagenomic and amplicon projects in the Sequence Read Archive. Bioinformatics. 33, 2389-2391 (2017).
[0470] (66) J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, N. Wilkins-Diehr, XSEDE: Accelerating scientific discovery. Comput. Sci. Eng. 16, 62-74 (2014).
[0471] (67) C. A. Stewart, T. M. Cockerill, I. Foster, D. Hancock, N. Merchant, E. Skidmore, D. Stanzione, J. Taylor, S. Tuecke, G. Turner, M. Vaughn, N. I. Gaffney, in Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (ACM, 2015; https://dl.acm.org/citation.cfm?doid=2792745.2792774), p. 29.
[0472] (68) B. Buchfink, C. Xie, D. H. Huson, Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59-60 (2015).
[0473] (69) P. Di Tommaso, M. Chatzou, E. W. Floden, P. P. Barja, E. Palumbo, C. Notredame, Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316-319 (2017).
[0474] (70) T. Kluyver, B. Ragan-Kelley, F. Perez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, C. Willing, Jupyter development team, in Positioning and Power in Academic Publishing: Players, Agents and Agendas, F. Loizides, B. Schmidt, Eds. (IOS Press, 2016; https://eprints.soton.ac.uk/403913/), pp. 87-90.
[0475] (71) The pandas development team, pandas-dev/pandas: Pandas 1.0.3 (Zenodo, 2020; https://zenodo.org/record/3715232).
[0476] (72) W. McKinney, in Proceedings of the 9th Python in Science Conference, S. van der Walt, J. Millman, Eds. (2010), pp. 56-61.
[0477] (73) S. Choudhary, pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive (2019), doi:10.12688/f1000research.18676.1.
[0478] (74) B. Grüning, R. Dale, A. Sjodin, B. A. Chapman, J. Rowe, C. H. Tomkins-Tinch, R. Valieris, J. Koster, Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods. 15, 475-476 (2018).
[0479] (75) E. Bushmanova, D. Antipov, A. Lapidus, A. D. Przhibelskiy, rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. GigaScience. 8, giz100 (2019).
[0480] (76) G. M. Kurtzer, V. Sochat, M. W. Bauer, Singularity: Scientific containers for mobility of compute. PLOS ONE. 12, e0177459 (2017).
[0481] (77) C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T. L. Madden, BLAST+: architecture and applications. BMC Bioinformatics. 10, 421 (2009).
[0482] (78) M. Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 17, 10-12 (2011).
[0483] (79) S. Nurk, D. Meleshko, A. Korobeynikov, P. A. Pevzner, metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824-834 (2017).
[0484] (80) D. D. Kang, J. Froula, R. Egan, Z. Wang, MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 3, e1165 (2015).
[0485] (81) P.-A. Chaumeil, A. J. Mussig, P. Hugenholtz, D. H. Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinforma. Oxf. Engl., btz848 (2019).
[0486] (82) D. H. Parks, M. Chuvochina, D. W. Waite, C. Rinke, A. Skarshewski, P.-A. Chaumeil, P. Hugenholtz, A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996-1004 (2018).
[0487] (83) M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLOS ONE. 5, e9490 (2010).
[0488] (84) I. Letunic, P. Bork, Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256-W259 (2019).
[0489] (85) K. Clark, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, E. W. Sayers, GenBank. Nucleic Acids Res. 44, D67-D72 (2016).
[0490] (86) I. Lee, Y. Ouk Kim, S.-C. Park, J. Chun, OrthoANI: An improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 66, 1100-1103 (2016).
[0491] (87) R. J. R. Molica, E. J. A. Oliveira, P. V. V. C. Carvalho, A. N. S. F. Costa, M. C. C. Cunha, G. L. Melo, S. M. F. O. Azevedo, Occurrence of saxitoxins and an anatoxin-a(s)-like anticholinesterase in a Brazilian drinking water supply. Harmful Algae. 4, 743-753 (2005).
[0492] (88) A. M. Giltrap, L. J. Dowman, G. Nagalingam, J. L. Ochoa, R. G. Linington, W. J. Britton, R. J. Payne, Total synthesis of teixobactin. Org. Lett. 18, 2788-2791 (2016).
[0493] (89) S. Matsunaga, R. E. Moore, W. P. Niemczura, W. W. Carmichael, Anatoxin-a(s), a potent anticholinesterase from Anabaena flos-aquae. J. Am. Chem. Soc. 111, 8021-8023 (1989).
[0494] (90) K. Levi, M. Rynge, E. Abeysinghe, and R. A. Edwards, “Searching the Sequence Read Archive using Jetstream and Wrangler,” in Proceedings of the Practice and Experience on Advanced Research Computing, Pittsburgh PA USA, Jul. 2018, pp. 1-7. doi: 10.1145/3219104.3229278.
[0495] (91) Stella T. Lima et al., “Biosynthesis of Guanitoxin Enables Global Environmental Detection in Freshwater Cyanobacteria,” J. Am. Chem. Soc., vol. 144, no. 21, pp. 9372-9379, Jun. 2022, doi: 10.1021/jacs.2c01424.
[0496] (92) A. Bankevich et al., “SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing,” J. Comput. Biol., vol. 19, no. 5, pp. 455-477, Apr. 2012, doi: 10.1089/cmb.2012.0021.
[0497] (93) E. Bushmanova, D. Antipov, A. Lapidus, and A. D. Przhibelskiy, “rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data,” Sep. 2018, doi: 10.1101/420208.
[0498] (94) T. Seemann, “Prokka: rapid prokaryotic genome annotation,” Bioinformatics, vol. 30, no. 14, pp. 2068-2069, Jul. 2014, doi: 10.1093/bioinformatics/btu153.
[0499] (95) A. Priyam et al., “Sequenceserver: A Modern Graphical User Interface for Custom BLAST Databases,” Mol. Biol. Evol., vol. 36, no. 12, pp. 2922-2924, Dec. 2019, doi: 10.1093/molbev/msz185.
[0500] (96) J. Ye, G. Coulouris, I. Zaretskaya, I. Cutcutache, S. Rozen, and T. L. Madden, “Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction,” BMC Bioinformatics, vol. 13, no. 1, p. 134, Jun. 2012, doi: 10.1186/1471-2105-13-134.

Claims

WHAT IS CLAIMED IS:
1. A method of detecting guanitoxin-producing bacteria in an aqueous liquid, the method comprising detecting one or more guanitoxin biosynthetic genes in the aqueous liquid, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
2. The method of claim 1, wherein the one or more guanitoxin biosynthetic genes are GntA, GntJ, GntC, or a combination thereof.
3. The method of claim 1, comprising contacting the aqueous liquid with one or more nucleic acids, wherein each of the one or more nucleic acids are at least partially complementary to a portion of the one or more guanitoxin biosynthetic genes.
4. The method of claim 3, wherein the detecting comprises performing a PCR method, an isothermal amplification method, a sequencing method, or a combination thereof.
5. The method of claim 3, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence.
6. The method of claim 5, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence.
7. The method of claim 3, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
8. The method of claim 7, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:4.
9. The method of claim 8, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:2.
10. The method of claim 8, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:3 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:4.
11. The method of claim 1, wherein the guanitoxin-producing bacteria are cyanobacteria.
12. The method of claim 11, wherein the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia.
13. The method of claim 1, wherein the aqueous liquid is derived from a lake, river, or pond.
14. The method of claim 1, wherein the aqueous liquid is derived from a public water system or a private water system.
15. The method of claim 1, wherein the aqueous liquid is ingested by, inhaled by, or contacted with a subject.
16. The method of claim 15, wherein the subject is treated for guanitoxin-induced toxicity when the one or more guanitoxin biosynthetic genes are detected.
17. A kit for detecting guanitoxin-producing bacteria in an aqueous liquid, the kit comprising one or more nucleic acids each at least partially complementary to a portion of one or more guanitoxin biosynthetic genes, wherein the one or more guanitoxin biosynthetic genes are GntA, GntB, GntC, GntD, GntE, GntF, GntG, GntH, GntI, GntJ, GntT, or a combination thereof.
18. The kit of claim 17, wherein the one or more guanitoxin biosynthetic genes are GntA, GntJ, GntC, or a combination thereof.
19. The kit of claim 17, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence, a promoter region sequence, a terminator region sequence, or an intergene region sequence.
20. The kit of claim 19, wherein the portion of the one or more guanitoxin biosynthetic genes comprises a coding sequence.
21. The kit of claim 17, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:22, wherein each nucleic acid of the one or more nucleic acids is different.
22. The kit of claim 21, wherein the one or more nucleic acids each independently comprises a sequence having at least 80% identity to any one of SEQ ID NO: 1 to SEQ ID NO:4.
23. The kit of claim 22, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO: 1 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:2.
24. The kit of claim 22, wherein the one or more nucleic acids comprises a first nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:3 and a second nucleic acid comprising a sequence having at least 80% identity to SEQ ID NO:4.
25. The kit of claim 17, wherein the guanitoxin-producing bacteria are cyanobacteria.
26. The kit of claim 25, wherein the cyanobacteria are Sphaerospermopsis torques-reginae, Chrysosporum ovalisporum, Cuspidothrix, Cylindrospermopsis, Cylindrospermum, Dolichospermum, Microcystis, Oscillatoria, Planktothrix, Phormidium, Anabaena flos-aquae, A. lemmermannii, Raphidiopsis mediterranea, Tychonema, or Woronichinia.
27. The kit of claim 17, wherein the aqueous liquid is derived from a lake, river, or pond.
28. The kit of claim 17, wherein the aqueous liquid is derived from a public water system or a private water system.
29. The kit of claim 17, wherein the aqueous liquid is ingested, inhaled, or contacted by a subject.
30. The kit of claim 17, further comprising an enzyme, deoxynucleoside triphosphates (dNTPs), a control DNA, a detectable label, or a combination thereof.
31. The kit of claim 17, further comprising a therapeutic effective for treating guanitoxin-induced toxicity.
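
As an illustration of the "at least 80% identity" language recited in claims 7-10 and 21-24, the following minimal Python sketch computes an ungapped percent identity between two nucleotide strings of equal length and compares it with a threshold. It is a sketch under stated assumptions, not the method of the application: the function names are hypothetical, the two example sequences are arbitrary placeholders rather than any sequence from the sequence listing, and identity determined in practice may instead rely on a gapped alignment tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Return the ungapped percent identity of two equal-length sequences.

    Assumption: positions are compared one-to-one with no alignment gaps.
    """
    a, b = a.upper(), b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 80.0) -> bool:
    """Check whether candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 20-nucleotide placeholders (NOT taken from the sequence listing).
reference = "ATGCGTACGTTAGCCTAGGA"
candidate = "ATGCGTACGATAGCCTAGCA"  # differs from reference at two positions

if __name__ == "__main__":
    print(percent_identity(candidate, reference))          # 18 of 20 match
    print(meets_identity_threshold(candidate, reference))  # threshold is 80%
```

With the placeholder sequences above, 18 of 20 positions match, so the candidate would satisfy an 80% threshold under this simplified, ungapped definition.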
PCT/US2023/062430 2022-02-11 2023-02-10 Methods and compositions for detecting guanitoxin producing bacteria WO2023154891A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263267862P 2022-02-11 2022-02-11
US63/267,862 2022-02-11

Publications (2)

Publication Number Publication Date
WO2023154891A2 WO2023154891A2 (en) 2023-08-17
WO2023154891A3 WO2023154891A3 (en) 2023-10-05

Family

ID=87565157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/062430 WO2023154891A2 (en) 2022-02-11 2023-02-10 Methods and compositions for detecting guanitoxin producing bacteria

Country Status (1)

Country Link
WO (1) WO2023154891A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070048756A1 (en) * 2005-04-18 2007-03-01 Affymetrix, Inc. Methods for whole genome association studies
PT1888766E (en) * 2005-05-31 2013-04-18 Newsouth Innovations Pty Ltd Detection of hepatotoxic cyanobacteria

Also Published As

Publication number Publication date
WO2023154891A3 (en) 2023-10-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23753720

Country of ref document: EP

Kind code of ref document: A2