WO2022214617A1 - 2-aminoadenine modified nucleic acids, cells comprising them, and methods of producing them - Google Patents

Publication number
WO2022214617A1
WO2022214617A1 PCT/EP2022/059320 EP2022059320W WO2022214617A1 WO 2022214617 A1 WO2022214617 A1 WO 2022214617A1 EP 2022059320 W EP2022059320 W EP 2022059320W WO 2022214617 A1 WO2022214617 A1 WO 2022214617A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
cell
datz
purz
nucleic acid
Prior art date
Application number
PCT/EP2022/059320
Other languages
French (fr)
Inventor
Dariusz CZERNECKI
Marc Delarue
Pierre Alexandre KAMINSKY
Original Assignee
Institut Pasteur
Centre National De La Recherche Scientifique (Cnrs)
Sorbonne Universite
Application filed by Institut Pasteur, Centre National De La Recherche Scientifique (Cnrs), Sorbonne Universite
Publication of WO2022214617A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K31/00 Medicinal preparations containing organic active ingredients
    • A61K31/70 Carbohydrates; Sugars; Derivatives thereof
    • A61K31/7088 Compounds having three or more nucleosides or nucleotides
    • A61K31/711 Natural deoxyribonucleic acids, i.e. containing only 2'-deoxyriboses attached to adenine, guanine, cytosine or thymine and having 3'-5' phosphodiester links
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N7/00 Viruses; Bacteriophages; Compositions thereof; Preparation or purification thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2795/00 Bacteriophages
    • C12N2795/00011 Details
    • C12N2795/10011 Details dsDNA Bacteriophages
    • C12N2795/10021 Viruses as such, e.g. new isolates, mutants or their genomic sequences
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2795/00 Bacteriophages
    • C12N2795/00011 Details
    • C12N2795/10011 Details dsDNA Bacteriophages
    • C12N2795/10311 Siphoviridae
    • C12N2795/10322 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes

Definitions

  • Bacteriophages have long been known to use modified bases in their DNA to prevent cleavage by the host’s restriction endonucleases.
  • Cyanophage S-2L is unique because all the adenines (A) in its genome are systematically replaced by 2-aminoadenines (Z).
  • 2-aminoadenine also forms a base pair with thymine (Z:T), but with three hydrogen bonds instead of the two found in the classical adenine:thymine (A:T) Watson-Crick base pair. Therefore, replacing A with Z can provide a more stable nucleic acid molecule, which can also be endowed with increased resistance to endonucleases.
  • the examples demonstrate that a cluster of three genes found in a cyanophage called S-2L, when used in combination, is the minimum set of genes that should be used to introduce a new base, called 2-aminoadenine or diaminopurine, into the DNA of a living organism.
  • the examples demonstrate identification and full characterization of datZ and mazZ, and demonstrate how these genes may be used in combination with purZ to incorporate 2-aminoadenine into DNA.
  • Nucleic acids in which 2-aminoadenine is incorporated are more stable than natural DNA, and therefore may display increased and longer-lasting information storage capacities.
  • this invention provides a recombinant cell or virus comprising a genome comprising a 2-aminoadenine/thymine (Z/T) to (adenine/thymine (A/T) +
  • a virus does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene if the virus does not naturally comprise the genes in its genome.
  • a cell does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene if the cell does not naturally comprise the genes in its genome and if the cell also is not naturally infected by a virus that introduces the genes into the cell.
  • the recombinant cell may be a prokaryotic cell or a eukaryotic cell. In some preferred embodiments, it is a prokaryotic cell. In a preferred embodiment, the recombinant cell is an E. coli cell.
  • the virus may be a virus that infects eukaryotic cells or it may be a phage that infects prokaryotic cells.
  • the virus is a bacteriophage.
  • the virus is a phage that infects E. coli, such as T7 or lambda.
  • the Z/T to (A/T + Z/T) ratio is the ratio of the number of Z/T pairs to the sum of A/T and Z/T pairs in the genome. This value may be determined by measuring the Z content and determining the ratio of Z to (A+Z) in the genome.
  • the occurrence of Z is homogenous throughout the genome. In some embodiments, the occurrence of Z is not homogenous throughout the genome. In some embodiments, the occurrence of Z in the genome correlates with regions of the genome that were recently replicated.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.10. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.15. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.20. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.25.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.30. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.35. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.40. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.45.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.50. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.55. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.60. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.65.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.70. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.75. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.80. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.85.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.90. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.95. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.98. In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.99.
  • the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 1%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 2%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 3%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 4%.
  • the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 5%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 6%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 7%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 8%.
  • the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 9%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 10%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 11%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 12%.
  • the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 13%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 14%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 15%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 16%.
  • the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 17%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 18%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 19%. In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 20%.
  • the recombinant cell or virus is based on a starting cell or virus that does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene. Therefore, the starting cell or virus does not naturally comprise Z in its nucleic acid. It is a feature of certain embodiments of this invention that by expressing a DatZ, a MazZ and a PurZ in a recombinant cell it is possible to produce a nucleic acid comprising Z.
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell.
  • the regulatory control elements may comprise a promoter and/or a terminator.
  • the coding sequences may be present on one, two, or three different nucleic acid molecules, such as for example plasmids.
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 85% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 85% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 85% identical to PurZ (SEQ ID NO: 11).
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 90% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 90% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 90% identical to PurZ (SEQ ID NO: 11).
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 95% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 95% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 95% identical to PurZ (SEQ ID NO: 11).
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 98% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 98% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 98% identical to PurZ (SEQ ID NO: 11).
  • the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 99% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 99% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 99% identical to PurZ (SEQ ID NO: 11).
  • the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), and/or the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and/or the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7).
  • the recombinant cell or virus comprises a coding sequence for DatZ (SEQ ID NO: 9), a coding sequence for MazZ (SEQ ID NO: 10), and a coding sequence for PurZ (SEQ ID NO: 11).
  • one or more of the coding sequences are codon optimized.
  • the coding sequences are present on one or more plasmids.
  • the coding sequences are present on one or more chromosomes.
  • the recombinant cell comprises DatZ (SEQ ID NO: 9), MazZ
  • compositions comprising a plurality of the recombinant cells of the invention.
  • the composition may be a plurality of recombinant cells present in or on a culture medium, a plurality of recombinant cells frozen in a container, or a plurality of recombinant cells in a lyophilized composition.
  • the recombinant cell is not a cyanophage S-2L.
  • the recombinant cell is not a cell that is naturally infected by a cyanophage S-2L.
  • the recombinant virus is not a cyanophage.
  • the recombinant virus is not a Vibrio phage.
  • the recombinant cell is not a cyanobacterium.
  • the recombinant cell is not a member of the genus Vibrio.
  • the recombinant cell is not a cyanobacterium or a member of the genus Vibrio, and the recombinant virus is not a cyanophage or a Vibrio phage.
  • the recombinant cell is a bacterium.
  • nucleic acid comprising 2-aminoadenine (Z).
  • the methods may comprise providing a recombinant cell as described herein and isolating the nucleic acid comprising Z from the cell.
  • the isolated nucleic acid is a plasmid.
  • the plasmid may be a bacterial plasmid and may comprise an origin of replication and/or a gene encoding a selectable marker.
  • the isolated nucleic acid is a chromosome.
  • the isolated nucleic acid is a total cell nucleic acid preparation.
  • the isolated nucleic acid may be present in a composition comprising molecules that comprise Z and others that do not.
  • Nucleic acid libraries are also provided.
  • a “nucleic acid library” is a plurality of nucleic acids in any form that are all obtained from the genome of a reference organism. Reference may be made to the “coverage” of the genome of the reference organism by the library. The coverage is the subset of the genome that is represented by at least one copy in the library. Thus, a coverage of 50% indicates that the library comprises at least one copy of 50% of the positions in the genome.
  • the nucleic acid library is isolated. In some embodiments, it is provided in cloned form, and may be present in a host such as a bacterial or phage host. In some embodiments, the nucleic acid library is provided in the form of purified nucleic acid.
  • the isolated nucleic acid library comprises at least 50% coverage of the genome of a reference organism or virus, wherein the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.05, and wherein the reference organism does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.10.
  • the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.15.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.20.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.25.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.30.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.35. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.40. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.45. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.50. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.55.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.60. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.65. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.70. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.75. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.80.
  • the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.85. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.90. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.95. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.98. In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.99.
  • the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 1%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 2%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 3%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 4%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 5%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 6%.
  • the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 7%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 8%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 9%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 10%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 11%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 12%.
  • the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 13%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 14%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 15%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 16%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 17%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 18%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 19%. In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 20%.
  • the reference organism is prokaryotic. In some embodiments, the reference organism is eukaryotic. In some embodiments, the reference organism is E. coli. In some embodiments, the reference virus infects eukaryotic cells. In some embodiments, the reference organism is a phage, such as a bacteriophage.
  • the reference virus is not a cyanophage S-2L.
  • the reference cell is not a cell that is naturally infected by a cyanophage S-2L.
  • the reference virus is not a cyanophage.
  • the reference virus is not a Vibrio phage.
  • the reference cell is not a cyanobacterium.
  • the reference cell is not a member of the genus Vibrio.
  • the reference cell is not a cyanobacterium or a member of the genus Vibrio, and the reference virus is not a cyanophage or a Vibrio phage.
  • the reference cell is a bacterium.
  • the recombinant cells disclosed herein may be used to make a stabilized nucleic acid because nucleic acids having Z incorporated therein may have increased stability. Accordingly, in another aspect this invention provides methods of making a stabilized nucleic acid. The methods may comprise providing a cell that does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene; and expressing recombinant DatZ, MazZ and PurZ proteins in the cell for a period of time sufficient for incorporation of 2-aminoadenine (Z) into nucleic acid in the cell to form a stabilized nucleic acid comprising 2-aminoadenine (Z).
  • the methods further comprise isolating the stabilized nucleic acid from the cell.
  • the stabilized nucleic acid is endogenous to the cell.
  • the stabilized nucleic acid is heterologous to the cell.
  • the stabilized nucleic acid is a viral nucleic acid.
  • the cell is not a cyanophage S-2L.
  • the cell is not a cell that is naturally infected by a cyanophage S-2L.
  • the cell is not a cyanobacterium.
  • the cell is not a member of the genus Vibrio.
  • the cell is not a cyanobacterium or a member of the genus Vibrio, and the recombinant virus is not a cyanophage or a Vibrio phage.
  • the cell is a bacterium.
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell.
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 85% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 85% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 85% identical to PurZ (SEQ ID NO: 11).
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 90% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 90% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 90% identical to PurZ (SEQ ID NO: 11).
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 95% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 95% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 95% identical to PurZ (SEQ ID NO: 11).
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 98% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 98% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 98% identical to PurZ (SEQ ID NO: 11).
  • the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 99% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 99% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 99% identical to PurZ (SEQ ID NO: 11).
  • the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), wherein the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10) is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and wherein the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7).
  • the method comprises introducing into the cell a recombinant coding sequence for DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for PurZ (SEQ ID NO: 11).
  • the coding sequences are present on one or more plasmids.
  • the coding sequences are present on one or more chromosomes.
  • the methods may further comprise introducing a heterologous starting nucleic acid into the cell in order to make a Z-enriched form of the starting nucleic acid.
  • the Z-enriched form has an increased stability.
  • increased stability is defined by measuring the thermal denaturation profile of the DNA and its Tm, and comparing with the Tm of a DNA of same composition and same length but with 100% of A and no Z.
  • the Tm is increased by at least 5%, at least 10%, at least 15%, at least 20%, or at least 25%.
  • the heterologous starting nucleic acid is a viral nucleic acid.
  • the virus is not a cyanophage.
  • the virus is not a Vibrio phage.
  • the invention encompasses the use of the recombinant cell or virus, or the composition according to the present disclosure to store DNA. DNA may be used as medium for the storage of information; if the DNA is made more stable, the information storage will last longer and will be safer.
  • the invention also encompasses the use of the recombinant cell or virus, or the composition according to the present disclosure, in phage therapy.
  • the genome of a phage may be engineered, through the addition of the three genes datZ, mazZ and purZ to its genome, so that enough of the adenines in its DNA are replaced by 2-aminoadenine for it to acquire an increased resistance to its host's endonucleases.
  • FIG. Watson-Crick base pairs and natural variations thereof. Hydrogen bonds are marked by a dotted orange line. a Classical DNA base pairs, universal to all three domains of life and most viruses. b Other types of base pairs with three hydrogen bonds found in some organisms and viruses. Additional chemical groups are in red. 2-aminoadenine:thymine (Z:T, left); guanine:5-hydroxymethylcytosine (G:hmC, center); archaeosine:cytosine (G+:C, right). The Z:T pair, first found in cyanophage S-2L, completely replaces the usual A:T pair in the genome.
  • FIG. 1 Functional characterisation of S-2L PrimPol.
  • a Schematic diagram of S-2L PrimPol constructs showing its different domains with their respective amino-acid range (to scale)
  • b-d Results of DNA polymerase activity tests of S-2L PrimPol with either dATP or dZTP as the incoming dNTP, using templates with dT ioGG (b and c) or dT 12 (d) overhang b
  • Different buffers with various pHs, noted below the triangles Effect of different divalent ions, at 5 ⁇ M each d Effect of growing concentrations of nucleotides (lanes 3-8) and pre-incubation of reactional mixture for DatZ WT (lanes 9-10) and I22A mutant (lanes 11-12).
  • Residues conserved with other AEPs and of known function are indicated with a grey dot underneath; residues conserved only between the closest relatives of PrimPol and of potential catalytic importance for primase activity - with a black dot.
  • the double-hatted residue D87 could be involved in both polymerase (known) and/or primase (suggested) activities. b Structure of PP-N190 in ribbon and surface representation, with two symmetric molecules in the crystallographic asymmetric unit, each coloured with a grey-black gradient. Calcium ions are shown by grey spheres, with water molecules forming their hydration shells shown as black ones.
  • the catalytic site of molecule A is shown in grey stick representation and indicated with a dotted circle. c Zoom on the catalytic site of PP-N190. Residues highlighted in (a) are shown in stick representation and labelled, maintaining the same colour code. The experimental 2Fo-Fc electron density around these residues (black mesh) is contoured at 1 sigma.
  • FIG. 4 HPLC analysis of S-2L DatZ dephosphorylation products. Nucleotide standards are in black, products eluted after incubation of the corresponding triphosphates with DatZ are in grey. Each sample was eluted separately, using an amount of 40 nmol. The enzyme is active exclusively with dATP and removes all of its phosphates: it is therefore a triphosphohydrolase specific for dATP, or dATPase.
  • Figure 5 Three-dimensional structure of S-2L DatZ. a Ribbon representation of a DatZ monomer in a light grey-dark grey gradient, with bound dA in stick (yellow). The Zn2+ ion is shown as a grey sphere. b A close-up on the catalytic pocket of DatZ with the experimental 2Fo-Fc electron density contoured at 2.5 sigma around bound ligands: dA and Zn2+ (black mesh). Additionally, the anomalous density at the Zn2+ absorption edge (dark grey mesh) is contoured at 10 sigma.
  • Residue I22 provides direct specificity towards the adenine nucleobase, creating a steric hindrance for chemical groups in position 2 of the purine ring.
  • Other residues highlighted in the text are the Zn2+-coordinating ones, W20 and P79.
  • c Structure of the full DatZ hexamer, top and side views, in surface representation. Protomers form a compact, particularly stable disc in an alternating, zigzagging pattern. Two of the six symmetrical cavities leading to buried dA molecules are visible in the side view and highlighted by the white dotted circles. d
  • FIG. 6 Catalytic centre of S-2L DatZ with the substrate and cofactors and the mechanism of tri-dephosphorylation, a Model of the reaction centre made by superposition of two of the structures solved in this work.
  • the first structure defines dATP and residue R19 interacting with the a-phosphate; hydrogen atoms were omitted for clarity.
  • the second structure provides catalytic ions A and B (spheres), bound water molecules that are likely to take part in the reaction (light grey) and the metal coordinating residues. Interacting atoms, ions and groups of interests are shown by dashed lines of corresponding colour. The distance between the two Co 2+ ions is 5.2 A.
  • Figure 7 Additional polymerase activity tests on S-2L PrimPol constructs, using nucleotide dATP/dATGC mix or dZTP/dZTCG. a Polymerisation assay of the three PrimPol constructs, with a negative control without any polymerase in the first lane. The incubation was conducted for 20 min, using the dT12 overhang template. b Polymerisation assay for the first 124 nt of PrimPol’s native gene. Results are shown for a negative control without any polymerase (lane 1), a positive control with E.
  • FIG. 8 AEP and its ligands. a Three known AEP structures with bound dsDNA, viewed from the same perspective. Below are: the protein name, PDB code of the structure and the organism or plasmid of origin.
  • the DNA molecule seems to bend in an L-shape at the catalytic site b
  • the ionic bonds are visualised by the black lines c Distances between the residues and bound ions shown in (b), with the code given below the graph. They were measured in the course of 212 ns of the simulation and averaged in the 2 ns frame.
  • The novel ion in site C interacts with the γ-phosphate of the nucleotide in the initiation site; the binding is stable and similar across all simulations.
  • Figure 9 Further tests on S-2L DatZ catalytic activity and its I22A mutant.
  • The panels are constructed as in Figure 4. a HPLC analysis of nucleotides obtained after incubation of DatZ with dADP and dAMP, showing no discernible dephosphorylation products. b Same analysis for dATP, dZTP and dGTP incubated with DatZ I22A. Compared to the wild-type enzyme, the mutant shows reduced dATPase activity and improved, although still marginal, dZTPase activity. dATP to dA tri-dephosphorylation occurs in a single step.
  • FIG. 10 Catalytic centre of S-2L DatZ dATPase with bound substrate, product and cofactors. Colour code as in Figure 5b and Figure 6a; residues K81 and K116, balancing the charge of the triphosphate, are also displayed. Water molecules and hydrogen atoms are omitted for clarity.
  • a Structure of DatZ with dA and Co²⁺. The 2Fo-Fc electron density map is contoured at 1 sigma around dA and Co²⁺ ions in the binding sites named A and B (black mesh).
  • The anomalous signal at the wavelength of data collection is contoured at 10 sigmas (dark grey mesh). b Structure of DatZ with dATP and partially occupied Zn²⁺, using the same representation.
  • The 2Fo-Fc electron density map is contoured at 1 sigma around dATP and the Zn²⁺ ion in the binding site A (black mesh). Residual amounts of penta-coordinated Zn²⁺ can be identified by the anomalous signal at the Zn edge, contoured here at 5 sigmas (dark grey mesh).
  • Figure 11 Sequence multialignment of close DatZ homologues co-occurring with purZ gene in related phages. Numbering above the alignment refers to S-2L DatZ.
  • ctbf_3 Compost metagenome, Bacteroidetes bacterium, phage ZP6, Podoviridae sp. ctpVR23, Chloroflexi bacterium, four unnamed sequences from Tara database and phage vB_OspP_OH.
  • the supposedly bacterial sequences are short and match other viral sequences on the full length of the genome.
  • S-2L_DatZ (SEQ ID NO: 9); phiVC8 (SEQ ID NO: 17); SSEA01000061 (SEQ ID NO: 18); MG641885 (SEQ ID NO: 19); MG674163 (SEQ ID NO: 20); MH622939 (SEQ ID NO: 21); RCUU01000438 (SEQ ID NO: 22); QQVW01000181 (SEQ ID NO: 23); MK203850 (SEQ ID NO: 24); MN582112 (SEQ ID NO: 25); RPPI01000007 (SEQ ID NO: 26); CENI 14966_2 (SEQ ID NO: 27); CEUI32_2 2 (SEQ ID NO: 28); CEUX 95133_1 (SEQ ID NO: 29); CEPX324866_1 (SEQ ID NO: 30); MT028492 (SEQ ID NO: 31).
  • FIG. 12 A model of dZ in the catalytic pocket of DatZ and its mutant I22A. a The distance between the nitrogen atom of the dZ amino group in position 2 and the closest atom of I22 (Cγ) is shown by a dashed line. It is too short to allow for correct dZ binding. b The distance between the same nitrogen atom and the closest atom of the side chain in mutant I22A (Cβ) is longer by 1.5 Å (dashed line).
  • FIG. 13 Structural classification of available AEP enzymes with Dali a Dendrogram of AEP superfamily, derived by hierarchical clustering of the similarity matrix data.
  • PDB codes are atop the branches; PP-N190 is marked by a black triangle.
  • Archaeo-eukaryotic PriS and bacterial NHEJ primases are monophyletic, and group together in the so-called AEP proper clade.
  • Primase-Polymerases (PrimPols) are more divergent and spread out across all three domains of life, viruses and plasmids.
  • the AEP domain of S-2L PrimPol shares a recent ancestor with plasmidic RepB’.
  • b The similarity matrix data.
  • PDB codes are indicated to the left and below, organism names further to the left, protein families above and similarity scale to the right.
  • PP-N190 is highlighted with a black cross. Each square represents with its colour how structurally close a pair of AEP proteins is, varying from light grey (no similarity) to dark grey (high closeness).
  • Figure 14 Structural multialignment of S-2L DatZ and all other HD phosphohydrolases whose crystal structure is available in the PDB. Organism names and PDB codes are indicated on the left. The meaning of the dots below the alignment is the same as in Figure 11 ; the additional conserved residue E93 is marked by a grey dot. Empty circles highlight highly conserved residues with slightly shifted backbone positions with respect to S-2L (connected full circles), but with superimposed similar functional groups. Occasional unstructured and unbuilt regions in the middle were aligned using sequence information alone while non-superposable N- and C-termini were ignored; in particular, an extended, but structured N-terminus of L.
  • S-2L_DatZ SEQ ID NO: 9
  • E.coli_2PAU SEQ ID NO: 32
  • B.megaterium_5TK7 SEQ ID NO: 33
  • L.innocua_3MZO (SEQ ID NO: 34); A.tumefaciens_2GZ4 (SEQ ID NO: 35);
  • M.magnetotacticum_3KH1 (SEQ ID NO: 36); P.furiosus_1XX7 (SEQ ID NO: 37); P.horikoshii_2CQZ (SEQ ID NO: 38); A.fulgidus_1YOY (SEQ ID NO: 39); S.cerevisiae_5YOX (SEQ ID NO: 40); H.sapiens_4DMB (SEQ ID NO: 41).
  • Figure 15 Non-rooted maximum-likelihood phylogenetic tree of HD phosphohydrolases for all available molecular structures. The tree was calculated using the alignment from Figure 14. Organism names and PDB codes are to the right of the corresponding branches. Enzymes divide into three groups: archaeal, eukaryotic and bacterial/S-2L, suggesting an acquisition of the datZ gene by S-2L’s ancestor from a bacterium. Numbers on the nodes are bootstrap percentage values; the reference distance corresponds to an average 0.5 substitution per site. The topology of the bootstrap consensus tree is identical, supporting the result.
  • Figure 16 Common structural features across the HD phosphohydrolase family. Comparison of S-2L DatZ with E.
  • DatZ is represented as in Figure 6a; the YfbR E72A mutant structure with bound dAMP is taken from PDB 2PAU, with the A72 residue swapped with the natural E72 from PDB 2PAQ; OxsA is from PDB 5TK7.
  • YfbR and OxsA models can be inferred from DatZ structures (shown by dotted contours), suggesting that the metal ion site B is universal and has similar coordination across the whole protein family.
  • Site C, observed for OxsA would be a result of a switch from mostly conserved positively charged residue corresponding to DatZ’s K116 to a negatively charged one, justifying the need for a third divalent cation.
  • FIG. 17 Genome of S-2L and its Z-cluster conserved in other Siphoviridae.
  • mazZ-1 and mazZ-2 are shown in solid and striped light grey, respectively.
  • the reference distance for the PurZ/DatZ phylogenetic tree corresponds to an average 0.2 substitution per site.
  • FIG. 18 Catalytic properties of S-2L PurZ investigated by HPLC. Reactants of PurZ are visualised on absorbance chromatograms. Peaks corresponding to pure compounds (grey) are indicated with black labels.
  • A Products of PurZ catalysis with dGMP, Asp and dATP/ATP (left and right panel respectively), taken at three consecutive time steps. The enzyme generates products corresponding to (d)ADP and sadGMP (N6-succino-2-amino deoxyguanidylo monophosphate) and shows no discrimination between the two adenosine triphosphate variants.
  • B Comparison of the expected activity between PurZ and a typical AdSS family representative from E. coli.
  • FIG. 19 Structure of S-2L PurZ, a N6-succino-2-amino deoxyguanidylate synthase with ligands dGMP and dATP.
  • A Ribbon representation of a PurZ monomer in a light-dark grey gradient, with dGMP and dATP shown in stick.
  • B Catalytic pocket of PurZ with the experimental 2Fo-Fc electron density contoured around the reactants at 1 sigma (black mesh). Surrounding residues are defined in the text; R146 from the second protein subunit is stabilizing the phosphate of dGMP in the first subunit’s catalytic pocket.
  • C PurZ dimer in surface representation: the two domains are coloured in light-dark grey gradients. White dotted circles point to the opposite catalytic cavities.
  • D Surface representation of PurZ coloured using experimental B-factors, with the corresponding scale bar below. The flexible loop above the catalytic cleft (left) defines the aspartate loop. The dimer interface (right) is particularly rigid, suggesting a constitutive dimeric form of PurZ.
  • Figure 20 HPLC analysis of S-2L MazZ nucleotide triphosphate specificity and dephosphorylation products. Nucleotide standards are in black, the products eluted after incubation of the corresponding triphosphates with MazZ are in light grey. Each sample was eluted separately, after an injection of 40 nmol. The enzyme is selective towards dGTP and GTP, removing their two terminal β- and γ-phosphates.
  • FIG. 21 Structure of S-2L MazZ with bound dGDP and Mn²⁺ ions.
  • A A tetramer of MazZ that constitutes the crystallographic asymmetric unit. Two tight dimers further form a dimer; each of the four catalytic pockets with the reactant and three catalytic ions is created from the two chains of a tight dimer.
  • B Close-up on the catalytic pocket. The product of dGTP dephosphorylation is identified as dGDP in the crystal, next to the catalytic Mn²⁺ ions. The determinants of guanine specificity (N2 and O6) and residues coordinating the ions are placed on one protein chain.
  • The three Mn²⁺ ions are hexa-coordinated by the negatively charged protein residues, deoxynucleotide phosphates and water molecules (omitted for clarity).
  • The 2Fo-Fc electron density around the ligands is contoured at 1 sigma (black mesh); the anomalous signal attesting to the presence of Mn²⁺ ions is contoured at 3 sigmas (dark grey mesh).
  • C A single protein chain of MazZ, coloured in light-dark grey gradient.
  • D Surface representation of MazZ coloured using experimental B-factors with a scale-bar represented on the right. The whole tetramer is very rigid, except for the N- and C-termini and the solvent-exposed D43-H46 flexible loop, fully modelled only for the chain A.
  • FIG. 22 Metabolic pathway of 2-aminoadenine and its reconstitution in unrelated bacteria and phages.
  • A Cellular original pool of nucleotides is represented by the box to the left, the one modified through expression of the viral Z-cluster - to the right. Three dots represent unmodified dTTP and dCTP. Structures of the involved S-2L proteins are shown next to their respective reaction arrows (to scale). Host enzymes (names in grey) finalise the dZTP pathway. Thin, grey arrows stand for no modification. The dashed grey arrow stands for potential use of the standard dNTP pool by PrimPol in absence of the Z-cluster, highlighting its lack of specificity. B.
  • FIG. 23 Identical fold shared by S-2L MazZ, bacterial MazG and HisE proteins. The tight dimer part is shown for four exemplary structures, viewed from the same perspective. For each enzyme both chains are shown in ribbon representation. One chain is additionally coloured: α-helices in dark grey, loops in light grey and β-strands in dark grey. For the coloured chain, the secondary structure elements are numbered. Below each image, the organism of origin, protein name and the PDB code are indicated.
  • FIG. 24 Structural multi-alignment between S-2L PurZ and all 14 other unique adenylosuccinate/succino-2-amino-deoxy adenylate synthases available in PDB. Organism names and PDB codes are indicated on the left. Circles below the alignment mark positions of residues of interest, divided into three categories: residues strictly conserved across AdSS family (light grey); residues conserved in the usual AdSS enzymes but not in phages (dark grey); more loosely conserved residues with two similar variants or with only occasional mutations (star). Empty circles highlight highly conserved residues with shifted backbone position with respect to S-2L’s ones (connected full circles), but with similar superposed functional groups.
  • S-2L_PurZ (SEQ ID NO: 11); phiV8_PurZ_6FKO (SEQ ID NO: 42); P.horikoshii_5K7X (SEQ ID NO: 43); C.jejuni_3R7T (SEQ ID NO: 44); B.anthracis_4M9D (SEQ ID NO: 45); E.coli_2GCQ (SEQ ID NO: 46); Y.pestis_3HID (SEQ ID NO: 47); B.thailandensis_3UE9 (SEQ ID NO: 48); L.pneumophila_6C25 (SEQ ID NO: 49); P.falciparum_1P9B (SEQ ID NO: 50); A.thaliana_1DJ2 (SEQ ID NO: 51); T.aestivum_1DJ3 (SEQ ID NO: 52); C.neoformans_5I
  • Figure 25 Visualisation of relationships between S-2L PurZ and homologous proteins.
  • A Conserved residues from Fig. 24 are mapped onto the S-2L PurZ structure, using the same colour code. Nucleotide substrates (dGMP and dATP) are in light grey.
  • B Visualisation of a S-2L- specific C-terminal insertion in form of an alpha-helix (grey, left) and bacterio-eukaryotic insertions (archaeo-viral deletions) of two large segments, shown on E. coli AdSS (black, right).
  • Figure 26 Non-rooted maximum-likelihood phylogenetic tree of AdSS representatives. The tree was generated using the structural alignment from Fig. 24. Enzymes are divided into four clades: eukaryotic, bacterial, archaeal and viral, the last two sharing a recent ancestor. The reference distance corresponds to an average 0.2 substitution per site. The topology of the bootstrap consensus tree is identical, supporting the result presented here.
  • Figure 27 Sequence multialignment of MazZ-1 homologues (S-2L-like). Numbering above the alignment refers to S-2L MazZ.
  • sequences from phages SH-Ab, PMBT28 and Kokobel2 are fused to an HNH domain; sequence from Siphoviridae sp. is fused to an unidentified domain, and HNH domain is found on a separate domain upstream.
  • Catalytic residues of S-2L MazZ are marked with dark grey dots, R83 stabilising an intermediate product - in dark grey, residues coordinating the 2-amino group, O6 and sugar moiety - in light grey.
  • S-2L MazZ SEQ ID NO : 10
  • Sinobact. SEQ ID NO : 56
  • Caudovir. SEQ ID NO : 57
  • SH-Ab SEQ ID NO : 58
  • PMBT28 SEQ ID NO : 59
  • Kokobel2 SEQ ID NO : 60.
  • FIG. 28 Sequence multialignment of MazZ-2 homologues (ϕVC8-like). Numbering above the alignment refers to ϕVC8 MazZ. Putative catalytic residues are marked with dark grey dots and a grey dot (star), corresponding to S-2L MazZ E35, E38, E50, D53 and R83.
  • phiVC8 SEQ ID NO : 61
  • Bacteroidetes SEQ ID NO : 62
  • ctpVR23 SEQ ID NO : 63
  • ZP6 SEQ ID NO : 64
  • vB_OspP_OH SEQ ID NO : 65
  • Chloroflexi SEQ ID NO : 66.
  • Figure 29 Results of DNA polymerase activity tests of E. coli Pol I (Klenow fragment) with either dATP or dZTP as the incoming dNTP. Nucleotide concentrations are given in µM under the triangles.
  • All living organisms use the same elementary bricks for their genetic material, namely four, and only four, nucleobases: adenine (A), thymine (T), guanine (G) and cytosine (C).
  • Most of the observed DNA modifications occur at position 5 of pyrimidines or position 7 of purines that face the major groove of the DNA double helix 1,3 .
  • Methylation on N4 of cytosine or N6 of adenine is also observed in viruses 2,4 .
  • DNA containing 5-hydroxymethylcytosine has long been known to exist in phages T2, T4 and T6 5 , along with the enzyme (deoxycytidylate hydroxymethylase) responsible for its biosynthesis 6 ; more complicated post-replicative pathways of thymine hypermodification were recently found in phages and recreated in vitro 7 .
  • Cyanophage S-2L is a Synechococcus phage from the double-stranded DNA Siphoviridae family. It was first isolated and described in 1977 12 and its genome was shown to contain neither adenine nor any of its 7-deaza derivatives. Instead, it uses 2-aminoadenine (2,6-diaminopurine or Z), which has an additional amino group in position 2 compared to adenine 13 . The A:T base pair, with two hydrogen bonds, is therefore replaced by the Z:T base pair that has three hydrogen bonds, as in the G:C base pair (Fig. 1). This feature, combined with an unusually high GC content of the S-2L genome, explains its exceptionally high melting point 12 . It is believed that the A-to-Z substitution arose as a host evasion tactic, rendering S-2L’s DNA resistant to the DNA-targeting proteins of its host, especially endonucleases 14,15 .
  • Example 2 Materials and Methods for Examples 3 to 9
  • The genomic sequence of cyanophage S-2L was obtained from NCBI’s database (AX955019). Potential ORFs were identified using ORFFinder 53 (>150 nt, genetic code 11). Targeted ORFs were assessed for possible homology with known proteins using BLAST. The genomic positions of genes involved in phage replication are provided in Table 2; nucleotide sequences of native and codon-optimised genes pplA and datZ are provided herein above. Protein disorder of PrimPol was predicted with DISOPRED 27 .
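  • As an illustration of the ORF identification step, the following is a minimal six-frame ORF scan with the same length cut-off (>150 nt). It is a sketch under stated assumptions, not the ORFFinder algorithm: the start-codon set (ATG, GTG, TTG) is an assumed subset of the initiation codons of genetic code 11, the stop codons are TAA, TAG and TGA, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}    # stop codons of translation table 11
STARTS = {"ATG", "GTG", "TTG"}   # assumption: common subset of table-11 starts
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 150):
    """Return (strand, frame, start, end) for ORFs longer than min_len nt.

    Coordinates are 0-based and end-exclusive on the scanned strand. Each ORF
    runs from the first start codon after the previous stop up to and
    including the next in-frame stop codon.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start > min_len:    # keep only ORFs > min_len nt
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # close it and keep scanning
    return orfs


# Hypothetical 186-nt toy ORF: start codon, 60 alanine codons, stop codon.
toy = "ATG" + "GCA" * 60 + "TAA"
print(find_orfs(toy))
```

  • Candidate ORFs found this way would then be translated and compared against known proteins, as described for the BLAST step.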
  • Proteins of interest were isolated by purification of the lysate on Ni-NTA column (suspension buffer as washing buffer, 500 mM imidazole in elution buffer). They were further diluted to 150 mM NaCl and repurified on HiTrap Heparin (for PrimPol) or HiTrap Q (for DatZ) columns (1 M NaCl and no imidazole in elution buffer).
  • Histidine tags were removed from the proteins by incubation with his-tagged TEV enzyme overnight. After removing TEV on Ni-NTA column, proteins were further purified on Superdex 200 10/300 column with 25 mM Tris-HCl pH 8, 150 mM NaCl (for PrimPol-N190 crystallisation a 16 mM concentration of NaCl was used). All purification columns were from Life Sciences. Protein purity was assessed on an SDS gel (BioRad). The enzymes were concentrated to 7-19 mg ml⁻¹ with Amicon Ultra 10k and 30k MWCO centrifugal filters (Merck), flash frozen in liquid nitrogen and stored directly at -80°C, with no glycerol added.
  • Selenomethionine (SeMet) version of PP-N190 was prepared using the same expression strain and construct. Bacteria grew in medium with 6 g L⁻¹ Na2HPO4, 3 g L⁻¹ KH2PO4, 1 g L⁻¹ NH4Cl, 0.5 g L⁻¹ NaCl, 2 mM MgSO4, 100 µM CaCl2 and 0.4% glucose, supplemented with metal solution (5000x): 5 g L⁻¹ FeCl2, 184 mg L⁻¹ CaCl2, 64 mg L⁻¹ H3BO3, 40 mg L⁻¹ MnCl2, 18 mg L⁻¹ CoCl2, 4 mg L⁻¹ CuCl2, 340 mg L⁻¹ ZnCl2, 605 mg L⁻¹ Na2MoO4, 1.3 m ⁇ L -1 and 0.8% conc.
  • Radioactivity polymerase activity tests, if not stated otherwise for a particular condition, were executed in 200 mM Tris-HCl pH 8 and 50 mM MgCl2, with 50 nM of dT10GG overhang DNA template, 50 nM of α-32P 5’-labeled DNA primer complementary to the template upstream sequence, 250 µM dNTP mix and 1 µM of PrimPol (20 min of incubation) at 37°C.
  • Fluorescence polymerase activity tests were executed in 20 mM Tris-HCl pH 7 and 5 mM MgCl2, with 3 µM of dT12 overhang DNA template, 1.5 µM of FAM 5’-labeled DNA primer, 500 µM dNTP mix, 0.5 µM of PrimPol constructs (10 min of incubation) and 1 µM of DatZ (8 min) at 37°C.
  • The Klenow polymerase used as a control was at 5 U in 50 µl (10 min incubation).
  • DNA was hybridized by heating up to 95°C and gradually cooling to reaction temperature. Reactions were terminated by adding two volumes of a buffer containing 10 mM EDTA, 98% formamide, 0.1% xylene cyanol and 0.1% bromophenol blue, and stored at 4°C. Products were preheated at 95°C for 10 min, before being separated by polyacrylamide gel electrophoresis and visualised by FAM fluorescence or radioactivity on a Typhoon FLA 9000 imager.
  • oligonucleotides were ordered from Eurogentec, chemicals from Sigma-Aldrich, Klenow polymerase from Takara Bio, standard dNTPs from Fermentas (Thermo Fisher Scientific) and dZTP from TriLink BioTechnologies.
  • Nucleotides were injected on the column and eluted with 3 min of isocratic flow of the suspension buffer followed by a linear gradient of 0-200 mM NH4Cl over 10 min (1 ml min⁻¹). Eluted nucleotides were detected by absorbance at 260 nm, measured in arbitrary units [mAU]. High-purity nucleotides and chemicals were bought from Sigma-Aldrich, and HPLC-quality acetonitrile was from Serva.
  • Crystallography and structural analysis
  • The structure of PrimPol-N190 was solved by the SAD technique using a SeMet derivative of the protein and data sets collected at the selenium edge (0.9807 Å) using the SHELX C/D/E programs 60 .
  • The structure of DatZ was solved by the sulphur-SAD (S-SAD) technique at 1.7712 Å wavelength.
  • The anomalous double difference Fourier map for Zn was calculated from data collected at 9.67 and 9.66 keV (Zn peak and pre-edge).
  • DatZ ultrahigh resolution structure was obtained by merging 3 individual datasets taken on the same crystal. Structures of DatZ with bound Co²⁺ and dATP were obtained by growing crystals with 10 mM CoCl2 and 10 mM EDTA, respectively (the latter at pH 7).
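  • The data-collection settings above are quoted partly as wavelengths (Å) and partly as photon energies (keV). The two are related by E = hc/λ, with hc ≈ 12.39842 keV·Å. The short sketch below converts between them for the values given in the text; the function names and labels are illustrative only.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å


def kev_to_angstrom(energy_kev: float) -> float:
    """Convert an X-ray photon energy in keV to a wavelength in Å."""
    return HC_KEV_ANGSTROM / energy_kev


def angstrom_to_kev(wavelength_angstrom: float) -> float:
    """Convert an X-ray wavelength in Å to a photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Values quoted in the text above.
for label, kev in (("Zn peak", 9.67), ("Zn pre-edge", 9.66)):
    print(f"{label}: {kev} keV = {kev_to_angstrom(kev):.4f} Å")
for label, wavelength in (("Se edge (SAD)", 0.9807), ("S-SAD", 1.7712)):
    print(f"{label}: {wavelength} Å = {angstrom_to_kev(wavelength):.3f} keV")
```

  • For instance, the 1.7712 Å wavelength used for S-SAD corresponds to a photon energy of about 7.0 keV.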
  • Example 3 A DNA primase-polymerase nonspecific for A or Z
  • AEP is the eukaryotic and archaeal counterpart of DnaG, the bacterial primase superfamily 17 18 , to which it is structurally unrelated. Its members are found in all domains of life, including viruses, and are involved in several DNA transactions including not only DNA priming and replication, but also DNA repair through non-homologous end-joining (NHEJ) 18 .
  • AEP proteins are often fused or physically interact with DNA helicases, and also with partners containing helix bundle domains (like PriCT-1, PriCT-2, PriL or PriX) that interact with the template ssDNA 17 19 22 .
  • AEP is capable of replicating the whole genome of the NrS-1 phage 23 .
  • Although AEP is not yet officially included in the standard DNA polymerase classification encompassing polymerases from families A, B, C, D, X, Y and RT 24,25 , despite an incentive to do so 18 , members of the AEP superfamily share the classical Klenow fold with families A, B and Y DNA polymerases 26 .
  • BLAST searches 28 indicate it matches best the VirE family of single-stranded DNA-binding proteins of function not described in the literature 29 .
  • homology detection combined with structure prediction performed with HHpred 30 finds high-scoring similarity between viral hexameric DNA helicase structures, the closest being from bovine papillomavirus (2GXA).
  • Residue Y63 plays the role of a steric gate for ribonucleotides, allowing only dNTPs in the catalytic site 34 .
  • Residues E85 and D87, which can vary to Asp and Glu, respectively, coordinate a divalent metal ion (M²⁺) in the B site that positions the triphosphate of the incoming nucleotide (dNTP) during polymerisation; this triphosphate is further stabilized by interactions with T112, K115, H118 and R157 (possibly varying respectively to Ser, Arg, Asn and Lys) 35 38 .
  • the three negatively charged residues E85, D87 and D146 are crucial for the polymerase and primase activity, as shown in the related human PrimPol 41 .
  • In S-2L PP-N190 we noticed a significant positional shift of residue D87 compared to other AEP structures, along with the conservation among the close relatives of the neighbouring residue D88, which is exposed to the solvent.
  • incoming (d)NTP’s conformation is largely conserved across all 8 unique AEP structures with a bound nucleotide (PDB IDs: 1V34, 2ATZ, 2FAQ, 3PKY, 5L2X, 50F3, 6JON, 6R5D).
  • The catalytic site is open to the solvent and there is no selection on the incoming nucleotides; after superposition with these structures, PP-N190 presents no structural feature that could lead to a Z vs A specificity during the polymerase reaction.
  • Example 6 DatZ: a triphosphohydrolase specific for dATP
  • the base moiety of dA snugly fits in the catalytic pocket below a relatively flexible element (as indicated by higher B-factors), with the P79 residue on its tip (Fig. 5b).
  • A catalytic divalent ion is found in the vicinity of dA’s free 5’-OH group, even though no divalent ion was added in buffers during purification or crystallisation.
  • The side chain of residue I22 is ideally positioned to sterically exclude the amino group in position 2 of the purine ring of G or Z and provides an immediate explanation for the observed specificity of the enzyme.
  • W20 side chain constitutes a steric hindrance for the 2’ hydroxyl group of any ribose-based nucleotide.
  • Example 8 A two-metal-ion mechanism of DatZ
  • DatZ uses a typical two-metal-ion mechanism to dephosphorylate dATP. While the ion B²⁺ stabilizes the leaving O5’ atom and one oxygen of the α-phosphate (Pα), ion A²⁺ positions a hydroxide (OH⁻) in an attacking position opposite to O5’.
  • Example 9 The active site of DatZ: conservation and mutagenesis
  • A number of phages that contain a close homologue of the purZ gene in their genome also contain a homologue of datZ. Looking for the conservation of residues crucial for both a dATPase activity and absence of dZTPase activity, as identified by the present structural studies, we built a multialignment of these closely related DatZ sequences (Fig. 11). We found that all residues stabilizing both catalytic metal ions are strictly conserved, as well as R19, K81 and K116 interacting with the α-, β- and γ-phosphates. Residues W20, I22 and P79, interacting with the base, are conserved or involve conservative substitutions. Additionally, residues Q29, A32 and G74 are strictly conserved among close DatZ homologues, highlighting their possible importance for protein structure (tertiary or quaternary) and/or its dynamics.
  • Due to the divergent nature of the AEP superfamily, its classification is far from trivial. The universal presence of its members, encompassing all three domains of life, viruses and plasmids, testifies to its ancient origin 19 . Advanced sequence-based computational methods divided the superfamily into four clades: AEP proper, NCLDV-herpesvirus primase, PrimPol, and BT4734-like 17 . In another approach using sequence clustering, AEPs were distributed into multiple groups, with the newly defined PrimPol-PV1 supergroup 19 . S-2L PrimPol belongs to the Anabaena (Nostocaceae) All3500-like (sub)family within the PrimPol clade or the PrimPol-PV1 supergroup, depending on the classification.
  • The human HD phosphohydrolase HDDC2 (HD domain-containing protein 2) shows a metal coordination identical to the one seen in S-2L DatZ; it is the only other homologue structure with two ions (Mg²⁺) present in both sites A and B (PDB ID 4DMB).
  • uracil (U) - are used to encode genetic information in DNA or RNA polymers, respectively, and their metabolic pathway is conserved across all branches of cellular life. This principle can be extended to the vast majority of smaller biological entities, such as viruses, which are important agents of evolution (2) . Despite that, genetic material may be subject to many natural nucleobase modifications. These modified nucleotides can either constitute additional, epigenetic information, or exist as a consequence of an arms race with restriction-modification systems in the hosts.
  • Phages T2, T4 and T6 systematically substitute 5-hydroxymethylcytosine (5hmC) for cytosine (3)
  • Phage 9g contains archaeosine (G+), which replaces a quarter of its genomic guanine (4) , enabling its DNA to resist 71% of cellular restriction enzymes (5)
  • A dATP-specific triphosphatase eliminates dATP from the pool of available dNTP substrates.
  • Such alteration proved essential for conferring the nucleotide specificity to the otherwise non-discriminative DNA primase-polymerase (PrimPol) of cyanophage S-2L, that we identified as the sole DNA polymerase of S-2L.
  • Example 13 Materials and Methods Used in Examples 14-19
  • S-2L DNA was isolated from phage lysate of Synechococcus elongatus culture, adapting the techniques commonly used for phage λ DNA.
  • The S-2L genomic library was prepared by successive DNA fragmentation, adapter ligation and amplification by GATC Biotech. Libraries were sequenced using Illumina HiSeq. 14,198,980 sequenced reads (2 x 150 bp) were obtained, covering 4,259,694,000 bases. Resulting reads were mapped against the GenBank AX955019.1 S-2L reference sequence using Minimap2 v2.17 (47) . Variant calling was carried out by Freebayes v1.3.2 (48) and later filtered by VCFLIB (49) .
  • A consensus sequence was generated using VCF Consensus Builder v0.1.0 (50) .
  • Annotation of the consensus sequence was carried out by translated BLAST (17) .
  • Representation of the S-2L genome was made with SnapGene Viewer (51) . Phages related to S-2L through purZ and datZ genes were found by homology searches using NCBI BLAST (17) .
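  • The sequencing yield quoted above can be checked with simple arithmetic: 14,198,980 paired reads of 2 x 150 bp give the stated 4,259,694,000 bases. The sketch below reproduces that figure and adds a helper for mean fold coverage; the genome length is left as a parameter because it is not stated in this section, and the function name is hypothetical.

```python
n_read_pairs = 14_198_980  # sequenced reads (2 x 150 bp), from the text
reads_per_pair = 2
read_length_bp = 150

# Total sequenced bases: pairs x reads per pair x read length.
total_bases = n_read_pairs * reads_per_pair * read_length_bp
print(f"{total_bases:,} bases")


def mean_coverage(sequenced_bases: int, genome_length_bp: int) -> float:
    """Mean fold coverage, assuming every base maps to the reference."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return sequenced_bases / genome_length_bp
```

  • Calling `mean_coverage(total_bases, genome_length_bp)` with the length of the reference sequence gives an upper bound on the average depth, since unmapped reads are not subtracted.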
  • Proteins were further purified on Superdex 200 10/300 column with 25 mM Tris-HCl pH 8, 300 mM NaCl. All purification columns were from Life Sciences. Protein purity was assessed on an SDS gel (BioRad). The enzymes were concentrated to 10-19.5 mg ml⁻¹ with Amicon Ultra 10k and 30k MWCO centrifugal filters (Merck), flash frozen in liquid nitrogen and stored directly at -80°C, with no glycerol added.
  • Nucleotides were injected on the column and eluted with 3 min of isocratic flow of the suspension buffer followed by a linear gradient of 0-200 mM NH4Cl over 12 min (1 ml min⁻¹). Eluted nucleotides were detected by absorbance at 260 nm, measured in arbitrary units [mAU]. High-purity nucleotides and chemicals were bought from Sigma Aldrich, and HPLC-quality acetonitrile was from Serva.
  • a suspension buffer 25 mM Tris-HCl pH 8, 0.5% acetonitrile
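  • The elution programme described above (3 min isocratic, then a linear 0-200 mM NH4Cl gradient over 12 min) can be written as a piecewise-linear function of time. This is a minimal sketch: the function name is hypothetical, and holding the final concentration after the gradient ends is an assumption, since the text does not describe what follows the gradient.

```python
def nh4cl_concentration_mM(
    t_min: float,
    isocratic_min: float = 3.0,   # isocratic hold in suspension buffer
    gradient_min: float = 12.0,   # duration of the linear gradient
    c_start: float = 0.0,         # mM NH4Cl at the start of the gradient
    c_end: float = 200.0,         # mM NH4Cl at the end of the gradient
) -> float:
    """Nominal NH4Cl concentration (mM) delivered at time t_min (minutes)."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= isocratic_min:
        return c_start
    # Fraction of the gradient completed, capped at 1 (assumed final hold).
    fraction = min((t_min - isocratic_min) / gradient_min, 1.0)
    return c_start + fraction * (c_end - c_start)


for t in (0, 3, 9, 15):
    print(t, nh4cl_concentration_mM(t))
```

  • At a flow rate of 1 ml min⁻¹ the elapsed time in minutes equals the eluted volume in millilitres, so the same function maps elution volume to nominal salt concentration.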
  • DNA of plasmids and phages was digested to nucleosides with Nucleoside Digestion Mix (NEB) and separated on Amicon Ultra-0.5 mL 10K centrifugal filters. Nucleosides were analysed by LC-MS, with standard nucleoside controls (Sigma Aldrich) or dZ (Biosynth Carbosynth). dZ was found to elute at the same position as dG, but with a strikingly different MS/MS profile.
  • the A-to-Z substitution rate was taken as a ratio of dZ content to dZ+dA; it was further normalized to newly synthesized plasmid fraction by isolating plasmids prior to induction and accounting for the difference in the DNA yield.
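  • The substitution rate defined above, dZ/(dZ+dA), is straightforward to compute. The normalisation step is described only qualitatively, so the sketch below makes an explicit assumption: the newly synthesized fraction is taken as (yield after induction minus yield before induction) divided by yield after induction, and the raw rate is divided by that fraction. The function names and this formula are illustrative, not the procedure confirmed by the text.

```python
def z_substitution_rate(dz: float, da: float) -> float:
    """Raw A-to-Z substitution rate: dZ content over dZ + dA content."""
    if dz < 0 or da < 0 or dz + da == 0:
        raise ValueError("dZ and dA must be non-negative and not both zero")
    return dz / (dz + da)


def normalised_substitution_rate(
    dz: float, da: float, yield_before: float, yield_after: float
) -> float:
    """Rate normalised to the newly synthesized plasmid fraction.

    Assumption: DNA present before induction contains no Z, so only the
    fraction made after induction can carry the substitution.
    """
    new_fraction = (yield_after - yield_before) / yield_after
    if not 0 < new_fraction <= 1:
        raise ValueError("post-induction yield must exceed pre-induction yield")
    return z_substitution_rate(dz, da) / new_fraction


# Hypothetical numbers: 1 part dZ to 3 parts dA, DNA yield doubling on induction.
print(z_substitution_rate(1, 3), normalised_substitution_rate(1, 3, 1.0, 2.0))
```

  • With these hypothetical inputs, a raw rate of 0.25 measured on DNA of which only half was made after induction corresponds to a normalised rate of 0.5.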
  • Nucleotide constraints for structure refinement were obtained using Grade Web Server (57) .
  • The structure of S-2L PurZ was solved by molecular replacement with the ϕVC8 PurZ model (PDB ID: 6FM1).
  • the structure of MazZ was solved using anomalous signal from bound Mn 2+ ions that guided automatic model-building in Phenix’ AutoSol, with final manual reconstruction steps using Coot (58) .
  • PurZ quaternary structure analysis was performed using PISA (59) .
  • Fluorescence polymerase activity test was executed in 20 mM Tris-HCl pH 7 and 5 mM MgCl 2 , with 3 µM of dT24 overhang DNA template, 1 µM of FAM 5'-labeled DNA primer and various concentrations of either dATP or dZTP.
  • the Klenow polymerase was added to final concentration of 5 U in 50 µl.
  • the assay was conducted at 37°C for 5 min.
  • DNA was hybridized by heating up to 95°C and gradually cooling to reaction temperature.
  • Reactions were terminated by adding two volumes of a buffer containing 10 mM EDTA, 98% formamide, 0.1% xylene cyanol and 0.1% bromophenol blue, and stored at 4°C. Products were preheated at 95°C for 10 min before being separated with polyacrylamide gel electrophoresis and visualised by FAM fluorescence on Typhoon FLA 9000 imager. All oligonucleotides were ordered from Eurogentec, chemicals from Sigma-Aldrich, Klenow polymerase from Takara Bio, dATP from Fermentas (Thermo Fisher Scientific) and dZTP from TriLink BioTechnologies.
  • the replication-related block includes pplA encoding a non-selective primase-polymerase (PrimPol).
  • the genes of N6-succino-2-amino-2'-deoxyadenylate synthetase (purZ) and dATP-specific HD phosphohydrolase (datZ) involved in 2-aminoadenine metabolism. Importantly, the new sequences of all these genes are identical with the previous ones.
  • In Fig. 17C we show the relevant sections of their genomes, along with their phylogeny constructed on datZ and purZ.
  • cyanophage S-2L has a unique genome composition, much different from closely related Siphoviridae phages.
  • the mazZ gene is always co-conserved with datZ and purZ, although in one of two possible isoforms.
  • Example 16 Structure of S-2L PurZ with bound dGMP and dATP at 1.7 Å resolution
  • Enzymes from AdSS family are known to be functional dimers (19, 20) or even tetramers (21). Even though there is one molecule of S-2L PurZ in the crystallographic asymmetric unit, a dimer can be reconstructed by using a crystallographic two-fold axis (Fig. 19C). As expected, there is a large surface interaction area of 2488 Å² between the pair of proteins. Although the enzyme crystallized in a different space group from the one observed in ϕVC8 enzyme’s crystals, this observation can be extended to the original PurZ structure as well: analysis of crystal contacts between symmetry-related molecules with PISA indicates indeed the presence of a stable dimer in this case. [0238] Very low B-factors (Fig.
  • Residues G22, S23, N49, A50, T132, Q190, V241 and the backbone atoms of Y21 form a pocket where the base moiety of dGMP is placed.
  • Y21-S23 are positioned next to the amino group of dGMP in position 2, which is absent in the IMP substrate of AdSS.
  • the sidechain of S23 in the vicinity of this amino group appears to be specific to phage PurZ and its immediate homologues, as a mutation from a conserved aspartate, and is likely to be responsible for the guanine specificity. Its negative partial charge is further stabilized by the close R280 sidechain.
  • Residues G130, S131, R146, ⁇ 203, C204, T205 and R240 complete the dGMP pocket.
  • R146 protrudes from the dimer's other molecule, forming an ion pair with the negative charge of the α-phosphate, a feature already described for E. coli AdSS as well (23).
  • V241 is ideally placed to sterically interfere with the ribose 2'-OH group in ribonucleotides, establishing preference towards the substrate in the deoxy- form. Although it is conserved in both AdSS and PurZ, its position is spatially shifted in the latter: the reason behind this difference is examined in the discussion section.
  • Example 17 A (d)GTP-specific S-2L MazZ belonging to MazG-like pyrophosphatase family
  • the third element of Z-cluster, MazZ, is a MazG-like pyrophosphatase. Closely related MazG(-like) and HisE proteins share an evolutionary history with dimeric dUTPases, and all three constitute the all-α NTP pyrophosphohydrolase (NTP-PPase) superfamily (24).
  • MazG and MazG-like proteins play house-cleaning or related functions, like degrading an alarmone (p)ppGpp (25) or aberrant dUTP (26).
  • HisE members are closely related and are involved in bacterial histidine synthesis pathway as a phosphoribosyl-ATP pyrophosphatase (27), often fusing with HisI that catalyzes the following reaction in the pathway (28).
  • Varying specificity of all NTP-PPase enzymes is reflected in the divergence of residues in contact with the ligand: only the catalytic residues and fold-related hydrophobic ones are consistently conserved across the superfamily (24).
  • the former were identified to coordinate up to three Mg 2+ ions, two of which were proposed to participate in the two-metal-ion mechanism (29, 30) .
  • S-2L MazZ shows no substantial activity with other deoxynucleotides, including dZTP: the enzyme is a comparatively weak dTTPase, and trace of activity in dATP dephosphorylation starts to be visible only with incubation times 8 times longer than in usual conditions. Thus, S-2L MazZ evidently exerts strong discrimination on base moiety, but seemingly none on the 2'-OH group.
  • Example 18 Structure of S-2L MazZ, an NTP-PPase with a MazG-HisE fold
  • The basic unit of NTP-PPase enzymes is a four- or five-α-helical chain; in a notable exception of A. fulgidus MazG-like protein, an extended helix structurally compensates for the lack of another despite its opposite directionality (PDB ID: 2P06).
  • two of these 4-helical chains intertwine, forming a tight dimer with two symmetric catalytic sites made from both subunits.
  • the ancestor of dimeric dUTPases underwent gene duplication and fusion, creating a covalent equivalent of such a dimer that subsequently lost one redundant catalytic site (24) .
  • ions A, B and C respectively; they are strictly equivalent to the Mg 2+ ions found in C. jejuni dimeric dUTPase (PDB 1W2Y), another member of the superfamily (29)
  • the ion A is coordinated by residues E34, E35 and E38; ion B by the E50 and D53; and ion C by E38 and E50.
  • Example 19 Full 2-aminoadenine metabolic pathway in S-2L and successful DNA conversion of unrelated bacteria and phages
  • the finishing steps of dZTP synthesis are carried out by non-specific host enzymes participating in dATP production, as their genes are absent in the viral genomes; they may even be additionally upregulated by the infected host cell sensing and fighting the depletion of its dATP reserve.
  • the post-infection composition of cellular dNTP pool thus consists of dATP being replaced by dZTP, and the Primase-Polymerase of S-2L, non-specific to A or Z, readily inserts the surrogate base in the cyanophage DNA in front of any instructing thymine.
  • residue V241 mentioned above as important for ribose discrimination, is also present in standard AdSS family representatives.
  • in phage PurZ, an insertion deforms the loop that contains it, pulling it slightly closer to the C2' ribose atom: from 4.3-5.3 Å as seen in AdSS (PDB IDs: 1P9B, 2DGN, 2GCQ, 4M9D, 5I34, 5K7X) to 3.7 Å (S-2L) and 3.6 Å (ϕVC8).
  • DNA polymerases of E. coli and phages T7 and T4 allow for incorporation of 2-aminoadenine, which was partially documented before (41, 42) .
  • Klenow fragment of E. coli Pol I, a family A DNA polymerase participating in plasmid replication while not being the main replicative polymerase (43), does not discriminate between A and Z whatsoever (Fig. 29).
  • we expect that this relaxed specificity holds true for most DNA polymerases, as a simple exclusion mechanism of the additional 2-amino group in a purine would hamper guanine incorporation.
  • Table 1 Diffraction data collection and Model Refinement statistics. Numbers in parentheses refer to the highest-resolution shell.
  • Table 4 Diffraction data collection and Model Refinement statistics. Numbers in parentheses refer to the highest-resolution shell.
  • Table 5 Position of replication-related protein genes on the new, high-coverage S-2L genome sequence (MW334946). Genes datZ, mazZ and purZ are highly compact, overlapping at their very ends.
  • Example 21 References Cited in Examples 12-20
  • Siphoviridae insights from dairy phages. Molecular Microbiology 39, 213-223 (2001).

Abstract

A recombinant cell or virus comprising a genome with a ratio of 2-aminoadenine (Z) to the sum of adenine plus 2-aminoadenine (Z/(Z+A)) of at least 0.05; an isolated nucleic acid library comprising at least 50% coverage of the genome of a reference organism or virus wherein the nucleic acid library comprises a Z to (Z+A) ratio of at least 0.05, and wherein the cell, virus or reference organism does not naturally comprise a datZ gene, mazZ gene, and purZ gene. A method of making a stabilized nucleic acid, comprising: providing a cell that does not naturally comprise a datZ gene, mazZ gene, and purZ gene; expressing recombinant DatZ, MazZ and PurZ proteins in the cell for a period of time sufficient for incorporation of 2-aminoadenine into nucleic acid in the cell to form a stabilized nucleic acid comprising 2-aminoadenine instead of adenine and to develop an improved resistance to endonucleases.

Description

2-AMINOADENINE MODIFIED NUCLEIC ACIDS, CELLS COMPRISING THEM, AND METHODS OF PRODUCING THEM
BACKGROUND
[0001] Bacteriophages have long been known to use modified bases in their DNA to prevent cleavage by the host’s restriction endonucleases. Among them, cyanophage S-2L is unique because its genome has all its adenines (A) systematically replaced by 2-aminoadenines (Z). 2-aminoadenine also makes a base-pair with thymine (Z:T) but with three hydrogen bonds instead of the two used in the classical adenine:thymine (A:T) Watson-Crick base-pair. Therefore, replacement of A by Z can provide a more stable nucleic acid molecule which can also be endowed with an increased resistance to endonucleases. Despite the theoretical advantages of such modified nucleic acids, to date their availability has been limited. There is a need in the art for methods or biochemical pathways for making such nucleic acids as well as cells comprising them and the modified nucleic acids themselves. This invention satisfies these and other needs in certain embodiments.
DESCRIPTION OF THE INVENTION
[0002] The examples demonstrate that a cluster of three genes found in a cyanophage called S-2L, when used in combination, is the minimum set of genes that should be used to introduce a new base, called 2-aminoadenine or diaminopurine, into the DNA of a living organism. The examples demonstrate identification and full characterization of datZ and mazZ, and demonstrate how these genes may be used in combination with purZ to incorporate 2-aminoadenine into DNA.
[0003] The examples also report the crystal structures of these enzymes.
[0004] Nucleic acids in which 2-aminoadenine is incorporated are more stable than natural DNA, and therefore may display increased and longer information storage capacities.
Nucleotide and Amino Acid Sequences
[0005] The nucleotide sequences of the datZ, mazZ, purZ, and pplA native genes are shown in the following table. PplA is the PrimPol polymerase from the S-2L phage.
[Table, shown as images in the original: nucleotide sequences of the native datZ, mazZ, purZ and pplA genes]
[0006] The nucleotide sequences of the datZ, mazZ, purZ, and pplA codon optimized genes are shown in the following table:
[Table, shown as images in the original: nucleotide sequences of the codon optimized datZ, mazZ, purZ and pplA genes]
[0007] The amino acid sequences of DatZ, MazZ, PurZ and PrimPol are shown in the following table:
[Table, shown as images in the original: amino acid sequences of DatZ, MazZ, PurZ and PrimPol]
[0008] The positions of these genes in the S-2L genome (MW334946) are the following:
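[Table, shown as an image in the original: positions of the datZ, mazZ, purZ and pplA genes in the S-2L genome (MW334946)]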
[0009] Genes datZ, mazZ and purZ are highly compact, overlapping at their very ends.
Recombinant cells and viruses
[0010] In a first aspect, this invention provides a recombinant cell or virus comprising a genome comprising a 2-aminoadenine/thymine (Z/T) to (adenine/thymine (A/T) + 2-aminoadenine/thymine (Z/T)) ratio (Z/T to (A/T + Z/T) ratio) of at least 0.05, wherein the cell or virus does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene.
[0011] As used herein, a virus does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene if the virus does not naturally comprise the genes in its genome. As used herein, a cell does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene if the cell does not naturally comprise the genes in its genome and if the cell also is not naturally infected by a virus that introduces the genes into the cell.
[0012] The recombinant cell may be a prokaryotic cell or a eukaryotic cell. In some preferred embodiments, it is a prokaryotic cell. In a preferred embodiment, the recombinant cell is an E. coli cell.
[0013] The virus may be a virus that infects eukaryotic cells or it may be a phage that infects prokaryotic cells. In a preferred embodiment, the virus is a bacteriophage. In another preferred embodiment, the virus is a phage that infects E. coli, such as T7 or lambda.
[0014] The Z/T to (A/T + Z/T) ratio is the ratio of the number of Z/T pairs to the sum of A/T and Z/T pairs in the genome. This value may be determined by measuring the Z content and determining the ratio of Z to (A+Z) in the genome.
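By way of illustration only, the determination described in paragraph [0014] can be sketched in Python as follows. The function names `z_ratio` and `meets_threshold` and the numerical inputs are hypothetical and form no part of the disclosure; the sketch assumes that the amounts of dZ and dA have already been measured in the same unit.

```python
def z_ratio(z_amount: float, a_amount: float) -> float:
    """Return Z / (A + Z) for measured amounts of dZ and dA.

    Both amounts must be expressed in the same unit (for example pmol).
    Hypothetical helper illustrating the ratio defined in the text.
    """
    if z_amount < 0 or a_amount < 0:
        raise ValueError("amounts must be non-negative")
    total = z_amount + a_amount
    if total == 0:
        raise ValueError("no A or Z detected; the ratio is undefined")
    return z_amount / total


def meets_threshold(z_amount: float, a_amount: float, threshold: float = 0.05) -> bool:
    """Check whether the Z to (A+Z) ratio reaches a threshold (0.05 by default)."""
    return z_ratio(z_amount, a_amount) >= threshold


if __name__ == "__main__":
    # Illustrative values only: 12 units of dZ and 88 units of dA.
    print(z_ratio(12.0, 88.0))
    print(meets_threshold(12.0, 88.0))
```

The default threshold of 0.05 mirrors the lowest ratio recited above; any of the higher ratios may be passed as `threshold` instead.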
[0015] In some embodiments, the occurrence of Z is homogenous throughout the genome. In some embodiments, the occurrence of Z is not homogenous throughout the genome. In some embodiments, the occurrence of Z in the genome correlates with regions of the genome that were recently replicated.
[0016] In some embodiments, the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.10, at least 0.15, at least 0.20, at least 0.25, at least 0.30, at least 0.35, at least 0.40, at least 0.45, at least 0.50, at least 0.55, at least 0.60, at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, at least 0.90, at least 0.95, at least 0.98, or at least 0.99.
[0017] In some embodiments, the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, or at least 20%.
[0018] In general, the recombinant cell or virus is based on a starting cell or virus that does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene. Therefore, the starting cell or virus does not naturally comprise Z in its nucleic acid. It is a feature of certain embodiments of this invention that by expressing a DatZ, a MazZ and a PurZ in a recombinant cell it is possible to produce a nucleic acid comprising Z.
[0019] In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell. The regulatory control elements may comprise a promoter and/or a terminator. The coding sequences may be present on one, two, or three different nucleic acid molecules, such as for example plasmids.
[0020] In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 85% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 85% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 85% identical to PurZ (SEQ ID NO: 11). In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 90% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 90% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 90% identical to PurZ (SEQ ID NO: 11). In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 95% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 95% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 95% identical to PurZ (SEQ ID NO: 11). In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 98% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 98% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 98% identical to PurZ (SEQ ID NO: 11). In some embodiments, the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 99% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 99% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 99% identical to PurZ (SEQ ID NO: 11).
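The disclosure does not state how percent identity is to be computed. Purely as a sketch, and under the assumption that the two polypeptide sequences have already been globally aligned (gaps written as "-") and that identity is counted over all alignment columns, the comparison against a threshold such as 80% could look as follows in Python. The function name `percent_identity` and the short example sequences are hypothetical; they are not SEQ ID NOs: 9 to 11.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption (not from the disclosure): identity is the number of columns
    with identical, non-gap residues divided by the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 8-column alignment with one gap and one substitution.
    identity = percent_identity("MKT-YIAK", "MKTAYLAK")
    print(identity, identity >= 80.0)
```

Other conventions exist (for example, dividing by the length of the shorter sequence), and they can give different values for the same pair of sequences.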
[0021] In some embodiments, the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9) is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), and/or the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10) is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and/or the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7).
[0022] In some embodiments, the recombinant cell or virus comprises a coding sequence for DatZ (SEQ ID NO: 9), a coding sequence for MazZ (SEQ ID NO: 10), and a coding sequence for PurZ (SEQ ID NO: 11).
[0023] In some embodiments, one or more of the coding sequences are codon optimized.
[0024] In some embodiments, the coding sequences are present on one or more plasmids.
[0025] In some embodiments, the coding sequences are present on one or more chromosomes.
[0026] In some embodiments, the recombinant cell comprises DatZ (SEQ ID NO: 9), MazZ (SEQ ID NO: 10) and PurZ (SEQ ID NO: 11).
[0027] Also provided are compositions comprising a plurality of the recombinant cells of the invention. For example, the composition may be a plurality of recombinant cells present in or on a culture medium, a plurality of recombinant cells frozen in a container, or a plurality of recombinant cells in a lyophilized composition.
[0028] In some embodiments, the recombinant cell is not a cyanophage S-2L.
[0029] In some embodiments, the recombinant cell is not a cell that is naturally infected by a cyanophage S-2L.
[0030] In some embodiments, the recombinant virus is not a cyanophage.
[0031] In some embodiments, the recombinant virus is not a Vibrio phage.
[0032] In some embodiments, the recombinant cell is not a cyanobacterium.
[0033] In some embodiments, the recombinant cell is not a member of the genus Vibrio.
[0034] In some embodiments, the recombinant cell is not a cyanobacterium or a member of the genus Vibrio, and the recombinant virus is not a cyanophage or a Vibrio phage.
[0035] In some embodiments, the recombinant cell is a bacterium.
Methods of making recombinant cells and viruses
[0036] Also provided are methods of making a nucleic acid comprising 2-aminoadenine (Z). In general, the methods may comprise providing a recombinant cell as described herein and isolating the nucleic acid comprising Z from the cell. In some embodiments, the isolated nucleic acid is a plasmid. The plasmid may be a bacterial plasmid and may comprise an origin of replication and/or a gene encoding a selectable marker. In some embodiments, the isolated nucleic acid is a chromosome. In some embodiments, the isolated nucleic acid is a total cell nucleic acid preparation. The isolated nucleic acid may be present in a composition comprising nucleic acid molecules that comprise Z and others that do not.
Nucleic acid libraries
[0037] Nucleic acid libraries are also provided. As used herein, a “nucleic acid library” is a plurality of nucleic acids in any form that are all obtained from the genome of a reference organism. Reference may be made to the “coverage” of the genome of the reference organism by the library. The coverage is the subset of the genome that is represented by at least one copy in the library. Thus, a coverage of 50% indicates that the library comprises at least one copy of 50% of the positions in the genome.
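As a non-limiting illustration of the coverage defined in paragraph [0037], the following Python sketch computes the fraction of genome positions represented by at least one library member. It assumes, hypothetically, that each library member has been mapped to a linear reference genome as a half-open interval (start, end) in 0-based coordinates; the function name `library_coverage` and the example intervals are not part of the disclosure.

```python
from typing import Iterable, Tuple


def library_coverage(genome_length: int, fragments: Iterable[Tuple[int, int]]) -> float:
    """Fraction of genome positions covered by at least one fragment.

    Fragments are half-open intervals (start, end) on a linear genome.
    Overlapping fragments are counted once; circular genomes are not handled.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(fragments):
        # Clip each fragment to the genome boundaries.
        start = max(start, 0)
        end = min(end, genome_length)
        if end <= start:
            continue
        if start > current_end:
            covered += end - start        # new, non-overlapping region
            current_end = end
        elif end > current_end:
            covered += end - current_end  # only the part not yet counted
            current_end = end
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical 1000-position genome with three mapped library members.
    fragments = [(0, 400), (300, 500), (800, 1000)]
    coverage = library_coverage(1000, fragments)
    print(coverage, coverage >= 0.5)
```

In this hypothetical example, positions 500 to 799 are absent from the library, so 700 of 1000 positions are represented.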
[0038] In some embodiments, the nucleic acid library is isolated. In some embodiments, it is provided in cloned form, and may be present in a host such as a bacterium or phage host. In some embodiments, the nucleic acid library is provided in the form of purified nucleic acid.
[0039] In some embodiments, the isolated nucleic acid library comprises at least 50% coverage of the genome of a reference organism or virus, wherein the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.05, and wherein the reference organism does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene.
[0040] In some embodiments, the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.10, at least 0.15, at least 0.20, at least 0.25, at least 0.30, at least 0.35, at least 0.40, at least 0.45, at least 0.50, at least 0.55, at least 0.60, at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, at least 0.90, at least 0.95, at least 0.98, or at least 0.99.
[0041] In some embodiments, the nucleic acid library comprises a 2-aminoadenine (Z) content of at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, or at least 20%.
[0042] In some embodiments, the reference organism is prokaryotic. In some embodiments, the reference organism is eukaryotic. In some embodiments, the reference organism is E. coli. In some embodiments, the reference virus infects eukaryotic cells. In some embodiments, the reference organism is a phage, such as a bacteriophage.
[0043] In some embodiments, the reference virus is not a cyanophage S-2L.
[0044] In some embodiments, the reference cell is not a cell that is naturally infected by a cyanophage S-2L.
[0045] In some embodiments, the reference virus is not a cyanophage.
[0046] In some embodiments, the reference virus is not a Vibrio phage.
[0047] In some embodiments, the reference cell is not a cyanobacterium.
[0048] In some embodiments, the reference cell is not a member of the genus Vibrio.
[0049] In some embodiments, the reference cell is not a cyanobacterium or a member of the genus Vibrio, and the reference virus is not a cyanophage or a Vibrio phage.
[0050] In some embodiments, the reference cell is a bacterium.
Methods of making a stabilized nucleic acid
[0051] The recombinant cells disclosed herein may be used to make a stabilized nucleic acid because nucleic acids having Z incorporated therein may have increased stability. Accordingly, in another aspect this invention provides methods of making a stabilized nucleic acid. The methods may comprise providing a cell that does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene; and expressing recombinant DatZ, MazZ and PurZ proteins in the cell for a period of time sufficient for incorporation of 2-aminoadenine (Z) into nucleic acid in the cell to form a stabilized nucleic acid comprising 2-aminoadenine (Z). In some embodiments, the methods further comprise isolating the stabilized nucleic acid from the cell. In some embodiments, the stabilized nucleic acid is endogenous to the cell. In some embodiments, the stabilized nucleic acid is heterologous to the cell. In some embodiments, the stabilized nucleic acid is a viral nucleic acid.
[0052] In some embodiments of the methods, the cell is not a cyanophage S-2L.
[0053] In some embodiments of the methods, the cell is not a cell that is naturally infected by a cyanophage S-2L.
[0054] In some embodiments of the methods, the cell is not a cyanobacterium.
[0055] In some embodiments of the methods, the cell is not a member of the genus Vibrio.
[0056] In some embodiments of the methods, the cell is not a cyanobacterium or a member of the genus Vibrio, and the recombinant virus is not a cyanophage or a Vibrio phage.
[0057] In some embodiments of the methods, the cell is a bacterium.
[0058] In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell.
[0059] In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 85% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 85% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 85% identical to PurZ (SEQ ID NO: 11). In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 90% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 90% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 90% identical to PurZ (SEQ ID NO: 11). In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 95% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 95% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 95% identical to PurZ (SEQ ID NO: 11). In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 98% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 98% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 98% identical to PurZ (SEQ ID NO: 11). In some embodiments, the methods comprise introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 99% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 99% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 99% identical to PurZ (SEQ ID NO: 11).
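By way of illustration only, the identity thresholds recited above (80%, 85%, 90%, 95%, 98% and 99%) could be evaluated for a candidate polypeptide as sketched below. The sketch assumes that the candidate has already been aligned to the reference sequence (for example SEQ ID NO: 9, 10 or 11) with an alignment tool of choice, with gaps written as "-", and that identity is counted as identical positions divided by the number of aligned columns. Other conventions for the denominator exist and may give different values. The function names are hypothetical.

```python
IDENTITY_TIERS = (80, 85, 90, 95, 98, 99)  # thresholds recited in the text


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent identity over a pairwise alignment given as two gapped strings.

    Assumption: denominator = alignment columns, excluding columns in which
    both sequences carry a gap; a gap never counts as a match.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (r, q)
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if not (r == "-" and q == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for r, q in columns if r == q and r != "-")
    return 100.0 * matches / len(columns)


def tiers_met(identity_percent: float) -> list:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in IDENTITY_TIERS if identity_percent >= t]
```

For example, two aligned ten-residue strings that differ at a single position are 90% identical under this convention and therefore satisfy the 80%, 85% and 90% thresholds but not the 95% threshold.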
[0060] In some embodiments of the methods, the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9) is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), wherein the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10) is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and wherein the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7).
[0061] In some embodiments, the method comprises introducing into the cell a recombinant coding sequence for DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for PurZ (SEQ ID NO: 11).
[0062] In some embodiments, the coding sequences are present on one or more plasmids.
[0063] In some embodiments, the coding sequences are present on one or more chromosomes.
[0064] The methods may further comprise introducing a heterologous starting nucleic acid into the cell in order to make a Z-enriched form of the starting nucleic acid. In some embodiments, the Z-enriched form has an increased stability. As used herein, increased stability is defined by measuring the thermal denaturation profile of the DNA and its melting temperature (Tm), and comparing that Tm with the Tm of a DNA of the same composition and the same length but with 100% A and no Z. In some embodiments, the Tm is increased by at least 5%, at least 10%, at least 15%, at least 20%, or at least 25%.
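By way of illustration only, the Tm comparison described above could be computed as sketched below. The sketch assumes that both melting temperatures have been measured under identical conditions and are expressed on the same temperature scale. The text does not state the scale, and a percentage increase computed in degrees Celsius differs from one computed in kelvin, so the scale should be fixed and reported. The function names and the example values in the note that follows are hypothetical.

```python
TM_TIERS = (5, 10, 15, 20, 25)  # percentage thresholds recited in the text


def tm_percent_increase(tm_z: float, tm_a: float) -> float:
    """Relative Tm increase (%) of the Z-enriched DNA over its all-A counterpart.

    Assumption: tm_z and tm_a are on the same temperature scale and tm_a > 0.
    """
    if tm_a <= 0:
        raise ValueError("reference Tm must be positive on the chosen scale")
    return 100.0 * (tm_z - tm_a) / tm_a


def highest_tier_met(tm_z: float, tm_a: float):
    """Return the largest recited threshold satisfied, or None if none is."""
    increase = tm_percent_increase(tm_z, tm_a)
    met = [t for t in TM_TIERS if increase >= t]
    return max(met) if met else None
```

With purely hypothetical values of 66.0 for the Z-enriched DNA and 60.0 for the reference DNA, both on the same scale, the increase is 10%, which satisfies the "at least 5%" and "at least 10%" thresholds.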
[0065] In some embodiments, the heterologous starting nucleic acid is a viral nucleic acid. In some embodiments, the virus is not a cyanophage. In some embodiments, the virus is not a Vibrio phage.
[0066] The invention encompasses the use of the recombinant cell or virus, or the composition according to the present disclosure, to store DNA. DNA may be used as a medium for the storage of information; if the DNA is made more stable, the information storage will last longer and will be safer. The invention also encompasses the use of the recombinant cell or virus, or the composition according to the present disclosure, in phage therapy. In particular, the genome of a phage may be engineered, through the addition of the three genes datZ, mazZ and purZ to its genome, to have enough of its adenines substituted by 2-aminoadenine in its DNA, so that it will acquire an increased resistance to its host's endonucleases.
BRIEF DESCRIPTION OF THE DRAWINGS
[0067] Figure 1. Watson-Crick base pairs and natural variations thereof. Hydrogen bonds are marked by a dotted orange line. a Classical DNA base pairs, universal to all three domains of life and most viruses. b Other types of base pairs with three hydrogen bonds found in some organisms and viruses. Additional chemical groups are in red. 2-aminoadenine : thymine (Z:T, left); guanine : 5-hydroxymethylcytosine (G:hmc, center); archaeosine : cytosine (G+:C, right). The Z:T pair, first found in cyanophage S-2L, completely replaces the usual A:T pair in the genome.
[0068] Figure 2. Functional characterisation of S-2L PrimPol. a Schematic diagram of S-2L PrimPol constructs showing its different domains with their respective amino-acid range (to scale). b-d Results of DNA polymerase activity tests of S-2L PrimPol with either dATP or dZTP as the incoming dNTP, using templates with dT10GG (b and c) or dT12 (d) overhang. b Different buffers with various pHs, noted below the triangles. c Effect of different divalent ions, at 5 μM each. d Effect of growing concentrations of nucleotides (lanes 3-8) and pre-incubation of reactional mixture for DatZ WT (lanes 9-10) and I22A mutant (lanes 11-12). Nucleotide concentrations are given in μM under the triangles on the panel to the left; unless otherwise stated they are at 500 pM. Lanes 1-2 represent, respectively, a negative control, without any polymerase, and a positive control, with E. coli Pol I (Klenow fragment) and dATP.
[0069] Figure 3. AEP domain of S-2L PrimPol: conserved residues and their structural context. a Five AEP sequence motifs in PP-N190 close homologues. In addition to previous motif classifications 37,19, the steric gate tyrosine is included as motif 0, and motifs 1 and 2 are extended. Numbers on top of the sequence blocks indicate their amino acid range according to S-2L PrimPol. Residues conserved with other AEPs and of known function are indicated with a grey dot underneath; residues conserved only between the closest relatives of PrimPol and of potential catalytic importance for primase activity - with a black dot. The double-hatted residue D87 could be involved in both polymerase (known) and/or primase (suggested) activities. b Structure of PP-N190 in ribbon and surface representation, with two symmetric molecules in the crystallographic asymmetric unit, each coloured with a grey-black gradient. Calcium ions are shown by grey spheres, with water molecules forming their hydration shells shown as black ones. The catalytic site of molecule A is shown in grey stick representation and indicated with a dotted circle. c Zoom on the catalytic site of PP-N190. Residues highlighted in (a) are shown in stick representation and labelled, maintaining the same colour code. The experimental 2Fo-Fc electron density around these residues (black mesh) is contoured at 1 sigma.
[0070] Figure 4. HPLC analysis of S-2L DatZ dephosphorylation products. Nucleotide standards are in black, products eluted after incubation of the corresponding triphosphates with DatZ are in grey. Each sample was eluted separately, using an amount of 40 nmol. The enzyme is active exclusively with dATP and removes all of its phosphates: it is therefore a triphosphohydrolase specific for dATP, or dATPase.
[0071] Figure 5. Three-dimensional structure of S-2L DatZ. a Ribbon representation of a DatZ monomer in a light grey-dark grey gradient, with bound dA in stick (yellow). The Zn2+ ion is shown as a grey sphere. b A close-up on the catalytic pocket of DatZ with the experimental 2Fo-Fc electron density contoured at 2.5 sigmas around bound ligands: dA and Zn2+ (black mesh). Additionally, the anomalous density at Zn2+ absorption edge (dark grey mesh) is contoured at 10 sigmas. Residue I22 provides direct specificity towards the adenine nucleobase, creating a steric hindrance for chemical groups in position 2 of the purine ring. Other residues highlighted in the text are Zn2+-coordinating ones, W20 and P79. c Structure of the full DatZ hexamer, top and side views, in surface representation. Protomers form a compact, particularly stable disc in an alternating, zigzagging pattern. Two of the six symmetrical cavities leading to buried dA molecules are visible in the side view and highlighted by the white dotted circles. d Surface representation of DatZ hexamer coloured by the experimental B-factors (dark grey-light grey gradient, hydrogen atoms omitted), with the scale bar below. The highest temperature factors map to the flexible loop above dA.
[0072] Figure 6. Catalytic centre of S-2L DatZ with the substrate and cofactors and the mechanism of tri-dephosphorylation. a Model of the reaction centre made by superposition of two of the structures solved in this work. The first structure defines dATP and residue R19 interacting with the α-phosphate; hydrogen atoms were omitted for clarity. The second structure provides catalytic ions A and B (spheres), bound water molecules that are likely to take part in the reaction (light grey) and the metal coordinating residues. Interacting atoms, ions and groups of interest are shown by dashed lines of corresponding colour. The distance between the two Co2+ ions is 5.2 Å. b Schematic diagram of DatZ reaction under two-metal-ion mechanism with the initial substrates, intermediate and products. Bonds being made and broken are shown in dashed lines; ionic interactions are in hashed grey (with ionic cofactors) and grey (with protein). Interactions of the substrate with base-stabilising P79, sugar-specificity-conferring W20, 2-amino-specificity-conferring I22 and triphosphate-neutralizing K81 and K116 residues are additionally highlighted. In this diagram, a hydroxide ion (OH−) is proposed for the nucleophile.
[0073] Figure 7. Additional polymerase activity tests on S-2L PrimPol constructs, using nucleotide dATP/dATGC mix or dZTP/dZTCG. a Polymerisation assay of the three PrimPol constructs, with a negative control without any polymerase in the first lane. The incubation was conducted for 20 min, using the dT12 overhang template. b Polymerisation assay for the first 124 nt of PrimPol's native gene. Results are shown for a negative control without any polymerase (lane 1), a positive control with E. coli Pol I (Klenow fragment) (lanes 2-3), PP-N300 polymerase without (lanes 4-5), or with pre-incubation of the reactional mixture with DatZ (lanes 6-7). The polymerisation step was allowed to proceed for 15 min with 42.3 μM of PP-N300.
[0074] Figure 8. AEP and its ligands. a Three known AEP structures with bound dsDNA, viewed from the same perspective. Below are: the protein name, PDB code of the structure and the organism or plasmid of origin. The DNA molecule seems to bend in an L-shape at the catalytic site. b A model of PP-N190 with two Mg2+ ions bound in B and C sites and two nucleotides in the initiation and elongation sites, obtained after energy minimization step. Residues interacting with Mg2+ ions in a way previously undescribed are in dark grey, the ones interacting in a typical way - in pale grey. The ionic bonds are visualised by the black lines. c Distances between the residues and bound ions shown in (b), with the code given below the graph. They were measured in the course of 212 ns of the simulation and averaged in the 2 ns frame. The novel ion in site C interacts with the γ-phosphate of the nucleotide in the initiation site; the binding is stable and similar across all simulations.
[0075] Figure 9. Further tests on S-2L DatZ catalytic activity and its I22A mutant. The panels are constructed as in Figure 4. a HPLC analysis of nucleotides obtained after incubation of DatZ with dADP and dAMP, showing no discernible dephosphorylation products. b Same analysis for dATP, dZTP and dGTP incubated with DatZ I22A. Compared to the wild-type enzyme, the mutant shows reduced dATPase activity and improved, although still marginal, dZTPase activity. dATP to dA tri-dephosphorylation occurs in a single step.
[0076] Figure 10. Catalytic centre of S-2L DatZ dATPase with bound substrate, product and cofactors. Colour code as in Figure 5b and Figure 6a; residues K81 and K116, balancing the charge of the triphosphate, are also displayed. Water molecules and hydrogen atoms are omitted for clarity. a Structure of DatZ with dA and Co2+. The 2Fo-Fc electron density map is contoured at 1 sigma around dA and Co2+ ions in the binding sites named A and B (black mesh). The anomalous signal at the wavelength of data collection is contoured at 10 sigmas (dark grey mesh). b Structure of DatZ with dATP and partially occupied Zn2+, using the same representation. The 2Fo-Fc electron density map is contoured at 1 sigma around dATP and Zn2+ ion in the binding site A (black mesh). Residual amounts of penta-coordinated Zn2+ can be identified by the anomalous signal at Zn edge, contoured here at 5 sigmas (dark grey mesh).
[0077] Figure 11. Sequence multialignment of close DatZ homologues co-occurring with purZ gene in related phages. Numbering above the alignment refers to S-2L DatZ. Dots below the alignment mark positions of residues crucial for DatZ: residues coordinating metal ions A and B (H, D, E); residues stabilising the triphosphate (R, K); W20 discriminating ribonucleotides; I22 providing steric hindrance for Z and G nucleobases; P79 stabilizing the purine ring. The remaining strictly conserved residues with hypothetical structural importance are marked by a grey dot (Q, A, G). Because of large inconsistencies in naming, phages other than S-2L and φVC8 are described by their reference number (left). They correspond to (in order): Sinobacteraceae bacterium, phage PMBT28, phage SH-Ab 15497, Siphoviridae sp. ctbf_3, Compost metagenome, Bacteroidetes bacterium, phage ZP6, Podoviridae sp. ctpVR23, Chloroflexi bacterium, four unnamed sequences from Tara database and phage vB_OspP_OH. The supposedly bacterial sequences are short and match other viral sequences on the full length of the genome. S-2L_DatZ (SEQ ID NO: 9); phiVC8 (SEQ ID NO: 17); SSEA01000061 (SEQ ID NO: 18); MG641885 (SEQ ID NO: 19); MG674163 (SEQ ID NO: 20); MH622939 (SEQ ID NO: 21); RCUU01000438 (SEQ ID NO: 22); QQVW01000181 (SEQ ID NO: 23); MK203850 (SEQ ID NO: 24); MN582112 (SEQ ID NO: 25); RPPI01000007 (SEQ ID NO: 26); CENI 14966_2 (SEQ ID NO: 27); CEUI32_2 2 (SEQ ID NO: 28); CEUX 95133_1 (SEQ ID NO: 29); CEPX324866_1 (SEQ ID NO: 30); MT028492 (SEQ ID NO: 31).
[0078] Figure 12. A model of dZ in the catalytic pocket of DatZ and its mutant I22A. a The distance between dZ nitrogen atom of the amino group in position 2 and the closest atom of I22 (Cγ) is shown by a dashed line. It is too short to allow for correct dZ binding. b The distance between the same nitrogen atom and the closest atom of the side chain in mutant I22A (Cβ) is longer by 1.5 Å (dashed line).
[0079] Figure 13. Structural classification of available AEP enzymes with Dali. a Dendrogram of AEP superfamily, derived by hierarchical clustering of the similarity matrix data. PDB codes are atop the branches; PP-N190 is marked by a black triangle. Archaeo-eukaryotic PriS and bacterial NHEJ primases are monophyletic, and group together in so-called AEP proper clade. Primase-Polymerases (PrimPols) are more divergent and spread out across all three domains of life, viruses and plasmids. The AEP domain of S-2L PrimPol shares a recent ancestor with plasmidic RepB’. b The similarity matrix data. PDB codes are indicated to the left and below, organism names further to the left, protein families above and similarity scale to the right. PP-N190 is highlighted with a black cross. Each square represents with its colour how close structurally a pair of AEP proteins is, varying from light grey (no similarity) to dark grey (high closeness).
[0080] Figure 14. Structural multialignment of S-2L DatZ and all other HD phosphohydrolases whose crystal structure is available in the PDB. Organism names and PDB codes are indicated on the left. The meaning of the dots below the alignment is the same as in Figure 11; the additional conserved residue E93 is marked by a grey dot. Empty circles highlight highly conserved residues with slightly shifted backbone positions with respect to S-2L (connected full circles), but with superimposed similar functional groups. Occasional unstructured and unbuilt regions in the middle were aligned using sequence information alone while non-superposable N- and C-termini were ignored; in particular, an extended, but structured N-terminus of L. innocua and the last α-helix of A. tumefaciens that undergoes a considerable positional shift, were omitted. S-2L_DatZ (SEQ ID NO: 9); E.coli_2PAU (SEQ ID NO: 32); B.megaterium_5TK7 (SEQ ID NO: 33); L.innocua_3MZO (SEQ ID NO: 34); A.tumefaciens_2GZ4 (SEQ ID NO: 35); M.magnetotactitum_3KH1 (SEQ ID NO: 36); P.furiosus_1XX7 (SEQ ID NO: 37); P.horikoshii_2CQZ (SEQ ID NO: 38); A.fulgidus_1YOY (SEQ ID NO: 39); S.cerevisiae_5YOX (SEQ ID NO: 40); H.sapiens_4DMB (SEQ ID NO: 41).
[0081] Figure 15. Non-rooted maximum-likelihood phylogenetic tree of HD phosphohydrolases for all available molecular structures. The tree was calculated using the alignment from Figure 14. Organism names and PDB codes are to the right of the corresponding branches. Enzymes divide into three groups: archaeal, eukaryotic and bacterial/S-2L, suggesting an acquisition of the datZ gene by S-2L’s ancestor from a bacterium. Numbers on the nodes are bootstrap percentage values; the reference distance corresponds to an average 0.5 substitution per site. The topology of the bootstrap consensus tree is identical, supporting the result.
[0082] Figure 16. Common structural features across the HD phosphohydrolase family. Comparison of S-2L DatZ with E. coli YfbR and B. megaterium OxsA. a Structures 6ZPC, 2PAU (chain A) and 5TK7. The overall protein fold is highly conserved, with RMSD of 2.0 Å and 2.3 Å, respectively, as well as the position of substrates and fixed catalytic divalent ions. b Hexameric quaternary organisation of DatZ, YfbR and OxsA, extrapolated by using crystallographic symmetry operators to generate the hexamer. c Comparison of the reaction centre between S-2L DatZ and E. coli or B. megaterium HD phosphohydrolases. DatZ is represented as in Figure 6a; YfbR E72A mutant structure with bound dAMP is taken from PDB 2PAU, with A72 residue swapped with natural E72 from PDB 2PAQ; OxsA is from PDB 5TK7. Features missing in YfbR and OxsA models can be inferred from DatZ structures (shown by dotted contours), suggesting that the metal ion site B is universal and has similar coordination across the whole protein family. Site C, observed for OxsA, would be a result of a switch from mostly conserved positively charged residue corresponding to DatZ’s K116 to a negatively charged one, justifying the need for a third divalent cation.
[0083] Figure 17. Genome of S-2L and its Z-cluster conserved in other Siphoviridae. A. Z:T and G:C base pairs constituting S-2L’s ZTGC-DNA. B. Map of S-2L genome. Arrows symbolize identified genes on the DNA molecule. The grey colouring refers to their functionality; white genes have no inferred function. The cluster of genes involved in dZTP synthesis and dATP removal - the Z-cluster - is composed of datZ, mazZ and purZ (dark grey). C. Conservation of the Z-cluster in all phages with close datZ and purZ homologues, and their phylogeny constructed on both genes’ products (left). Between these phages, two variants of a mazZ gene are conserved: mazZ-1 and mazZ-2 (solid and striped light grey, respectively). Close scrutiny of the sequence data reveals possible misannotation of viral sequences with bacterial names (ann. note). The reference distance for the PurZ/DatZ phylogenetic tree corresponds to an average 0.2 substitution per site.
[0084] Figure 18. Catalytic properties of S-2L PurZ investigated by HPLC. Reactants of PurZ are visualised on absorbance chromatograms. Peaks corresponding to pure compounds (grey) are indicated with black labels. A. Products of PurZ catalysis with dGMP, Asp and dATP/ATP (left and right panel respectively), taken at three consecutive time steps. The enzyme generates products corresponding to (d)ADP and sadGMP (N6-succino-2-amino deoxyguanidylo monophosphate) and shows no discrimination between the two adenosine triphosphate variants. B. Comparison of the expected activity between PurZ and a typical AdSS family representative from E. coli. In the reaction time of 15 min, AdSS rapidly transforms IMP, GTP and Asp mixture into GDP and sIMP, whereas PurZ stays inactive even at higher concentration (left panel). Inversely, PurZ catalyses the reaction from dGMP, ATP and Asp to ADP and sadGMP, contrary to AdSS that does not recognise these substrates (right panel). Although the sIMP peak is confounded with the GTP one, it gives a noticeable shift in 260/280 nm absorbance ratio.
[0085] Figure 19. Structure of S-2L PurZ, an N6-succino-2-amino deoxyguanidylate synthase with ligands dGMP and dATP. A. Ribbon representation of a PurZ monomer in a light-dark grey gradient, with dGMP and dATP shown in stick. B. Catalytic pocket of PurZ with the experimental 2Fo-Fc electron density contoured around the reactants at 1 sigma (black mesh). Surrounding residues are defined in the text; R146 from the second protein subunit is stabilizing the phosphate of dGMP in the first subunit’s catalytic pocket. C. PurZ dimer, in surface representation: the two domains are coloured in light-dark grey gradients. White dotted circles point to the opposite catalytic cavities. D. Surface representation of PurZ coloured using experimental B-factors, with the corresponding scale bar below. The flexible loop above the catalytic cleft (left) defines the aspartate loop. The interface between the dimer (right) is particularly rigid, suggesting a constitutive dimeric form of PurZ.
[0086] Figure 20. HPLC analysis of S-2L MazZ nucleotide triphosphate specificity and dephosphorylation products. Nucleotide standards are in black, the products eluted after incubation of the corresponding triphosphates with MazZ are in light grey. Each sample was eluted separately, after an injection of 40 nmol. The enzyme is selective towards dGTP and GTP, removing their two terminal β- and γ-phosphates.
[0087] Figure 21. Structure of S-2L MazZ with bound dGDP and Mn2+ ions. A. A tetramer of MazZ, that constitutes the crystallographic asymmetric unit. Two tight dimers further form a dimer; each of the four catalytic pockets with the reactant and three catalytic ions is created from the two chains of a tight dimer. B. Close-up on the catalytic pocket. The product of dGTP dephosphorylation is identified as dGDP in the crystal, next to the catalytic Mn2+ ions. The determinants of guanine specificity (N2 and O6) and residues coordinating the ions are placed on one protein chain. The three Mn2+ ions (spheres), designated A, B and C, are hexa-coordinated by the negatively charged protein residues, deoxynucleotide phosphates and water molecules (omitted for clarity). The 2Fo-Fc electron density around the ligands is contoured at 1 sigma (black mesh); the anomalous signal attesting for the presence of Mn2+ ions is contoured at 3 sigmas (dark grey mesh). C. A single protein chain of MazZ, coloured in light-dark grey gradient. D. Surface representation of MazZ coloured using experimental B-factors with a scale-bar represented on the right. The whole tetramer is very rigid, except for the N- and C-termini and the solvent-exposed D43-H46 flexible loop, fully modelled only for the chain A.
[0088] Figure 22. Metabolic pathway of 2-aminoadenine and its reconstitution in unrelated bacteria and phages. A. Cellular original pool of nucleotides is represented by the box to the left, the one modified through expression of the viral Z-cluster - to the right. Three dots represent unmodified dTTP and dCTP. Structures of the involved S-2L proteins are shown next to their respective reaction arrows (to scale). Host enzymes (names in grey) finalise the dZTP pathway. Thin, grey arrows stand for no modification. The dashed grey arrow stands for potential use of the standard dNTP pool by PrimPol in absence of the Z-cluster, highlighting its lack of specificity. B. The A-to-Z substitution rates in total plasmidic and viral DNA obtained after transplantation of S-2L’s Z-cluster to E. coli. Multiple bars for a given construct stand for independent replicates; their average is displayed above the bars. C. The content of Z in total plasmidic DNA can be normalized to the fraction synthesized after the induction of the Z-cluster in bacterial culture. On average, roughly every fourth adenine was changed to 2-aminoadenine.
[0089] Figure 23. Identical fold shared by S-2L MazZ, bacterial MazG and HisE proteins. The tight dimer part is shown for four exemplary structures, viewed from the same perspective. For each enzyme both chains are shown in ribbon representation. One chain is additionally coloured: α-helices in dark grey, loops in light grey and β-strands in dark grey. For the coloured chain, the secondary structure elements are numbered. Below each image, the organism of origin, protein name and the PDB code are indicated.
[0090] Figure 24. Structural multi-alignment between S-2L PurZ and all 14 other unique adenylosuccinate/succino-2-amino-deoxy adenylate synthases available in PDB. Organism names and PDB codes are indicated on the left. Circles below the alignment mark positions of residues of interest, divided into three categories: residues strictly conserved across AdSS family (light grey); residues conserved in the usual AdSS enzymes but not in phages (dark grey); more loosely conserved residues with two similar variants or with only occasional mutations (star). Empty circles highlight highly conserved residues with shifted backbone position with respect to S-2L’s ones (connected full circles), but with similar superposed functional groups. Occasional unstructured and unbuilt regions in the middle were supplemented using the sequence information alone. S-2L_PurZ (SEQ ID NO: 11); phiV8_PurZ_6FKO (SEQ ID NO: 42); P.horikoshii_5K7X (SEQ ID NO: 43); C.jejuni_3R7T (SEQ ID NO: 44); B.anthracis_4M9D (SEQ ID NO: 45); E.coli_2GCQ (SEQ ID NO: 46); Y.pestis_3HID (SEQ ID NO: 47); B.thailandensis_3UE9 (SEQ ID NO: 48); L.pneumophila_6C25 (SEQ ID NO: 49); P.falciparum_1P9B (SEQ ID NO: 50); A.thaliana_1DJ2 (SEQ ID NO: 51); T.aestivum_1DJ3 (SEQ ID NO: 52); C.neoformans_5I34 (SEQ ID NO: 53); H.sapiens_2V40 (SEQ ID NO: 54); M.musculus_2DGN (SEQ ID NO: 55).
[0091] Figure 25. Visualisation of relationships between S-2L PurZ and homologous proteins. A. Conserved residues from Fig. 24 are mapped onto S-2L PurZ structure, using the same colour code. Nucleotide substrates - dGMP and dATP - are in light grey. B. Visualisation of an S-2L-specific C-terminal insertion in the form of an alpha-helix (grey, left) and bacterio-eukaryotic insertions (archaeo-viral deletions) of two large segments, shown on E. coli AdSS (black, right).
[0092] Figure 26. Non-rooted maximum-likelihood phylogenetic tree of AdSS representatives. The tree was generated using the structural alignment from Fig. S2. Enzymes are divided into four clades: eukaryotic, bacterial, archaeal and viral, the last two sharing a recent ancestor. The reference distance corresponds to an average 0.2 substitution per site. The topology of the bootstrap consensus tree is identical, supporting the result presented here.
[0093] Figure 27. Sequence multialignment of MazZ-1 homologues (S-2L-like). Numbering above the alignment refers to S-2L MazZ. On the N-termini, sequences from phages SH-Ab, PMBT28 and Kokobel2 are fused to an HNH domain; sequence from Siphoviridae sp. is fused to an unidentified domain, and HNH domain is found on a separate domain upstream. Catalytic residues of S-2L MazZ are marked with dark grey dots, R83 stabilising an intermediate product - in dark grey, residues coordinating 2-amino group, O6 and sugar moiety - in light grey. S-2L MazZ (SEQ ID NO: 10); Sinobact. (SEQ ID NO: 56); Caudovir. (SEQ ID NO: 57); SH-Ab (SEQ ID NO: 58); PMBT28 (SEQ ID NO: 59); Kokobel2 (SEQ ID NO: 60).
[0094] Figure 28. Sequence multialignment of MazZ-2 homologues (φVC8-like). Numbering above the alignment refers to φVC8 MazZ. Putative catalytic residues are marked with dark grey dots and grey dot (star), corresponding to S-2L MazZ E35, E38, E50, D53 and R83. phiVC8 (SEQ ID NO: 61); Bacteroidetes (SEQ ID NO: 62); ctpVR23 (SEQ ID NO: 63); ZP6 (SEQ ID NO: 64); vB_OspP_OH (SEQ ID NO: 65); Chloroflexi (SEQ ID NO: 66).
[0095] Figure 29. Results of DNA polymerase activity tests of E. coli Pol I (Klenow fragment) with either dATP or dZTP as the incoming dNTP. Nucleotide concentrations are given in μM under the triangles.
EXAMPLES
Example 1: Introduction to Examples 2 to 11
[0096] All living organisms use the same elementary bricks for their genetic material, namely four, and only four, nucleobases: adenine (A), thymine (T), guanine (G) and cytosine (C). However, certain viruses of bacteria (bacteriophages or phages) use modified bases to escape their host’s defence system, especially their endonucleases 1,2. Most of the observed DNA modifications occur at position 5 of pyrimidines or position 7 of purines that face the major groove of the DNA double helix 1,3. Methylation on N4 of cytosine or N6 of adenine are also observed in viruses 2,4. For pyrimidines, DNA containing 5-hydroxymethylcytosine has long been known to exist in phages T2, T4 and T6 5, along with the enzyme (deoxycytidylate hydroxymethylase) responsible for its biosynthesis 6; more complicated post-replicative pathways of thymine hypermodification were recently found in phages and recreated in vitro 7. For purines, archaeosine, a modified 7-deaza analogue of guanine observed in archaeal tRNA D-loop 8 was found in the genome of the E. coli siphophage 9g 9, and is possibly present in another siphophage BRET 10; their genomes encode genes (QueC, QueD, QueE) necessary for the biosynthesis of guanine modification. Recently, three additional 7-deazaguanine analogues have been identified and characterized in the genomes of phages and archaeal viruses 11. An important point is to distinguish between replicative and post-replicative DNA modifications: if a biosynthetic pathway can be identified for the synthesis of the triphosphate of the modified nucleotide, it is reasonable to assume that the modified base is incorporated during replication and is not the result of a post-replicative modification.
[0097] Cyanophage S-2L is a Synechococcus phage from the double-stranded DNA Siphoviridae family. It was first isolated and described in 1977 12 and its genome was shown to contain neither adenine nor any of its 7-deaza derivatives. Instead, it uses 2-aminoadenine (2,6-diaminopurine or Z), which has an additional amino group in position 2 compared to adenine 13. The A:T base pair, with two hydrogen bonds, is therefore replaced by the Z:T base pair, which has three hydrogen bonds, as in the G:C base pair (Fig. 1). This feature, combined with an unusually high GC content of the S-2L genome, explains its exceptionally high melting point 12. It is believed that the A-to-Z substitution arose as a host evasion tactic, rendering S-2L’s DNA resistant to the DNA-targeting proteins of its host, especially endonucleases 14,15.
[0098] Once the S-2L genome was sequenced, the presence of a gene homologous to an adenylosuccinate synthetase (purA) was noted, raising the possibility that the phage encodes in its genome the enzymes of the biosynthesis pathway of 2-aminoadenine triphosphate (dZTP; patent application EP1499713A2). A detailed structural study of such a purA orthologue (called purZ) in Vibrio phage φVC8 fully supported this hypothesis (PDB ID: 6FM1). However, it remained largely unknown how phage S-2L incorporates the base Z in its genome, especially as no gene corresponding to a DNA polymerase could be detected. This is in contrast with the situation in phage φVC8, where a DNA polymerase of family A has been identified 16.
[0099] Here, we identify the enzyme that is responsible for genome duplication of phage S-2L, a member of the PrimPol family, and we present its crystal structure. We confirm its polymerase activity but find that the enzyme is not specific to A or Z. Instead, we propose that the absence of A in the S-2L genome is explained by a separate enzyme, an HD phosphohydrolase that specifically dephosphorylates dATP and that we name DatZ. We give a structural explanation for both the specificity and the reaction mechanism of DatZ, based on three crystallographic structures, including one determined at sub-angstrom resolution.
Example 2: Materials and Methods for Examples 3 to 9
Identification of genes of interest
[0100] The genomic sequence of cyanophage S-2L was obtained from NCBI’s database (AX955019). Potential ORFs were identified using ORFFinder 53 (>150 nt, genetic code 11). Targeted ORFs were assessed for possible homology with known proteins using BLAST. The genomic positions of genes involved in phage replication are provided in Table 2; nucleotide sequences of native and codon-optimised genes pplA and datZ are provided herein above. Protein disorder of PrimPol was predicted with DISOPRED 27.
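As an illustration of the ORF-identification step, the following minimal Python sketch scans the six reading frames of a nucleotide sequence for start-to-stop ORFs longer than 150 nt. It is not the ORFfinder implementation: the function names, the reduced set of start codons (ATG, GTG, TTG) and the 0-based coordinate convention are assumptions made for this example only.

```python
# Minimal six-frame ORF scan; illustrative only, not ORFfinder's algorithm.
STOPS = {"TAA", "TAG", "TGA"}
# Assumption: a reduced set of bacterial start codons chosen for this sketch.
STARTS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, frame, start, end) for ORFs longer than min_len nt.

    Coordinates are 0-based on the scanned strand; end is exclusive and
    includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start > min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Each reported ORF would then be translated and submitted to a homology search such as BLAST, as described above.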
Protein expression and purification
[0101] Synthetic genes for expressed proteins were optimized for E. coli and synthesized using ThermoFisher’s GeneArt service. Genes were cloned into a modified RSF1-Duet expression vector with a TEV-cleavable N-terminal 14-histidine tag 54 using New England Biolabs and Anza (Thermo Fisher Scientific) enzymes. Shorter versions of PrimPol (PP300 and PP190) were obtained by adding overhangs with a STOP codon and the corresponding cleavage site through standard PCR with designed oligonucleotides (Eurogentec); mutagenesis of DatZ was done using designed oligonucleotides and the QuikChange II Site-Directed Mutagenesis Kit (Agilent). E. coli BL21-CodonPlus (DE3)-RIPL cells (Agilent) were separately transformed with engineered plasmids. Bacteria were cultivated at 37°C in LB medium with appropriate antibiotic selection (kanamycin and chloramphenicol), and induced at OD = 0.6-1.0 with 0.5 mM IPTG. After incubation overnight at 20°C, cells were harvested and homogenized in suspension buffer: 50 mM Tris-HCl pH 8, 400 mM NaCl, 5 mM imidazole. After sonication and centrifugation of bacterial debris, the corresponding lysate supernatants were supplemented with Benzonase (Sigma-Aldrich) and protease inhibitors (Thermo Fisher Scientific), 1 μl and 1 tablet per 50 ml, respectively. Proteins of interest were isolated by purification of the lysate on a Ni-NTA column (suspension buffer as washing buffer, 500 mM imidazole in elution buffer). They were further diluted to 150 mM NaCl and repurified on HiTrap Heparin (for PrimPol) or HiTrap Q (for DatZ) columns (1 M NaCl and no imidazole in elution buffer). Histidine tags were removed from the proteins by incubation with his-tagged TEV enzyme overnight. After removing TEV on a Ni-NTA column, proteins were further purified on a Superdex 200 10/300 column with 25 mM Tris-HCl pH 8, 150 mM NaCl (for PrimPol-N190 crystallisation a 16 mM concentration of NaCl was used). All purification columns were from Life Sciences.
Protein purity was assessed on an SDS gel (BioRad). The enzymes were concentrated to 7-19 mg ml-1 with Amicon Ultra 10k and 30k MWCO centrifugal filters (Merck), flash frozen in liquid nitrogen and stored directly at -80°C, with no glycerol added. The selenomethionine (SeMet) version of PP-N190 was prepared using the same expression strain and construct. Bacteria grew in medium with 6 g L-1 Na2HPO4, 3 g L-1 KH2PO4, 1 g L-1 NH4Cl, 0.5 g L-1 NaCl, 2 mM MgSO4, 100 μM CaCl2 and 0.4% glucose, supplemented with metal solution (5000x): 5 g L-1 FeCl2, 184 mg L-1 CaCl2, 64 mg L-1 H3BO3, 40 mg L-1 MnCl2, 18 mg L-1 CoCl2, 4 mg L-1 CuCl2, 340 mg L-1 ZnCl2, 605 mg L-1 Na2MoO4, 1.3 μl L-1 and 0.8% conc. HCl. At OD = 0.6, cultures were supplemented with 50 mg L-1 of selenomethionine, isoleucine, leucine and valine, and 100 mg L-1 of lysine, threonine and phenylalanine. All chemicals were from Sigma-Aldrich. After further incubation for 15 min at 37°C, the culture was induced and processed as above.
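The dilution and supplementation steps above reduce to simple mass-balance arithmetic, sketched below in Python. The helper names are hypothetical, and two assumptions are made that the text does not state explicitly: the Ni-NTA eluate is taken to carry the 400 mM NaCl of the suspension buffer, and the diluent is taken to be salt-free.

```python
def diluent_volume(v_sample_ml, c_start_mM, c_target_mM, c_diluent_mM=0.0):
    """Volume of diluent (ml) that brings a sample from c_start to c_target.

    Derived from the mass balance
    c_start * V + c_diluent * V_d = c_target * (V + V_d).
    """
    if not (c_diluent_mM < c_target_mM <= c_start_mM):
        raise ValueError("target must lie between diluent and starting concentration")
    return v_sample_ml * (c_start_mM - c_target_mM) / (c_target_mM - c_diluent_mM)


def stock_volume_ml(final_volume_ml, fold):
    """Volume of an N-fold concentrated stock needed for a given final volume."""
    return final_volume_ml / fold


# Assumed example: 10 ml of eluate at 400 mM NaCl diluted to 150 mM NaCl.
buffer_to_add_ml = diluent_volume(10.0, 400.0, 150.0)
# 5000x metal solution required for 1 L of SeMet medium.
metal_solution_ml = stock_volume_ml(1000.0, 5000)
```

Under these assumptions, roughly 1.7 volumes of salt-free buffer are added per volume of eluate, and 1 L of medium takes 0.2 ml of the 5000x metal solution.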
DNA polymerase assays
[0102] Radioactivity-based polymerase activity tests, if not stated otherwise for a particular condition, were executed in 200 mM Tris-HCl pH 8 and 50 mM MgCl2, with 50 nM of dT10GG overhang DNA template, 50 nM of α-32P 5’-labeled DNA primer complementary to the template upstream sequence, 250 μM dNTP mix and 1 μM of PrimPol (20 min of incubation) at 37°C. Fluorescence-based polymerase activity tests, if not stated otherwise, were executed in 20 mM Tris-HCl pH 7 and 5 mM MgCl2, with 3 μM of dT12 overhang DNA template, 1.5 μM of FAM 5’-labeled DNA primer, 500 μM dNTP mix, 0.5 μM of PrimPol constructs (10 min of incubation) and 1 μM of DatZ (8 min) at 37°C. The Klenow polymerase used as a control was at 5 U in 50 μl (10 min incubation). The polymerase gene replication test was conducted similarly, with 3 μM of template and primer labelled radioactively on the 5’ end (α-32P); PP-300 at 42.3 μM and Klenow polymerase at 4 U in 20 μl were incubated for 15 min. Oligonucleotide sequences are specified in Table 3 (presented in Table 2).
[0103] Before adding the protein, DNA was hybridized by heating up to 95°C and gradually cooling to reaction temperature. Reactions were terminated by adding two volumes of a buffer containing 10 mM EDTA, 98% formamide, 0.1% xylene cyanol and 0.1% bromophenol blue, and stored at 4°C. Products were preheated at 95°C for 10 min, before being separated by polyacrylamide gel electrophoresis and visualised by FAM fluorescence or radioactivity on a Typhoon FLA 9000 imager. All oligonucleotides were ordered from Eurogentec, chemicals from Sigma-Aldrich, Klenow polymerase from Takara Bio, standard dNTPs from Fermentas (Thermo Fisher Scientific) and dZTP from TriLink BioTechnologies.
Nucleotide HPLC analysis
[0104] 1 μM of DatZ or its mutant was incubated at 37°C for 10 min with 500 μM of the respective dNTP, in a buffer containing 20 mM Tris pH 7 and 5 mM MgCl2. Reaction products were separated from the protein using 10000 MWCO Vivaspin-500 centrifugal concentrators and stored at -20°C. Products and standards were assayed separately, using around 40 nmol of each for anion-exchange HPLC on a DNA-PAC100 (4x50 mm) column (Thermo Fisher Scientific). After equilibration with 150 μl of a suspension buffer (25 mM Tris-HCl pH 8, 0.5% acetonitrile), nucleotides were injected on the column and eluted with 3 min of isocratic flow of the suspension buffer followed by a linear gradient of 0-200 mM NH4Cl over 10 min (1 ml min-1). Eluted nucleotides were detected by absorbance at 260 nm, measured in arbitrary units [mAu]. High-purity nucleotides and chemicals were bought from Sigma-Aldrich, and HPLC-quality acetonitrile was from Serva.
Crystallography and structural analysis
[0105] All crystallization conditions were screened using the sitting drop technique on an automated crystallography platform 55 and were reproduced manually using the hanging drop method with ratios of protein to well solution ranging from 1:2 to 2:1. PrimPol-N190 was screened at 14.5 mg ml-1 at 4°C. Elongated rods grew over 2 days in 100 mM CaCl2, 20% w/v PEG 8k (40%) and 5% v/v isopropanol (100%) buffered with 100 mM MES pH 6. DatZ was screened at 12-17 mg ml-1 with a molar excess of 1.2 of dATP at 18°C. Big, symmetric crystals grew rapidly over 1-2 days in 1.5 M Li2SO4 buffered with 100 mM HEPES pH 7.5. All crystals were soaked in a solution containing 70% crystallization buffer and 30% glycerol and frozen in liquid nitrogen. Crystallographic data were collected at the SOLEIL synchrotron in France (beamlines PROXIMA-1 and PROXIMA-2), processed by XDS 56 with the XDSME 57 pipeline and refined in Phenix 58. Nucleotide constraints for structure refinement and dZ modelling were obtained using Grade Web Server 59. The structure of PrimPol-N190 was solved by the SAD technique using a SeMet derivative of the protein and data sets collected at the selenium edge (0.9807 Å) using the SHELX C/D/E programs 60. The structure of DatZ was solved by the sulphur-SAD (S-SAD) technique at 1.7712 Å wavelength. The anomalous double difference Fourier map for Zn was calculated from data collected at 9.67 and 9.66 keV (Zn peak and pre-edge). The DatZ ultrahigh resolution structure was obtained by merging 3 individual datasets taken on the same crystal. Structures of DatZ with bound Co2+ and dATP were obtained by growing crystals with 10 mM CoCl2 and 10 mM EDTA, respectively (the latter at pH 7). Replacement of the Zn2+ ion by Co2+ was confirmed with anomalous double difference maps with data collected at 7.73 and 7.28 keV (Co peak and pre-edge wavelengths). We found two peaks at 46.4 sigma and 32.7 sigma at sites A and B, respectively.
Retention of the Zn2+ ion in the presence of Mg2+ was confirmed with a persisting strong anomalous signal at 7.1 keV, 12.7 keV and 16 keV.
Molecular dynamics simulations of PP-N190
[0106] Force field parameters of dCTP were obtained using CGenFF 61. The parameter penalty and the charge penalty were zero, indicating that the parameters can be used safely without any modification. CHARMM36 parameter set was used for the rest of the system 62. Topologies of the structures were prepared with psfgen module of VMD 63. After the topology construction, the structures were solvated in a triclinic box with a distance of at least 11 Å to the box edges and TIP3P solvent model. The systems were neutralized with Na+ and Cl- ions, and the ion concentration was set to 0.15 M. Then, a 50000 step conjugate gradient minimization procedure was carried out. The minimized systems were heated up to 300 K with 0.001 K steps. An NPT equilibration procedure followed the heating. The equilibration time was 2 ns and the time step was 2 fs. The equilibration temperature (300 K) was controlled with Langevin thermostat and the pressure (1 atm) was controlled with Langevin barostat. The production run was 212 ns long, with the remaining parameters of production runs identical to the equilibration stage parameters. All of the molecular dynamics simulations were performed with NAMD version 2.13 64.
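The run lengths and salt concentration quoted above translate into integration-step counts and ion numbers through simple unit conversions, sketched here in Python. The helper names are hypothetical, and the 80 Å orthorhombic box is an assumed size chosen purely for illustration (the actual box was triclinic and its dimensions are not given).

```python
AVOGADRO = 6.02214076e23  # particles per mole


def n_steps(duration_ns, timestep_fs):
    """Number of integration steps for a run of duration_ns at timestep_fs."""
    return round(duration_ns * 1e6 / timestep_fs)  # 1 ns = 1e6 fs


def n_ion_pairs(box_edges_angstrom, conc_molar):
    """Approximate number of salt ion pairs for a target molar concentration.

    Assumes an orthorhombic box and uses the full box volume; counter-ions
    added for neutralisation are not included.
    """
    x, y, z = box_edges_angstrom
    volume_litres = x * y * z * 1e-27  # 1 cubic angstrom = 1e-27 L
    return round(conc_molar * AVOGADRO * volume_litres)


equilibration_steps = n_steps(2, 2)    # 2 ns at 2 fs
production_steps = n_steps(212, 2)     # 212 ns at 2 fs
salt_pairs = n_ion_pairs((80.0, 80.0, 80.0), 0.15)  # hypothetical box
```

With a 2 fs time step, the 2 ns equilibration corresponds to 1,000,000 steps and the 212 ns production run to 106,000,000 steps.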
Sequence and structure alignments, phylogeny
[0107] Close relatives of pplA and datZ were identified by BLAST searches, and aligned with Clustal Omega 65 (PrimPol) or the default MUSCLE algorithm in MEGA X software 66 (DatZ); sequence logos were made with WebLogo 67. Structures homologous to PrimPol and DatZ available in PDB were identified using the Dali server 52; Dali was further used for pairwise RMSD determination and geometry analysis. The tendencies observed for AEP superfamily clustering were maintained whether the analysis involved whole structures or only AEP cores, and whether the dataset was complete or not. The sequences of DatZ and other structures from the HD phosphohydrolase family were aligned in PROMALS3D 68 using structural data supplemented by full protein sequences, excluding non-superimposable N- and C-termini. Multialignment images were prepared with ESPript 3 69. The maximum-likelihood phylogenetic tree of HD phosphohydrolases based on their structural multialignment was prepared in MEGA X with default parameters, taking 100 bootstrap replications. All protein structures were visualised with Chimera 70 and Pymol 71.
Statistics and reproducibility
[0108] All non-crystallographic experiments and molecular dynamics simulations were done in triplicate (n = 3). For x-ray crystallography, several consistent datasets were collected from multiple crystals; the best-resolution datasets were chosen for the final refinements.
Example 3: A DNA primase-polymerase nonspecific of A or Z
[0109] Parsing the genomic sequence of cyanophage S-2L (AX955019) in the search for a protein involved in DNA replication, we identified one ORF corresponding to a member of the Archaeo-Eukaryotic Primase (AEP) superfamily, which had not been noted earlier. We will refer to it as “PrimPol”, similarly to its close homologues, and its gene will be referred to as “pplA”.
[0110] AEP is the eukaryotic and archaeal counterpart of DnaG, the bacterial primase superfamily 17 18, to which it is structurally unrelated. Its members are found in all domains of life, including viruses, and are involved in several DNA transactions including not only DNA priming and replication, but also DNA repair through non-homologous end-joining (NHEJ) 18. AEP proteins are often fused to or physically interact with DNA helicases, and also with partners containing helix bundle domains (like PriCT-1, PriCT-2, PriL or PriX) that interact with the template ssDNA 17 19 22. Particularly important for this work, it was recently shown that a phage-encoded AEP polymerase is capable of replicating the whole genome of the NrS-1 phage 23. Although AEP is not yet officially included in the standard DNA polymerase classification encompassing polymerases from families A, B, C, D, X, Y and RT 24,25, despite an incentive to do so 18, members of the AEP superfamily share the classical Klenow fold with DNA polymerases of families A, B and Y 26.
[0111] We started by characterizing the domain organisation of PrimPol in silico, using DISOPRED 27. The result indicated that the enzyme is composed of three domains, whose function was then determined individually by homology searches (Fig. 2a). The first region (1-190) corresponds to the AEP domain itself, with all crucial motifs conserved. The second region (210-300) has a strong homology with the PriCT-2 domain, most probably involved in the priming activity 19. Together they are joined by a flexible linker and form the primase-polymerase component (1-300). The C-terminal domain (350-737) begins after another large flexible linker. BLAST searches 28 indicate that it best matches the VirE family of single-stranded DNA-binding proteins, whose function is not described in the literature 29. However, homology detection combined with structure prediction performed with HHpred 30 finds high-scoring similarity with viral hexameric DNA helicase structures, the closest being from bovine papillomavirus (2GXA).
[0112] We found no other detectable DNA polymerase in the S-2L genome and went on to assay the DNA polymerase activity of PrimPol. Specifically, we looked for its ability to selectively incorporate the base Z opposite an instructing base T, discarding the dATP present in the host cell’s dNTP pool and avoiding the A:T base pair altogether. We cloned and overexpressed the synthetic gene of PrimPol in E. coli and tested its polymerase activity in vitro. To study the specificity towards A and Z, we used dsDNA with a dT12 oligomer as the 5’ overhang of the template strand and either dATP or dZTP in the reaction mixture. We tested a range of different conditions, varying temperature, pH, DNA, nucleotide and enzyme concentrations, as well as various divalent ions (Fig. 2b-d) that are usual cofactors in DNA and RNA polymerases 31. All assays indicate that S-2L PrimPol is capable of incorporating both nucleotides across from T, accepting A more readily than Z. We also noted that the presence of Mn2+ ions induces limited terminal transferase activity, as observed for some other DNA polymerases such as the human pol μ from the pol X family 32; for another, more distantly related AEP, this activity was observed even with Mg2+ ions 33. We also overexpressed truncated versions of the enzyme, PP-N300 and PP-N190, corresponding to the primase-polymerase core and polymerase domain, respectively. We observed a gradual decrease in the polymerase activity with progressive domain deletions, but constructs remain active as long as the AEP domain is present (Fig. 7a); this confirms the necessary and sufficient role of this domain during DNA synthesis. In another test, we showed that PP-N300 can synthesise in vitro the first 124 nucleotides of its own native gene, with both dATGC and dZTGC mixtures (Fig. 7b).
Example 4: Structural analysis of the AEP domain of S-2L PrimPol
[0113] Using BLAST, we identified 129 other sequences with high similarity to the AEP domain of PrimPol (PP-N190). We aligned them and visualised the conservation status of crucial residues and motifs described in previous reports (Fig. 3a); their function is described further below.
[0114] We could crystallize PP-N190 and solve its structure at 1.5 Å resolution (PDB ID: 6ZP9; Table 1, Fig. 3b), using phase information from SeMet derivative crystals. Ca2+ ions were mandatory in the mother liquor to obtain crystals. As expected, the protein has a classical AEP fold. All crucial residues cluster together in the catalytic site of the domain (Fig. 3c). Y63, E85, D87, T112, K115, H118, D146, R157 are conserved across all AEPs (or have biochemically similar counterparts), and their function is well established in the superfamily. Residue Y63 plays the role of a steric gate for ribonucleotides, allowing only dNTPs in the catalytic site 34. Residues E85, D87 (that can vary to Asp and Glu, respectively) coordinate a divalent metal ion (M2+) in the B site, that positions the triphosphate of the incoming nucleotide (dNTP) during polymerisation; this triphosphate is further stabilized by interactions with T112, K115, H118 and R157 (possibly varying respectively to Ser, Arg, Asn and Lys) 35 38. Residue D146 along with residues E85, D87 and the dNTP’s α-phosphate coordinate another M2+ ion in the A site, making it possible to add the incoming dNTP to the primer strand of the nascent nucleic acid through the two-metal-ion mechanism 35,39,40. The three negatively charged residues E85, D87 and D146 are crucial for the polymerase and primase activity, as shown in the related human PrimPol 41. Importantly, in S-2L PP-N190 we noticed a significant positional shift of residue D87 compared to other AEP structures, along with the conservation among the close relatives of the neighbouring residue D88, which is exposed to the solvent. Either D87 is able to come back to its canonical position once all the substrates and ions are in place, or its position is conserved in the complex: to resolve this point, we investigate below with molecular dynamics its flexibility and potential to stabilize an additional metal ion together with D88.
Finally, although residue H163 lies further apart from the triphosphate, its high conservation and covariance with positions R157 and H118 was noticed in a recent study 19. In human PriS, the mutation of the corresponding residue (H324) to alanine partially inhibited the enzymatic activity, a result that was explained by the presence of a water molecule that links it to the triphosphate 36.
[0115] Due to the presence of divalent calcium ions in all crystallisation conditions, we could not soak the crystals with nucleotides, which precipitate immediately; transferring crystals to a solution devoid of Ca2+ dissolved the crystals in a matter of seconds. On the other hand, there are several AEP structures with bound ligands available in the PDB, including DNA and (d)NTPs. Based on the three structures with DNA (3H25, 3PKY, 5L2X), the nucleic acid apparently bends in an L-shape over the open catalytic site (Fig. 8a). Additionally, the incoming (d)NTP’s conformation is largely conserved across all 8 unique AEP structures with a bound nucleotide (PDB IDs: 1V34, 2ATZ, 2FAQ, 3PKY, 5L2X, 5OF3, 6JON, 6R5D). In all cases, the catalytic site is open to the solvent and there is no selection on the incoming nucleotides; after superposition with these structures, PP-N190 presents no structural feature that could lead to a Z vs A specificity during the polymerase reaction.
Example 5: In silico investigation of the primase catalytic site
[0116] In standard primase assays involving a typical single-stranded M13 genome or several random oligonucleotide sequences (50-100 nt), we observed no DNA or RNA primase activity of PrimPol, perhaps because of an incompatible template sequence. Nevertheless, using computer simulations, we tried to understand how PrimPol may work in the primase mode, a function that is predicted to be conserved in the enzyme by high homology to other active primase-polymerases. Relying on the structure of human PrimPol 37, we could build a model of the S-2L PrimPol AEP domain with a Mg2+ ion placed in the classical site B in the presence of two nucleotide triphosphates in the elongation (polymerase) and initiation (primase) sites. We placed an additional Mg2+ ion in a hypothetical metal binding site “C” between residues D87 and D88 (Fig. 8b). Using this initial model, we conducted molecular dynamics simulations to investigate the stability of the complex in the catalytic site.
[0117] We observed during these simulations that the side chain of S116 was coordinating the Mg2+ ion in the B site, together with the usually involved residue E85 (Fig. 8c). Strictly conserved between closely related PP-N190 relatives but not across the AEP superfamily, S116 can apparently take over the function of the shifted D87 residue, rather than contacting the γ-phosphate of the incoming nucleotide as seen for its counterpart in human PrimPol 37. Additionally, the Mg2+ ion placed at site C between residues D87 and D88 was stable during the 212 ns-long MD simulation, and interacts with the γ-phosphate of the nucleotide in the initiation site. The possible change of D88 to Asn or to His observed in related AEP domains retains the capacity of divalent metal ion binding and further supports the functional nature of this position. We propose that during the putative primase activity of PrimPol involving two nucleotide triphosphates, this additional ion binding site C is important in the positioning and charge neutralisation of the 5’ nucleotide. To test this hypothesis, further work is needed to find the sequence of the template that triggers the DNA primase activity. Then, site-directed mutagenesis can be used to probe the role of putative important residues pointed out by our model.
[0118] In conclusion, while the discovery of PrimPol encoded in S-2L’s genome explains how the phage could replicate its genome, functional and structural studies show it cannot discriminate A against Z. Therefore, it remains to be explained how Z gets incorporated in the genome of S-2L instead of A.
Example 6: DatZ: a triphosphohydrolase specific of dATP
[0119] We subsequently revisited other genes likely to be involved in phage genome replication. We found that one ORF in the immediate vicinity of purZ encodes a 175 aa protein belonging to the HD-domain phosphohydrolase family 42. Enzymes from this family are known to dephosphorylate standard deoxynucleotide monophosphates (dNMPs) and can also act as a triphosphatase on dNTPs, as well as on some close nucleotide analogues 43,44. After purification of the S-2L HD phosphohydrolase overexpressed in E. coli, we tested its activity by pre-incubating it with the reaction mixture for the aforementioned DNA polymerization assay, before adding PrimPol. We observed that the presence of the phosphohydrolase prevented polymerization with dATP, but did not affect the polymerisation with dZTP (Fig. 2d).
[0120] We interpreted this behaviour as the result of a specific dATP triphosphohydrolase activity, and therefore propose to call the enzyme DatZ. We confirmed this hypothesis by incubating DatZ with different nucleotide triphosphates and analysing the reaction products by HPLC (Fig. 4). dATP was rapidly degraded into dA; however, under the same conditions there was no dephosphorylation of ATP, dZTP, or any of the other standard dNTPs (dGTP, dTTP or dCTP). We also found no dephosphorylase activity on dADP or dAMP substrates (Fig. 9a). Marginal tri-dephosphorylation products of dZTP start to appear only after a prolonged incubation (75x longer than for dATP) or in excess of DatZ concentration. Contrary to OxsA phosphohydrolase 44, we did not observe a sequential dephosphorylation, but a one-step reaction directly from dNTPs to dNs, never detecting any intermediate phosphorylation states in the course of the reaction.
[0121] Our finding that S-2L DatZ is a specific dATP triphosphohydrolase offers a simple explanation of how the phage avoids incorporating adenine in its genome.
Example 7: DatZ structure at 0.86 Å resolution: general description
[0122] Using X-ray crystallography, we determined three structures of S-2L DatZ with its substrate, the reaction product and the metal cofactors, the second one at sub-angstrom resolution. They constitute the first structures of a viral HD phosphohydrolase, and the third HD phosphohydrolase to be described in atomic detail, after E. coli YfbR 45 and B. megaterium OxsA 44.
[0123] First, we present a 0.86 Å resolution structure of S-2L DatZ bound to dA, the product of dephosphorylation of dATP in solution (PDB ID: 6ZPA; Table 1). The electron density allowed us to build the whole protein as well as 218 water molecules around the DatZ chain (175 aa), which is roughly the number expected for this resolution limit 46. Although several hydrogen atoms are discernible at such a resolution, the usual limit for their experimental allocation is 0.8 Å 47; they were therefore refined using a riding model. Each monomer of DatZ takes a globular form composed predominantly of α- and 310-helices (70% and 4% respectively), with no β-strands (Fig. 5a). The base moiety of dA fits snugly in the catalytic pocket below a relatively flexible element (as indicated by higher B-factors), with the P79 residue on its tip (Fig. 5b). A catalytic divalent ion is found in the vicinity of dA’s free 5’-OH group, even though no divalent ion was added in buffers during purification or crystallisation. In the catalytic site, the side chain of residue I22 is ideally positioned to sterically exclude the amino group in position 2 of the purine ring of G or Z and provides an immediate explanation for the observed specificity of the enzyme. In addition, the W20 side chain constitutes a steric hindrance for the 2’ hydroxyl group of any ribose-based nucleotide.
[0124] Concerning the oligomeric state of DatZ, we found that in crystallo it arranges in a compact toroidal hexamer with a D3 symmetry, where neighbouring subunits are flipped (Fig. 5c). Such a shape emerges from two partially hydrophobic, self-interacting protein sides (A:A and B:B), with a large surface of interaction - 1358.6 Å2 and 959.0 Å2. We confirmed the hexameric stoichiometry of DatZ in vitro with complementary techniques, i.e. DLS and analytical ultracentrifugation, leading to 5.9 (± 0.1) protomers per oligomer assuming a perfectly globular shape. The whole hexamer is particularly rigid, as judged from the overall very low B-factors (Fig. 5d), which is consistent with the ultrahigh diffraction limit for DatZ crystals.
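To make the D3 arrangement concrete, the Python sketch below generates the six rotation operators of the D3 point group: a three-fold axis and three perpendicular two-fold axes. It is a generic geometric illustration, not derived from the deposited coordinates; the choice of z as the three-fold axis, x as one two-fold axis, and all function names are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(deg):
    """Rotation by deg degrees about the z axis (assumed three-fold axis)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Two-fold rotation about x (assumed orientation of one dyad axis).
C2_X = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]


def d3_operations():
    """Return the six D3 operators: three C3 rotations, then three dyads."""
    threefold = [rot_z(120 * k) for k in range(3)]
    return threefold + [matmul(C2_X, r) for r in threefold]


def same(a, b, tol=1e-9):
    """True if two 3x3 matrices agree element-wise within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


def apply(op, xyz):
    """Apply a 3x3 operator to a coordinate triple."""
    return tuple(sum(op[i][j] * xyz[j] for j in range(3)) for i in range(3))
```

Applying the six operators to the coordinates of one protomer would yield the full hexamer; the three dyad operators invert the z coordinate, which corresponds to the flipped orientation of neighbouring subunits.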
Example 8: A two-metal-ion mechanism of DatZ
[0125] In the literature, there is some uncertainty as to which divalent cation plays a catalytic role in HD phosphohydrolases. The structure of OxsA suggested the presence of one fixed Co2+ ion coordinated by the protein and one transient Mg2+ interacting with the triphosphate 44. The YfbR enzyme was shown to be active with Co2+ and less so with Mn2+, Cu2+ and Zn2+ 43, while OxsA is roughly equally active with Co2+, Co2+/Mg2+ and Mn2+, but not Zn2+ 44.
[0126] In S-2L DatZ, the first detected metal ion, occupying the site “A” in the 0.86 Å resolution structure, is coordinated by residues H34, H66, D67 and D119; two water molecules, also present in the Co2+-bound structure (see below), complete a typical octahedral coordination shell and fit well into the electron density map. Both the position and coordination of ion A2+ are identical to what is observed in other known HD phosphohydrolases, which take their name from the conserved HD dyad. An excitation x-ray energy scan showed a major contribution of Zn; additionally, an anomalous double-difference signal at 40 sigmas at the Zn edge unambiguously points to the presence of a Zn2+ ion in this site. Its coordination geometry is less common than the usual tetrahedral one, but not atypical 48. The fact that no additional divalent ions were added during protein purification indicates a high affinity of DatZ for Zn2+. Zn2+ is present in E. coli grown on LB medium 49 at a level comparable to the one found in vivo in cyanobacteria 50.
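The Methods quote some data-collection settings as photon energies (keV) and others as wavelengths (Å); the two are related by λ = hc/E. The short Python sketch below performs this conversion for the values given in the text. The function names are hypothetical, and the constant is the standard value of hc expressed in keV·Å.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, keV*angstrom


def energy_to_wavelength(energy_kev):
    """Convert photon energy in keV to wavelength in angstrom."""
    return HC_KEV_ANGSTROM / energy_kev


def wavelength_to_energy(wavelength_angstrom):
    """Convert wavelength in angstrom to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


zn_peak = energy_to_wavelength(9.67)       # Zn peak data set
zn_pre_edge = energy_to_wavelength(9.66)   # Zn pre-edge data set
s_sad = wavelength_to_energy(1.7712)       # S-SAD data set of DatZ
se_edge = wavelength_to_energy(0.9807)     # selenium edge, PrimPol-N190
```

On this basis the Zn peak and pre-edge data sets were collected near 1.28 Å, the S-SAD wavelength of 1.7712 Å corresponds to about 7.0 keV, and the selenium-edge wavelength of 0.9807 Å to about 12.64 keV.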
[0127] We then solved a second structure of DatZ co-crystallised with dATP and 10 mM CoCl2 (PDB ID: 6ZPB; Table 1, Fig. 10a) and noticed the presence of a second, previously undescribed metal ion binding site, that we call “B”. This site is not the one observed in the OxsA structure, although it lies in the vicinity of the first site (5.2 Å apart) as well. Both Co2+ ions are coordinated octahedrally: in site A, the binding geometry is the same as described above for Zn2+, while in site B the coordination is mediated by residues E70, D75, the O5’ of dA and three water molecules. The presence of the two Co2+ ions was confirmed by a strong signal in the corresponding Fourier double difference anomalous map at 46 and 33 sigmas in sites A and B, respectively.
[0128] Finally, we solved a third structure of DatZ, this time with bound dATP (PDB ID: 6ZPC; Table 1, Fig. 10b) but no divalent ion(s), obtained by adding EDTA to the enzyme before crystallizing it with the triphosphate. In this structure, we could observe the residues K81 and K116 neutralising the negative charge of β- and γ-phosphates. We still find a Zn2+ ion in the A- site as shown by its anomalous signal, although not fully occupied and only penta-coordinated. We assume that this change in coordination, intermediate between tetrahedral and octahedral and also commonly observed for Zn2+ 48 , is the result of the presence of a triphosphate.
[0129] Superposing the new structures with both co-factors (divalent ions) and the substrate allows to propose a complete catalytic mechanism of DatZ (Fig. 6). Similarly to alkaline phosphatase and 3 ’-5’ exonuclease 51 , DatZ uses a typical two-metal-ion mechanism to dephosphorylate dATP. While the ion B2+ stabilizes the leaving 05’ atom and one oxygen of the a-phosphate (Pα), ion A2+ positions a hydroxide (OH ) in an attacking position opposite to 05’. Then, by interacting with OH , the a-phosphate passes through a penta-coordinate intermediate, forming an unstable oxyanion stabilized by the R19 residue. Finally, the bond 05 ’-Pα is broken and a new one, Pα-OH, is created.
[0130] We checked by HPLC that DatZ is active in a buffer containing Mg2+ as the sole added divalent metal ion and we observed that the enzyme stays active in crystallo with no additional divalent ions at all. Two additional crystal structures showed that Zn2+ in site A is replaced by Co2+ in excess of the latter (20 mM C0CI2), but is retained in elevated Mg2+ concentrations (50 mM MgSO4), as determined through anomalous signal analysis (see Methods).
Example 9: The active site of DatZ: conservation and mutagenesis
[0131] A number of phages that contain a close homologue of the purZ gene in their genome also contain a homologue of datZ. Looking for the conservation of residues crucial for both a dATPase activity and the absence of dZTPase activity, as identified by the present structural studies, we built a multialignment of these closely related DatZ sequences (Fig. 11). We found that all residues stabilizing both catalytic metal ions are strictly conserved, as well as R19, K81 and K116 interacting with the α-, β- and γ-phosphates. Residues W20, I22 and P79, interacting with the base, are conserved or involve conservative substitutions. Additionally, residues Q29, A32 and G74 are strictly conserved among close DatZ homologues, highlighting their possible importance for protein structure (tertiary or quaternary) and/or its dynamics.
[0132] With the intention of engineering a dNTPase with a selectivity shifted towards dZTP, we cloned, expressed and tested the DatZ I22A mutant, designed to make room for the additional amino group of Z in the binding pocket (Fig. 12). We observed a significant relaxation of the purine specificity (Fig. 2d, Fig. 9b). The mutant’s dATPase activity is clearly reduced and still does not show any intermediate product. The additional space created for the 2-amino group of dZTP has the desired effect of raising the dZTPase activity to the point of becoming detectable, albeit still very low. The dGTPase activity remains undetectable, indicating that the selectivity towards an amino group in position 6 of the purine ring is maintained.
Example 10: Discussion of Examples 1 to 9
[0133] The immediate neighbours of PrimPol in the S-2L genome are also replication-related proteins (exonuclease VIII, SF2 helicase and VRR nuclease) (Table 2), and all have a high level of sequence identity with the corresponding proteins of Mediterranean uvMED phages. In contrast, these viruses contain neither purZ nor datZ genes - they share with S-2L only their replicative machinery, and not the additional apparatus that enables the A-to-Z switch. Interestingly, S-2L PrimPol is also related to cyanobacterial enzymes: notably, sequence motifs in the AEP polymerase core correspond perfectly to those of the All3500-like family 19, with almost all of the high-scoring matches coming from cyanobacterial genera. Such a finding supports the idea that pplA, the gene of PrimPol, may have been exchanged between cyanophages and their hosts.
[0134] Due to the divergent nature of the AEP superfamily, its classification is far from trivial. The universal presence of its members, encompassing all three domains of life, viruses and plasmids, testifies to its ancient origin 19. Advanced sequence-based computational methods divided the superfamily into four clades: AEP proper, NCLDV-herpesvirus primase, PrimPol, and BT4734-like 17. In another approach using sequence clustering, AEPs were distributed into multiple groups, with the newly defined PrimPol-PV1 supergroup 19. S-2L PrimPol belongs to the Anabaena (Nostocaceae) All3500-like (sub)family within the PrimPol clade or the PrimPol-PV1 supergroup, depending on the classification.
[0135] A search with PrimPol in the Dali server 52 identified all structures of AEP available in the PDB. However, due to excessive divergence of the superfamily, the structure-based multialignment approach, applied below for DatZ, was not reliable. Instead, we adopted the geometry-based analysis proposed by Dali. Both the dendrogram and the non-hierarchical clustering method (Fig. 13) distinguish two major, well-defined groups: archaeo-eukaryotic replicative PriS primases and bacterial NHEJ primases (LigC/D), belonging to the AEP proper clade defined previously 17. The remaining set contains PrimPols with more distant homology. The strongest link between S-2L PrimPol and any other member of the AEP superfamily is with the plasmidic RepB’ (3H20), highlighting the connection between the All3500-like and RepB’ clusters within the PrimPol-PV1 supergroup 19. Additionally, the previously undescribed subfamily of AEP conserved in the order Campylobacterales and represented by HP0184 from H. pylori (2ATZ) is systematically placed together with them, hinting that they may share a common ancestor.
[0136] In general, in spite of the modest set size of 15 unique AEP structures, PrimPols are clearly much more widespread and diverse than the PriS and NHEJ primases, which have more specific roles. Our preliminary analysis suggests that the ancestor of S-2L PrimPol was acquired from its cyanobacterial host.
[0137] Concerning DatZ, we performed a multialignment of all available HD phosphohydrolase structures with PROMALS3D (Fig. 14), thus avoiding purely sequence-based errors. There is a strict conservation of all residues binding metal ion A across all representatives, along with the metal B-binding E70 residue and R19 that stabilizes the reaction intermediate. There are two singular cases where the D75 B-site binding residue can change to E or H, but chemically both are capable of metal ion coordination. Prominently, the human HD phosphohydrolase HDDC2 (HD domain-containing protein 2) shows a metal coordination identical to the one seen in S-2L DatZ; it is the only other homologue structure with two ions (Mg2+) present in both sites A and B (PDB ID 4DMB). Although it was hypothesised that during the nucleophilic attack a glutamic acid corresponding to DatZ E70 would act as a proton donor through a water bridge 45, here we provide evidence that it participates in metal B binding instead. Interestingly, its alanine mutant was described as having lost its phosphohydrolase activity. Lastly, the residue E93 is almost completely structurally conserved, with the only exception of OxsA, and its position along the sequence is shifted by one α-helix turn in DatZ; it remained undetected by previous sequence alignments with close viral DatZ homologues, probably due to an intrinsic low precision in this region without structural support. E93 places its side chain in the catalytic pocket, but too far away to interact directly with the γ-phosphate or the divalent metal ion B2+ (6.5 and 7.8 Å, respectively). We suggest that this glutamic acid may instead facilitate the trafficking of free phosphates between the catalytic pocket and the solvent.
[0138] Using the multialignment data, we constructed a structurally-informed phylogenetic tree of HD phosphohydrolases (Fig. 15). Aside from following the typical distribution into the three domains of life, it suggests that the ancestor of DatZ was acquired from a bacterial variant; the closest DatZ homologs found in BLAST represent the phyla of γ-proteobacteria and Firmicutes (excluding the immediate viral clade), in conformity with this hypothesis.
[0139] Although diverse in sequence, the monomeric structures of the other known HD phosphohydrolases are very similar (Fig. 16a), with an average RMSD on Cα atoms of 2.75 Å. Despite the fact that only a dimer was described for related bacterial HD phosphohydrolases 44,45, we discovered that the same hexameric quaternary state could be found by generating their symmetry-related mates using the space-group symmetry operators (Supplementary Fig. 16b). In fact, a high multimeric state (>3) has also been reported in vitro for YfbR 43, compatible with our hypothesis.
[0140] As all residues crucial for the reaction in DatZ are conserved or replaced by similar residues in other structures, we suggest that the two-metal-ion mechanism described above is universal for all HD phosphohydrolases, completing previous reports by the identification of metal ion site B and correcting the role of residue E70 counterparts (Supplementary Fig. 16c). Interestingly, OxsA replaced the positively charged K116 with E129 bearing negative charge; we propose that it is this exception that facilitates the accommodation of a third divalent metal ion observed in OxsA and absent in DatZ, which efficiently neutralises the charge of the triphosphate.
[0141] In conclusion, we note that the strategy adopted by phage S-2L is most probably shared with related phages containing homologous datZ and purZ genes. It is very similar to the strategy adopted by the T2, T4 and T6 phages that contain a substantial amount of hydroxymethylcytosine, relying on a dCTP triphosphatase to also shift the pool of available dNTPs in their host cell 6.
[0142] In the future, it will be interesting to see if datZ and purZ genes are sufficient for transferring 2-aminoadenine to the genomes of other organisms.
Example 11: References Cited in Examples 1 to 10
[0143] 1. Gommers-Ampt, J. H. & Borst, P. Hypermodified bases in DNA. The FASEB Journal 9, 1034-1042 (1995).
[0144] 2. Weigele, P. & Raleigh, E. A. Biosynthesis and Function of Modified Bases in Bacteria and Their Viruses. Chem. Rev. 116, 12655-12687 (2016).
[0145] 3. Iyer, L. M., Zhang, D., Maxwell Burroughs, A. & Aravind, L. Computational identification of novel biochemical systems involved in oxidation, glycosylation and other complex modifications of bases in DNA. Nucleic Acids Res 41, 7635-7655 (2013).
[0146] 4. Jeudy, S. et al. The DNA methylation landscape of giant viruses. Nature Communications 11, 2657 (2020).
[0147] 5. Wyatt, G. R. & Cohen, S. S. The bases of the nucleic acids of some bacterial and animal viruses: the occurrence of 5-hydroxymethylcytosine. Biochemical Journal 55, 774-782 (1953).
[0148] 6. Koerner, J. F., Smith, M. S. & Buchanan, J. M. Deoxycytidine Triphosphatase, an Enzyme Induced by Bacteriophage Infection. J. Biol. Chem. 235, 2691-2697 (1960).
[0149] 7. Lee, Y.-J. et al. Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses. PNAS 115, E3116-E3125 (2018).
[0150] 8. Gupta, R. Halobacterium volcanii tRNAs. Identification of 41 tRNAs covering all amino acids, and the sequences of 33 class I tRNAs. J. Biol. Chem. 259, 9461-9471 (1984).
[0151] 9. Kulikov, E. E. et al. Genomic Sequencing and Biological Characteristics of a Novel Escherichia Coli Bacteriophage 9g, a Putative Representative of a New Siphoviridae Genus. Viruses 6, 5077-5092 (2014).
[0152] 10. Ngazoa-Kakou, S. et al. Complete Genome Sequence of Escherichia coli Siphophage BRET. Microbiol Resour Announc 8, e01644-18 (2019).
[0153] 11. Hutinet, G. et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems. Nat Commun 10, 1-12 (2019).
[0154] 12. Kirnos, M. D., Khudyakov, I. Y., Alexandrushkina, N. I. & Vanyushin, B. F. 2-Aminoadenine is an adenine substituting for a base in S-2L cyanophage DNA. Nature 270, 369 (1977).
[0155] 13. Santhosh, C. & Mishra, P. C. Electronic spectra of 2-aminopurine and 2,6-diaminopurine: phototautomerism and fluorescence reabsorption. Spectrochimica Acta Part A: Molecular Spectroscopy 47, 1685-1693 (1991).
[0156] 14. Szekeres, M. & Matveyev, A. V. Cleavage and sequence recognition of 2,6-diaminopurine-containing DNA by site-specific endonucleases. FEBS Letters 222, 89-94 (1987).
[0157] 15. Bailly, C. & Waring, M. J. The use of diaminopurine to investigate structural properties of nucleic acids and molecular recognition between ligands and DNA. Nucleic Acids Res 26, 4309-4314 (1998).
[0158] 16. Solis-Sanchez, A. et al. Genetic characterization of φVC8 lytic phage for Vibrio cholerae O1. Virology Journal 13, 47 (2016).
[0159] 17. Iyer, L. M., Koonin, E. V., Leipe, D. D. & Aravind, L. Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids Res 33, 3875-3896 (2005).
[0160] 18. Guilliam, T. A., Keen, B. A., Brissett, N. C. & Doherty, A. J. Primase-polymerases are a functionally diverse superfamily of replication and repair enzymes. Nucleic Acids Research 43, 6651-6664 (2015).
[0161] 19. Kazlauskas, D. et al. Novel Families of Archaeo-Eukaryotic Primases Associated with Mobile Genetic Elements of Bacteria and Archaea. Journal of Molecular Biology 430, 737-750 (2018).
[0162] 20. Geibel, S., Banchenko, S., Engel, M., Lanka, E. & Saenger, W. Structure and function of primase RepB' encoded by broad-host-range plasmid RSF1010 that replicates exclusively in leading-strand mode. PNAS 106, 7810-7815 (2009).
[0163] 21. Liu, B. et al. A primase subunit essential for efficient primer synthesis by an archaeal eukaryotic-type primase. Nat Commun 6, 1-11 (2015).
[0164] 22. Yan, J., Holzer, S., Pellegrini, L. & Bell, S. D. An archaeal primase functions as a nanoscale caliper to define primer length. PNAS 115, 6697-6702 (2018).
[0165] 23. Zhu, B. et al. Deep-sea vent phage DNA polymerase specifically initiates DNA synthesis in the absence of primers. PNAS 114, E2310-E2318 (2017).
[0166] 24. Braithwaite, D. K. & Ito, J. Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucleic Acids Res 21, 787-802 (1993).
[0167] 25. Raia, P., Delarue, M. & Sauguet, L. An updated structural classification of replicative DNA polymerases. Biochemical Society Transactions 47, 239-249 (2019).
[0168] 26. Mönttinen, H. A. M., Ravantti, J. J. & Poranen, M. M. Common Structural Core of Three-Dozen Residues Reveals Intersuperfamily Relationships. Mol Biol Evol 33, 1697-1710 (2016).
[0169] 27. Jones, D. T. & Cozzetto, D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857-863 (2015).
[0170] 28. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology 215, 403-410 (1990).
[0171] 29. Citovsky, V., Vos, G. D. & Zambryski, P. Single-Stranded DNA Binding Protein Encoded by the virE Locus of Agrobacterium tumefaciens. Science 240, 501-504 (1988).
[0172] 30. Zimmermann, L. et al. A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. Journal of Molecular Biology 430, 2237-2243 (2018).
[0173] 31. Steitz, T. A., Smerdon, S. J., Jager, J. & Joyce, C. M. A unified polymerase mechanism for nonhomologous DNA and RNA polymerases. Science 266, 2022-2025 (1994).
[0174] 32. Dominguez, O. et al. DNA polymerase mu (Pol μ), homologous to TdT, could act as a DNA mutator in eukaryotic cells. The EMBO Journal 19, 1731-1742 (2000).
[0175] 33. Gill, S. et al. A highly divergent archaeo-eukaryotic primase from the Thermococcus nautilus plasmid, pTN2. Nucleic Acids Res 42, 3707-3719 (2014).
[0176] 34. Diaz-Talavera, A. et al. A cancer-associated point mutation disables the steric gate of human PrimPol. Sci Rep 9, 1-13 (2019).
[0177] 35. Zhu, H. et al. Atomic structure and nonhomologous end-joining function of the polymerase component of bacterial DNA ligase D. PNAS 103, 1711-1716 (2006).
[0178] 36. Kilkenny, M. L., Longo, M. A., Perera, R. L. & Pellegrini, L. Structures of human primase reveal design of nucleotide elongation site and mode of Pol α tethering. PNAS 110, 15961-15966 (2013).
[0179] 37. Rechkoblit, O. et al. Structure and mechanism of human PrimPol, a DNA polymerase with primase activity. Science Advances 2, e1601317 (2016).
[0180] 38. Guo, H. et al. Crystal structures of phage NrS-1 N300-dNTPs-Mg2+ complex provide molecular mechanisms for substrate specificity. Biochemical and Biophysical Research Communications 515, 551-557 (2019).
[0181] 39. Brissett, N. C. et al. Structure of a Preternary Complex Involving a Prokaryotic NHEJ DNA Polymerase. Molecular Cell 41, 221-231 (2011).
[0182] 40. Holzer, S. et al. Structural Basis for Inhibition of Human Primase by Arabinofuranosyl Nucleoside Analogues Fludarabine and Vidarabine. ACS Chem. Biol. 14, 1904-1912 (2019).
[0183] 41. Calvo, P. A. et al. The invariant glutamate of human PrimPol DxE motif is critical for its Mn2+-dependent distinctive activities. DNA Repair 77, 65-75 (2019).
[0184] 42. Aravind, L. & Koonin, E. V. The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends in Biochemical Sciences 23, 469-472 (1998).
[0185] 43. Proudfoot, M. et al. General Enzymatic Screens Identify Three New Nucleotidases in Escherichia coli: Biochemical Characterization of SurE, YfbR, and YjjG. J. Biol. Chem. 279, 54687-54694 (2004).
[0186] 44. Bridwell-Rabb, J., Kang, G., Zhong, A., Liu, H. & Drennan, C. L. An HD domain phosphohydrolase active site tailored for oxetanocin-A biosynthesis. PNAS 113, 13750-13755 (2016).
[0187] 45. Zimmerman, M. D., Proudfoot, M., Yakunin, A. & Minor, W. Structural Insight into the Mechanism of Substrate Specificity and Catalytic Activity of an HD-Domain Phosphohydrolase: The 5'-Deoxyribonucleotidase YfbR from Escherichia coli. Journal of Molecular Biology 378, 215-226 (2008).
[0188] 46. Nittinger, E., Schneider, N., Lange, G. & Rarey, M. Evidence of Water Molecules - A Statistical Evaluation of Water Molecules Based on Electron Density. J. Chem. Inf. Model. 55, 771-783 (2015).
[0189] 47. Woińska, M., Grabowsky, S., Dominiak, P. M., Woźniak, K. & Jayatilaka, D. Hydrogen atoms can be located accurately and precisely by x-ray crystallography. Science Advances 2, e1600192 (2016).
[0190] 48. Dokmanic, I., Sikic, M. & Tomic, S. Metals in proteins: correlation between the metal-ion type, coordination number and the amino-acid residues involved in the coordination. Acta Crystallographica Section D 64, 257-263 (2008).
[0191] 49. Outten, C. E. & O’Halloran, T. V. Femtomolar Sensitivity of Metalloregulatory Proteins Controlling Zinc Homeostasis. Science 292, 2488-2492 (2001).
[0192] 50. Rajeshwari, K. & Rajashekhar, M. Biochemical Composition of Seven Species of Cyanobacteria Isolated from Different Aquatic Habitats of Western Ghats, Southern India. Brazilian Archives of Biology and Technology 54, 849-857 (2011).
[0193] 51. Kim, E. E. & Wyckoff, H. W. Reaction mechanism of alkaline phosphatase based on crystal structures: Two-metal ion catalysis. Journal of Molecular Biology 218, 449-464 (1991).
[0194] 52. Holm, L. Benchmarking fold detection by DaliLite v.5. Bioinformatics doi: 10.1093/bioinformatics/btz536.
[0195] 53. Wheeler, D. L. et al. Database resources of the National Center for Biotechnology. Nucleic Acids Res 31, 28-33 (2003).
[0196] 54. Sauguet, L., Raia, P., Henneke, G. & Delarue, M. Shared active site architecture between archaeal PolD and multi-subunit RNA polymerases revealed by X-ray crystallography. Nature Communications 7, 12227 (2016).
[0197] 55. Weber, P. et al. High-Throughput Crystallization Pipeline at the Crystallography Core Facility of the Institut Pasteur. Molecules 24, 4451 (2019).
[0198] 56. Kabsch, W. XDS. Acta Cryst D 66, 125-132 (2010).
[0199] 57. Legrand, P. XDS Made Easier (2017) GitHub repository.
[0200] 58. Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Cryst D 75, 861-877 (2019).
[0201] 59. Bricogne, G. et al. BUSTER. (Global Phasing Ltd.).
[0202] 60. Sheldrick, G. M. Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Cryst D 66, 479-485 (2010).
[0203] 61. Vanommeslaeghe, K. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of Computational Chemistry 31, 671-690 (2010).
[0204] 62. Huang, J. & MacKerell, A. D. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry 34, 2135-2145 (2013).
[0205] 63. Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. Journal of Molecular Graphics 14, 33-38 (1996).
[0206] 64. Phillips, J. C. et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26, 1781-1802 (2005).
[0207] 65. Madeira, F. et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47, W636-W641 (2019).
[0208] 66. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol 35, 1547-1549 (2018).
[0209] 67. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: A Sequence Logo Generator. Genome Res. 14, 1188-1190 (2004).
[0210] 68. Pei, J., Kim, B.-H. & Grishin, N. V. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res 36, 2295-2300 (2008).
[0211] 69. Robert, X. & Gouet, P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42, W320-W324 (2014).
[0212] 70. Pettersen, E. F. et al. UCSF Chimera — A visualization system for exploratory research and analysis. Journal of Computational Chemistry 25, 1605-1612 (2004).
[0213] 71. The PyMOL Molecular Graphics System, Version 1.8 Schrödinger, LLC.
Example 12: Introduction to Examples 12-21
[0214] At least since the last universal common ancestor (LUCA), only four nucleobases - adenine (A), guanine (G), cytosine (C) and thymine (T) or its analogue, uracil (U) - have been used to encode genetic information in DNA or RNA polymers, respectively, and their metabolic pathway is conserved across all branches of cellular life. This principle can be extended to the vast majority of smaller biological entities, such as viruses, which are important agents of evolution (2). Despite that, genetic material may be subject to many natural nucleobase modifications. These modified nucleotides can either constitute additional, epigenetic information, or exist as a consequence of an arms race with restriction-modification systems in the hosts. An illustration of the second possibility can be found in double-stranded DNA (dsDNA) bacteriophages from the order Caudovirales: for instance, phages T2, T4 and T6 systematically substitute 5-hydroxymethylcytosine (5hmC) for cytosine (3), whereas phage 9g contains archaeosine (G+), which replaces a quarter of its genomic guanine (4), enabling its DNA to resist 71% of cellular restriction enzymes (5).
[0215] Although numerous nucleobase modifications exist, such alterations are made almost exclusively without changing the Watson-Crick base-pairing scheme. The only known exception to this rule was revealed by the discovery of cyanophage S-2L, belonging, like the aforementioned phage 9g, to the family Siphoviridae (6). S-2L abandons the usage of genomic adenine in favor of 2-aminoadenine (2,6-diaminopurine or Z), which has a supplementary amino group in position 2 of the purine ring (Fig. 17A). The resulting ZTGC-DNA of the cyanophage has an improved thermal stability with respect to classical ATGC-DNA (7, 8) and proves to be almost completely resistant to adenine-targeting restriction enzymes (9).
[0216] A metabolic pathway of 2-aminoadenine synthesis in S-2L has been proposed. It identifies the key enzyme of Z metabolism as PurZ, a homologue of adenylosuccinate synthetases (AdSS, encoded by the purA gene). It was shown that PurZ creates the immediate precursor of 2-aminoadenine monophosphate (dZMP), N6-succino-2-amino-2’-deoxyadenylate monophosphate (sadAMP), from standard dGMP and free aspartic acid (Asp) as substrates and ATP as the energy donor. This is in contrast to the reaction catalyzed by AdSS enzymes, which use inosine monophosphate (IMP) and GTP to produce sIMP, the direct precursor of AMP. The activity of PurZ as a succinoaminodeoxyadenylate synthase was confirmed in vitro for both cyanophage S-2L and the related Vibrio phage φVC8, another virus with a known A-to-Z genomic substitution. Additionally, the crystal structure of φVC8 PurZ was reported with both dGMP and ATP substrates (PDB ID: 6FM1). The enzymes necessary for the subsequent conversion of sadAMP to dZMP and then dZTP are not encoded in the phages’ genomes, suggesting that they are provided by the bacterial hosts. Indeed, the corresponding enzymes of V. cholerae showed a relaxed purine specificity and thus complemented the pathway of dZTP metabolism of φVC8 in vitro.
[0217] It was also shown that a dATP-specific triphosphatase (DatZ) eliminates dATP from the pool of available dNTP substrates. Such alteration proved essential for conferring nucleotide specificity on the otherwise non-discriminative DNA primase-polymerase (PrimPol) of cyanophage S-2L, which we identified as the sole DNA polymerase of S-2L. Through structural studies of both enzymes, we proposed a mechanistic rationale for their activities and specificities, or lack thereof. Based on the co-conservation of both purZ and datZ genes, we expanded the idea of dATP depletion to related ZTGC-DNA-containing viruses.
[0218] Here we broaden the substrate spectrum of S-2L’s PurZ by confirming its alternative role as a dATPase, which can nevertheless stay active after the depletion of dATP from the cellular pool of nucleotides by using ATP. We solved its structure in a complex with dGMP as a substrate and dATP as an alternative energy donor. Moreover, through close inspection of all closely related Siphoviridae phages, we identify that gene variants of MazG-like nucleotide pyrophosphatase (MazZ) compose the third and final element of a conserved cluster, which we call the Z-cluster. We demonstrate the specificity of S-2L’s MazZ for both guanosine and deoxyguanosine triphosphates, resulting in GMP or dGMP; production of the latter places the enzyme directly upstream of PurZ in the 2-aminoadenine synthesis pathway. By resolving the crystal structure of MazZ bound to the first dephosphorylation product of dGTP, we identify residues crucial for the enzyme’s activity. We propose that the underlying two-metal-ion mechanism occurs in two steps, which involve all three catalytic ions identified in the catalytic pocket.
[0219] To confirm the necessity and sufficiency of the Z-cluster for efficient ZTGC-DNA synthesis in cellulo, we expressed the genes datZ, mazZ and purZ in E. coli. Although toxic to the bacteria, the system was able to convert a significant amount of the DNA’s adenine to 2-aminoadenine: up to 2.2% for phage T7 infecting the induced cells, and up to 27.5% in newly replicated plasmids.
Example 13: Materials and Methods Used in Examples 14-19
S-2L genome sequencing, annotation and homology with other Siphoviridae
[0220] S-2L DNA was isolated from phage lysate of a Synechococcus elongatus culture, adapting the techniques commonly used for phage λ DNA. The S-2L genomic library was prepared by successive DNA fragmentation, adapter ligation and amplification by GATC Biotech. Libraries were sequenced using Illumina HiSeq. A total of 14,198,980 sequenced reads (2 x 150 bp) were obtained, covering 4,259,694,000 bases. Resulting reads were mapped against the GenBank AX955019.1 S-2L reference sequence using Minimap2 v2.17 (47). Variant calling was carried out by Freebayes v1.3.2 (48) and the calls were later filtered by VCFLIB (49). A consensus sequence was generated using VCF Consensus Builder v0.1.0 (50). Annotation of the consensus sequence was carried out by translated BLAST (17). Representation of the S-2L genome was made with SnapGene Viewer (51). Phages related to S-2L through purZ and datZ genes were found by homology searches using NCBI BLAST (17).
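The read statistics quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the described pipeline: the function names are illustrative, and the depth helper takes the genome length as a parameter because that value is not stated in this paragraph.

```python
def total_bases(n_reads: int, read_length: int, mates_per_read: int = 2) -> int:
    """Total sequenced bases, treating each counted read as a pair of mates (2 x read_length)."""
    return n_reads * mates_per_read * read_length


def mean_depth(bases: int, genome_length: int) -> float:
    """Average sequencing depth for a genome of the given length in bp (supplied by the caller)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return bases / genome_length


if __name__ == "__main__":
    bases = total_bases(14_198_980, 150)
    print(bases)  # 4259694000, the number of bases quoted in the text
```

The product agrees with the stated 4,259,694,000 bases only if each of the 14,198,980 reads is counted as a 2 x 150 bp pair, which is the reading assumed here.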
Protein expression and purification
[0221] Synthetic genes for expressed proteins were optimized for E. coli and synthesized using ThermoFisher’s GeneArt service. Genes were cloned into a modified RSF1-Duet expression vector with a TEV-cleavable N-terminal 14-histidine tag (52) using New England Biolabs and Anza (Thermo Fisher Scientific) enzymes. E. coli BL21-CodonPlus (DE3)-RIPL cells (Agilent) were separately transformed with engineered plasmids. Bacteria were cultivated at 37°C in LB medium with appropriate antibiotic selection (kanamycin and chloramphenicol), and induced at OD = 0.6-1.0 with 0.5 mM IPTG. After incubation overnight at 20°C, cells were harvested and homogenized in suspension buffer: 50 mM Tris-HCl pH 8, 400 mM NaCl, 5 mM imidazole. After sonication and centrifugation of bacterial debris, corresponding lysate supernatants were supplemented with Benzonase (Sigma-Aldrich) and protease inhibitor (Thermo Fisher Scientific), 1 μl and 1 tablet per 50 ml, respectively. Proteins of interest were isolated by purification of the lysate on a Ni-NTA column (suspension buffer as washing buffer, 500 mM imidazole in elution buffer). Histidine tags were removed from the proteins by incubation with his-tagged TEV enzyme overnight. After removing TEV on a Ni-NTA column, proteins were further purified on a Superdex 200 10/300 column with 25 mM Tris-HCl pH 8, 300 mM NaCl. All purification columns were from Life Sciences. Protein purity was assessed on an SDS gel (BioRad). The enzymes were concentrated to 10-19.5 mg ml-1 with Amicon Ultra 10k and 30k MWCO centrifugal filters (Merck), flash frozen in liquid nitrogen and stored directly at -80°C, with no glycerol added.
Nucleotide HPLC analysis
[0222] 30 μM (1.25 mg ml-1) of S-2L PurZ or 4.2 μM (0.2 mg ml-1) of E. coli AdSS was incubated at 37°C for 1 h (if not stated otherwise) with 2 mM of the respective nucleotides and 10 mM of aspartate, in a buffer containing 50 mM Tris pH 7.5 and 5 mM MgSO4. For S-2L MazZ, 10 μM of the enzyme was incubated at 37°C for 15 min with 100 μM of the respective nucleotides, in a buffer containing 50 mM Tris pH 7.5 and 5 mM MgCl2. Reaction products were separated from the protein using 10000 MWCO Vivaspin-500 centrifugal concentrators and stored at -20°C. Products and standards were assayed separately, using around 40 nmol of each for anion-exchange HPLC on a DNA-PAC100 (4x50 mm) column (Thermo Fisher Scientific). After equilibration with 150 μl of a suspension buffer (25 mM Tris-HCl pH 8, 0.5% acetonitrile), nucleotides were injected on the column and eluted with 3 min of isocratic flow of the suspension buffer followed by a linear gradient of 0-200 mM NH4Cl over 12 min (1 ml min-1). Eluted nucleotides were detected by absorbance at 260 nm, measured in arbitrary units [mAU]. High-purity nucleotides and chemicals were bought from Sigma Aldrich, and HPLC-quality acetonitrile was from Serva.
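As a sanity check on the paired molar and mass concentrations given above, and as an illustration of the elution programme, the relations can be sketched as follows. This is only an illustrative calculation: the function names are hypothetical, and the gradient helper assumes an ideal programme (3 min at 0 mM, then a strictly linear ramp to 200 mM over 12 min) with no dwell-volume delay.

```python
def implied_mass_kda(mg_per_ml: float, micromolar: float) -> float:
    """Molecular mass (kDa) implied by a mass concentration and a molar concentration."""
    # mg/ml equals g/l; dividing g/l by mol/l gives g/mol, and 1000 g/mol = 1 kDa.
    return mg_per_ml / (micromolar * 1e-6) / 1000.0


def gradient_mM(t_min: float, hold_min: float = 3.0,
                ramp_min: float = 12.0, end_mM: float = 200.0) -> float:
    """NH4Cl concentration (mM) at time t for an isocratic hold followed by a linear ramp."""
    if t_min <= hold_min:
        return 0.0
    fraction = min((t_min - hold_min) / ramp_min, 1.0)
    return fraction * end_mM


if __name__ == "__main__":
    print(round(implied_mass_kda(1.25, 30.0), 1))   # mass implied for S-2L PurZ
    print(round(implied_mass_kda(0.2, 4.2), 1))     # mass implied for E. coli AdSS
    print(gradient_mM(9.0))                         # midpoint of the ramp
```

Under these assumptions the quoted concentrations imply masses of roughly 42 kDa for PurZ and 48 kDa for AdSS, so the two unit systems in the paragraph are mutually consistent.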
Transplantation of the Z-cluster into E. coli and 2-aminoadenine detection
[0223] Gene datZ was cloned onto plasmid pETlOO/D-TOPO providing ampicillin resistance. Genes purZ and mazZ were cloned into modified pRSFl-Duet providing kanamycin resistance, in slots 1 (his-tagged) and 2, respectively, each with a ribosome-binding site upstream. Constructs with partial Z-cluster did not include mazZ in pRSFl-Duet slot 2 (datZ/purZ or purZ alone) and further had datZ replaced by mazZ on pETlOO/D-TOPO (mazZ/purZ). One or both plasmids were used to transform E. coli BL21-CodonPlus (DE3)-RIPL cells.
[0224] For plasmid assays, bacteria were cultivated at 37°C in LB medium with appropriate antibiotic selection and induced with 1 mM IPTG at OD = 0.60 (± 0.03). After 2 hours, plasmids were isolated with NucleoSpin Plasmid QuickPure kit (Macherey-Nagel) and suspended in water. For phage assays, bacteria were cultivated similarly but induced with 0.5 mM IPTG at stationary phase and incubated for 1 hour. The cultures were infected with phages (MOI = 1) and incubated for 3 h. Phages were separated from bacterial lysates by centrifugation and filtering (0.22 μm). They were pre-treated with 120 U DNase, 40 μg ml-1 RNase and 10 mM MgCl2 (37°C, 30 min), blocked with 20 mM EDTA and proteolysed with 0.5% proteinase K (55°C, 30 min). DNA was isolated with 3 rounds of phenol/chloroform extraction using MaXtract High Density tubes (Qiagen). To 5.6 ml of the final aqueous phase were added 620 μl of sodium acetate (3 M, pH = 5.2) and 12.4 ml of absolute ethanol. The mix was gently homogenised, left on ice for 30 min to precipitate and centrifuged. The supernatant was removed by inversion and the remaining pellet was washed with 3 ml of 70% EtOH. After another centrifugation, samples were left to evaporate for 1 hour and solubilized in 10 mM Tris pH 8.
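The ethanol precipitation volumes given above can be checked against the usual proportions of salt and ethanol. The sketch below is a hedged illustration: the function name is hypothetical, and volumes are assumed to be additive, which is only approximately true for ethanol-water mixtures.

```python
# Illustrative check of the precipitation mix (assumes additive volumes).

def precipitation_mix(aqueous_ml: float, salt_ml: float, ethanol_ml: float,
                      salt_stock_m: float = 3.0) -> dict:
    """Return salt molarity before ethanol, ethanol volumes added, and final ethanol fraction."""
    pre_ethanol_ml = aqueous_ml + salt_ml
    total_ml = pre_ethanol_ml + ethanol_ml
    return {
        "salt_m_before_ethanol": salt_stock_m * salt_ml / pre_ethanol_ml,
        "ethanol_volumes": ethanol_ml / pre_ethanol_ml,
        "ethanol_fraction_final": ethanol_ml / total_ml,
    }


if __name__ == "__main__":
    mix = precipitation_mix(aqueous_ml=5.6, salt_ml=0.62, ethanol_ml=12.4)
    for name, value in mix.items():
        print(f"{name}: {value:.3f}")
```

With the stated volumes, sodium acetate is at about 0.3 M before ethanol is added, and the ethanol corresponds to about two volumes of the salted aqueous phase.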
[0225] DNA of plasmids and phages was digested to nucleosides with Nucleoside Digestion Mix (NEB) and separated on Amicon Ultra-0.5 mL 10K centrifugal filters. Nucleosides were analysed by LC-MS, with standard nucleoside controls (Sigma-Aldrich) or dZ (Biosynth Carbosynth). dZ was found to elute at the same position as dG, but with a strikingly different MS/MS profile. The A-to-Z substitution rate was taken as the ratio of dZ content to dZ+dA; it was further normalized to the newly synthesized plasmid fraction by isolating plasmids prior to induction and accounting for the difference in the DNA yield.
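The substitution rate and its normalization can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names are hypothetical, the newly synthesized fraction is assumed to be the relative increase in plasmid yield after induction, and DNA made before induction is assumed to contain no Z. The input values in the usage example are invented for illustration.

```python
# Illustrative A-to-Z substitution calculation (hypothetical names and inputs).

def a_to_z_substitution(dz: float, da: float) -> float:
    """Raw A-to-Z substitution: dZ / (dZ + dA), for amounts in the same units."""
    total = dz + da
    if dz < 0 or da < 0 or total == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return dz / total


def newly_synthesized_fraction(yield_before: float, yield_after: float) -> float:
    """Assumed share of plasmid DNA made after induction (yields in the same units)."""
    if yield_before < 0 or yield_after <= yield_before:
        raise ValueError("expected 0 <= yield_before < yield_after")
    return (yield_after - yield_before) / yield_after


def normalized_substitution(raw_rate: float, new_fraction: float) -> float:
    """Substitution within new DNA, assuming pre-induction DNA contains no Z."""
    return raw_rate / new_fraction


if __name__ == "__main__":
    raw = a_to_z_substitution(dz=1.0, da=9.0)                  # made-up amounts
    new = newly_synthesized_fraction(yield_before=40.0, yield_after=100.0)
    print(raw, new, normalized_substitution(raw, new))
```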
Crystallography and structural analysis
[0226] All crystallization conditions were screened using the sitting drop technique on an automated crystallography platform (53) and were reproduced manually using the hanging drop method with ratios of protein to well solution ranging from 1:2 to 2:1. PurZ was screened at 19 mg ml-1 with a molar excess of 1.2 of dGMP at 4°C. Capped thick rods grew over several days in 15% Tacsimate and 2% w/v PEG 3350 buffered with 100 mM HEPES pH 7, and did not appear in the absence of dGMP. MazZ was screened at 14.7 mg ml-1 with a molar excess of 1.2 of dGTP at 18°C. Big bundles of thin needles grew over a week in 200 mM Li2SO4 and 1.26 M (NH4)2SO4 buffered with 100 mM Tris pH 8.5. PurZ crystals were soaked for 15 min in a solution containing 30% glycerol, 50% dATP solution (100 mM) and 20% crystallization buffer; MazZ crystals were soaked for several seconds with 30% glycerol and 70% crystallization buffer, with added 30 mM MnCl2. All crystals were then frozen in liquid nitrogen. Crystallographic data were collected at the Soleil synchrotron in France (beamlines PX1 and PX2), processed by XDS (54) with the XDSME (55) pipeline and refined in Phenix (56). Nucleotide constraints for structure refinement were obtained using Grade Web Server (57). The structure of S-2L PurZ was solved by molecular replacement with the φVC8 PurZ model (PDB ID: 6FM1). The structure of MazZ was solved using anomalous signal from bound Mn2+ ions that guided automatic model-building in Phenix’ AutoSol, with final manual reconstruction steps using Coot (58). PurZ quaternary structure analysis was performed using PISA (59).
DNA polymerase assay
[0227] Fluorescence polymerase activity test was executed in 20 mM Tris-HCl pH 7 and 5 mM MgCl2, with 3 μM of dT24 overhang DNA template, 1 μM of FAM 5’-labeled DNA primer and various concentrations of either dATP or dZTP. The Klenow polymerase was added to a final concentration of 5 U in 50 μl. The assay was conducted at 37°C for 5 min.
[0228] Before adding the protein, DNA was hybridized by heating up to 95°C and gradually cooling to reaction temperature. Reactions were terminated by adding two volumes of a buffer containing 10 mM EDTA, 98% formamide, 0.1% xylene cyanol and 0.1% bromophenol blue, and stored at 4°C. Products were preheated at 95°C for 10 min, before being separated with polyacrylamide gel electrophoresis and visualised by FAM fluorescence on a Typhoon FLA 9000 imager. All oligonucleotides were ordered from Eurogentec, chemicals from Sigma-Aldrich, Klenow polymerase from Takara Bio, dATP from Fermentas (Thermo Fisher Scientific) and dZTP from TriLink BioTechnologies.
Structure and sequence alignments, phylogeny
[0229] Structures homologous to PurZ and MazZ available in PDB were identified using Dali server (36). Dali was further used for pairwise RMSD determination. The sequences were aligned in PROMALS3D (37) using structural data supplemented by full protein sequences. Graphical multialignments were prepared with ESPript 3 (60). Maximum-likelihood phylogenetic tree was prepared with MEGA X software (61) with default parameters, taking 100 bootstrap replications. All protein structures were visualised with Chimera (62) and Pymol (63).
Example 14: Genome of cyanophage S-2L and its relatives: a conserved Z-cluster
[0230] The genome of phage S-2L was made available in 2004 with routine sequencing methods (GenBank AX955019); however, due to several discrepancies with homologous genes, we decided to re-sequence it using the next-generation sequencing (NGS) technology. Our new high-coverage sequence is now deposited in GenBank under the accession number MW334946 (Fig. 17B). We identified 56 genes distributed evenly across the DNA sequence, with a total length equivalent to 89.1% of the genome. They are partitioned into several functional blocks, involved in transcription, 2-aminoadenine pathway, replication, capsid formation and virion release. This modularity is reminiscent of the one observed in other phages from the Siphoviridae family (10, 11), the family to which cyanophage S-2L belongs, including the model phage λ (12). We found no Shine-Dalgarno (SD) motifs upstream of the genes, which is consistent with low SD motif conservation in cyanobacteria (13, 14), closely mimicked by cyanophages (15). The elevated GC content (69.4%) is fairly constant across the whole genome. Interestingly, the S-2L genome can be divided into two parts of roughly equal length, where almost all genes follow only one direction of translation (Fig. 17B); at their junction we found a gene homologous to marR, which usually represses transcription at such intersections (6).
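Genome-level figures such as the GC content and the summed gene length relative to the genome can be computed from a sequence and its annotation. The following Python sketch shows one way to do so; the function names are hypothetical, gene coordinates are assumed to be 1-based and inclusive, and the toy inputs are not taken from the S-2L genome.

```python
# Illustrative genome statistics (hypothetical names, toy inputs).

def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def summed_gene_fraction(gene_spans, genome_length: int) -> float:
    """Sum of gene lengths divided by genome length (1-based inclusive spans).

    Overlapping genes are counted once each, so this is the 'total length
    equivalent' of the genes, not the fraction of positions covered.
    """
    total = sum(end - start + 1 for start, end in gene_spans)
    return total / genome_length


if __name__ == "__main__":
    print(gc_content("GGCCATGC"))
    print(summed_gene_fraction([(1, 4), (6, 8)], genome_length=10))
```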
[0231] It was also established that the replication-related block includes pplA encoding a non-selective primase-polymerase (PrimPol). Next to this block are found the genes of N6-succino-2-amino-2’-deoxyadenylate synthetase (purZ) and dATP-specific HD phosphohydrolase (datZ) involved in 2-aminoadenine metabolism. Importantly, the new sequences of all these genes are identical with the previous ones. However, examination of the re-sequenced S-2L genome unravelled the presence of a third gene placed tightly between purZ and datZ, whose product corresponds to a MazG-like pyrophosphatase: for naming consistency, we will hereafter call it MazZ.
[0232] Using NCBI BLAST (17) we compared the genome of S-2L with its close relatives having both datZ and purZ genes. We identified 12 phage sequences with complete genome coverage, including the previously investigated Vibrio phage φVC8. In Fig. 17C we show the relevant sections of their genomes, along with their phylogeny constructed on datZ and purZ. We could distribute all phages into four major clusters that are correlated with similar genome architecture. Interestingly, cyanophage S-2L has a unique genome composition, much different from closely related Siphoviridae phages. Finally, we observed that the mazZ gene is always co-conserved with datZ and purZ, although in one of two possible isoforms. For the sake of precision, we will further distinguish between isoforms MazZ-1 (S-2L-like) and MazZ-2 (φVC8-like) when relevant. As a note, MazZ-1 seems to undergo frequent N-terminal fusions with an HNH nuclease domain or a different, unidentified domain, but not in S-2L.
[0233] Ultimately, we define the closely correlated datZ, mazZ and purZ genes as the 2-aminoadenine cluster, or the Z-cluster. Further below, after functional and structural investigation of S-2L PurZ and MazZ we demonstrate that this cluster can be introduced to E. coli, resulting in a significant A-to-Z substitution in bacterial plasmids and coliphages.
Example 15: S-2L PurZ alternative dATPase activity
[0234] We have cloned and overexpressed a codon-optimized version of S-2L’s gene in E. coli. After purification of the product we performed catalytic tests with various nucleotide substrates, and followed the appearance of the products by HPLC (Fig. 18). Structural analysis of φVC8’s close homologue (PDB ID: 6FM1) suggested no particular selection on the 2’-OH group of bound ATP; thus, we sought to compare the reaction time-course of S-2L PurZ with either dATP or ATP as the phosphate donor (Fig. 18A). The result shows no specificity of the enzyme towards the ribo- and deoxy- variants: the molecular nature of this lack of discrimination and its importance for the phage are discussed in the following sections.
[0235] As a control, we confirmed the previously reported activity of PurZ, as well as the E. coli adenylosuccinate synthetase (AdSS) (18) in our experimental setup (Fig. 18B).
Example 16: Structure of S-2L PurZ with bound dGMP and dATP at 1.7 Å resolution
[0236] We crystallized S-2L PurZ with two of its substrates, dGMP and dATP, and solved its structure at 1.7 Å resolution using the related structure of φVC8 (PDB ID: 6FM1) as a template in molecular replacement (Table 4, Fig. 19A). In accordance with the 41% sequence identity between the two proteins, their structures are highly similar, with an RMSD of 1.6 Å. Both ligands have exactly the same binding mode as previously described for φVC8 PurZ with dGMP and ATP (Fig. 19B): notably, there is virtually no difference in the position of the 2’ carbon atom of dATP in our structure and ATP in the structure of the homologue.
[0237] Enzymes from the AdSS family are known to be functional dimers (19, 20) or even tetramers (21). Even though there is one molecule of S-2L PurZ in the crystallographic asymmetric unit, a dimer can be reconstructed by using a crystallographic two-fold axis (Fig. 19C). As expected, there is a large surface interaction area of 2488 Å2 between the pair of proteins. Although the enzyme crystallized in a different space group from the one observed in φVC8 enzyme’s crystals, this observation can be extended to the original PurZ structure as well: analysis of crystal contacts between symmetry-related molecules with PISA indicates indeed the presence of a stable dimer in this case.
[0238] Very low B-factors (Fig. 19D), especially on the interface, suggest an exceptional overall rigidity of the dimer; the only exception is the Y272-L278 loop above the reactants. This flexible loop corresponds to the loop 299-304 in E. coli AdSS, shown to be ordered only in the presence of a non-hydrolyzable analogue of the aspartate substrate, hadacidin (thus referred to as “the aspartate loop”) (22). The tip of the aspartate loop (T273-T276) is completely conserved between S-2L and φVC8, both in sequence and structure. The signature G273T mutation in the PurZ clade is possibly important for the particular activity.
dGMP-binding site
[0239] Residues G22, S23, N49, A50, T132, Q190, V241 and the backbone atoms of Y21 form a pocket where the base moiety of dGMP is placed. Y21-S23 are positioned next to the amino group of dGMP in position 2, which is absent in the IMP substrate of AdSS. The sidechain of S23 in the vicinity of this amino group appears to be specific to phage PurZ and their immediate homologues as a mutation from a conserved aspartate and is likely to be responsible for the guanine specificity. Its negative partial charge is further stabilized by the close R280 sidechain. Residues G130, S131, R146, Y203, C204, T205 and R240 complete the dGMP pocket. R146 is protruding from the dimer's other molecule, forming an ion pair with the negative charge of the α-phosphate, a feature already described for E. coli AdSS as well (23). Lastly, V241 is ideally placed to sterically interfere with the ribose 2'-OH group in ribonucleotides, establishing preference towards the substrate in the deoxy- form. Although it is conserved in both AdSS and PurZ, its position is spatially shifted in the latter: the reason behind this difference is examined in the discussion section.
(d)ATP-binding site
[0240] Just like ATP in the PurZ structure of φVC8, the base moiety of dATP is stabilized by stacking interactions with F306 and P336 residues. Oxygen atoms from the side-chains of N305, Q308 and the backbone of G335 form hydrogen bonds with adenine’s amino group in position 6. Importantly, the amino group of Q308 would destabilize the amino group in position 2 of bases G and Z; the same effect is observed with the amino group of N297 in φVC8. Identically to its homologue, the rest of the ligand interacts with S-2L PurZ almost exclusively through its triphosphate tail with residues S23, G25, G51, H52 and T53; the only residue also in contact with the deoxyribose moiety is T53, touching it from the C3' atom side. The position inferred for the 2'-OH of ATP would be entirely exposed to the solvent, identically to what is seen for φVC8 PurZ (PDB ID: 6FM1). This finding confirms the lack of a selection mechanism for ribose/deoxyribose energy donor variants in PurZ enzymes.
Example 17: A (d)GTP-specific S-2L MazZ belonging to MazG-like pyrophosphatase family
[0241] The third element of the Z-cluster, MazZ, is a MazG-like pyrophosphatase. Closely related MazG(-like) and HisE proteins share an evolutionary history with dimeric dUTPases, and all three constitute the all-α NTP pyrophosphohydrolase (NTP-PPase) superfamily (24). MazG and MazG-like proteins play house-cleaning or related functions, like degrading an alarmone (p)ppGpp (25) or aberrant dUTP (26). On the other hand, HisE members are closely related and are involved in bacterial histidine synthesis pathway as a phosphoribosyl-ATP pyrophosphatase (27), often fusing with HisI that catalyzes the following reaction in the pathway (28). Varying specificity of all NTP-PPase enzymes is reflected in the divergence of residues in contact with the ligand - only the catalytic residues and fold-related hydrophobic ones are consistently conserved across the superfamily (24). The former were identified to coordinate up to three Mg2+ ions, two of which were proposed to participate in the two-metal-ion mechanism (29, 30).
[0242] To complete our studies on the Z-cluster in cyanophage S-2L, we sought to investigate the catalytic properties of the virus’ MazZ. We cloned and overexpressed its gene, purified the enzyme and subjected it to HPLC tests (Fig. 20). MazZ is indeed capable of removing two terminal phosphates from a nucleoside triphosphate. Its preferential substrates are dGTP and GTP, rapidly dephosphorylated to dGMP and GMP, respectively. In contrast, S-2L MazZ shows no substantial activity with other deoxynucleotides, including dZTP: the enzyme is a comparatively weak dTTPase, and a trace of activity in dATP dephosphorylation starts to be visible only with incubation times 8 times longer than in usual conditions. Thus, S-2L MazZ evidently exerts strong discrimination on the base moiety, but seemingly none on the 2’-OH group.
Example 18: Structure of S-2L MazZ, an NTP-PPase with a MazG-HisE fold
[0243] The basic unit of NTP-PPase enzymes is a four- or five-α-helical chain; in a notable exception of A. fulgidus MazG-like protein, an extended helix structurally compensates for the lack of another despite its opposite directionality (PDB ID: 2P06). In MazG(-like) and HisE enzymes, believed to represent the ancestral fold, two of these 4-helical chains intertwine, forming a tight dimer with two symmetric catalytic sites made from both subunits. Sometime later in evolution, the ancestor of dimeric dUTPases underwent gene duplication and fusion, creating a covalent equivalent of such a dimer that subsequently lost one redundant catalytic site (24). Most MazG-like and HisE proteins dimerize further through a hydrophobic surface, forming a tetramer with four active sites; dimeric dUTPases accordingly dimerize as well, in a geometrically similar arrangement (24). In boundary cases, the tetramers can be loosely bound (31) or do not appear at all when two incompatible MazG domains are fused (32). To our knowledge, in spite of a number of determined MazG, MazG-like and HisE structures, it is the first time that their typical tertiary and quaternary structure is recognized as identical (Fig. 23). Thus, we will refer to the original basic unit as the “MazG-HisE” fold.
[0244] We obtained crystals of MazZ and solved its structure, bound to the dephosphorylation product of dGTP and catalytic Mn2+ ions, at 1.43 Å resolution (Table 4, Fig. 21); we made use of Mn2+ anomalous signal for ab initio structure determination. Contrary to other members of the all-α NTP-PPase superfamily, each MazZ protein chain has two additional β-strands on its C terminus (Fig. 21C, Fig. 23). Together, they form a homotetramer with a typical MazG-HisE fold. The whole tetramer constitutes the asymmetric unit, with very small deviations between the monomers (RMSD of 0.2-0.6 Å). The only noticeable differences in electron density between the chains lie in the solvent-exposed D43-H46 flexible loop and on both N- and C-termini, partly influenced by crystallographic contacts.
[0245] The electron density of four deoxyguanine nucleotides, occupying each of the catalytic pockets, revealed the presence of two phosphate groups (Fig. 21B). It signifies that under the crystallization conditions the enzyme crystallized after having reduced dGTP to dGDP, but before reducing the latter to dGMP. The nucleotides are placed in very tight pockets, engulfed by the enzyme from every side except from the solvent-exposed β-phosphate. At these openings we found three catalytic Mn2+ ions, their presence confirmed by a strong anomalous signal.
dGDP-binding site
[0246] Guanine nucleotides are almost completely buried inside the protein. From one side, the ligand is held by residues I12, W15, I16, N20, K31, E35, D53, I56, L57 and D60 of one chain. The second chain of the tight dimer completes the pocket from the other side with residues K76, N80, W85, A92, M93, R94 and H95.
[0247] Looking at the essential guanine functional groups, the closest residue to the oxygen atom in position 6 is N20, through its amide nitrogen at 3.2 Å. On the other hand, guanine’s amino group in position 2 is only 3.0 Å away from the carboxyl group of D60. Both of these hydrogen bond interactions are complemented with hydrophobic interactions between the protein and the purine ring. Hence, the specificity of MazZ towards guanine emerges from the cavity volume matching its shape and charge compatibility with the two essential chemical groups of guanine.
[0248] We observe no steric hindrance for the possible presence of the ribose 2'-OH group. Mutation of the closest I56 could potentially decrease the specificity for ribonucleotides; however, it also contacts the base’s 2-amino group and is surrounded by an intricate network of other residues. Thus, improving the specificity of MazZ towards deoxyribose is probably not a trivial task.
Catalytic Mn2+ ions
[0249] The three Mn2+ ions place themselves at the pocket’s opening: one between the α- and β-phosphates of dGDP, one on the opposite side of the β-phosphate, and one next to where the γ-phosphate would extend. We name them ions A, B and C, respectively; they are strictly equivalent to the Mg2+ ions found in C. jejuni dimeric dUTPase (PDB 1W2Y), another member of the superfamily (29). Ion A is coordinated by residues E34, E35 and E38; ion B by E50 and D53; and ion C by E38 and E50. With oxygen atoms from phosphates and water molecules filling the coordination shells, these ions are all coordinated in an octahedral fashion, as seen in the C. jejuni dUTPase. We propose that the two-metal-ion mechanism described for NTP-PPases occurs in fact with two independent and consecutive dephosphorylation steps, which is consistent with the presence of three catalytic ions and an intermediate diphosphate in the MazZ structure. Additionally, residue R83, positioned only 2.8 Å away from the β-phosphate, is probably important for the second excision step by stabilizing the penta-coordinated intermediate that appears during two-metal-ion catalysis (33).
Example 19: Full 2-aminoadenine metabolic pathway in S-2L and successful DNA conversion of unrelated bacteria and phages
[0250] We now propose a complete 2-aminoadenine metabolic pathway for cyanophage S-2L (Fig. 22A) that can be extended to closely related Siphoviridae phages. Upon infection by the virus, the available cellular pool of dNTPs is heavily modified by the three conserved enzymes composing the Z-cluster. On one hand, dATP is completely eliminated by dephosphorylation to dA by DatZ and to dADP by PurZ. On the other hand, MazZ uses a fraction of the dGTP pool to make dGMP, further transformed by PurZ to sadAMP, the direct precursor of deoxyaminoadenine monophosphate (dZMP).
[0251] The finishing steps of dZTP synthesis are carried out by non-specific host enzymes participating in dATP production, as their genes are absent in the viral genomes; they may even be additionally upregulated by the infected host cell sensing and fighting the depletion of its dATP reserve. The post-infection composition of cellular dNTP pool thus consists of dATP being replaced by dZTP, and the Primase-Polymerase of S-2L, non-specific to A or Z, readily inserts the surrogate base in the cyanophage DNA in front of any instructing thymine.
[0252] To test if the Z-cluster is indeed responsible for efficient A-to-Z substitution in cellulo, and that no aminoadenine-specific DNA polymerase is needed for reliable synthesis of ZTGC-DNA, we put the genes datZ, mazZ and purZ on compatible expression plasmids in E. coli and induced the cultures. The whole system proved to be toxic to bacteria when expressed, even for low concentrations of the inducer; however, that effect would not hamper the viral infection process, as phages S-2L and φVC8 are known to be lytic (6, 34).
[0253] Using mass spectrometry, we were however able to quantify the content of Z in the newly synthesized plasmids (Fig. 22B). Bacteria in exponential growth with the Z-cluster expressed over 2 hours substituted 14.6% of total plasmidic adenine content with 2-aminoadenine; after normalization by mass, it corresponds to 27.3% of Z in the plasmidic DNA synthesized after induction (Fig. 22C). When not induced, minimal leaking of the promoters resulted in 0.01% of Z content, which was not lethal to the cells. Importantly, incomplete Z-cluster expression yielded lower substitution rates: around 2.4 times lower without the mazZ gene, and 6.4 times lower without datZ. Their effect is synergistic, as the expression of purZ alone resulted in 19.7 times lower 2-aminoadenine incorporation. Therefore, we proved that bacterial DNA can be enriched with diaminopurine efficiently with synthetic introduction of the Z-cluster proteins alone. Additionally, we observed that DatZ lowers the overall DNA yield, an effect that can be partially compensated when both MazZ and PurZ are expressed as well: this finding is compatible with the proposed 2-aminoadenine metabolism pathway.
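The plasmid figures reported above lend themselves to two simple consistency checks, sketched below in Python. The function names are hypothetical. The first check assumes that the normalized value equals the raw value divided by the newly synthesized DNA fraction; the second compares the fold reduction for purZ alone with the product of the two single-gene fold reductions, which is what independent multiplicative effects would predict.

```python
# Illustrative consistency checks on the reported substitution figures.
from math import prod


def implied_new_fraction(raw_pct: float, normalized_pct: float) -> float:
    """New-DNA fraction implied if normalized = raw / new_fraction (an assumption)."""
    return raw_pct / normalized_pct


def multiplicative_expectation(*single_folds: float) -> float:
    """Combined fold reduction expected if single omissions acted independently."""
    return prod(single_folds)


def exceeds_multiplicative(combined_fold: float, *single_folds: float) -> bool:
    """True when the observed combined reduction is larger than the product."""
    return combined_fold > multiplicative_expectation(*single_folds)


if __name__ == "__main__":
    print(round(implied_new_fraction(14.6, 27.3), 3))      # share of new DNA
    print(round(multiplicative_expectation(2.4, 6.4), 2))  # no mazZ x no datZ
    print(exceeds_multiplicative(19.7, 2.4, 6.4))          # purZ alone vs product
```

On these assumptions, slightly more than half of the isolated plasmid DNA would have been synthesized after induction, and the 19.7-fold reduction for purZ alone is larger than the roughly 15.4-fold reduction predicted from the single omissions, in line with the synergy noted in the text.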
[0254] We also infected induced E. coli bearing the Z-cluster with coliphages T7 and T4; they belong to different families of the order Caudovirales, which includes family Siphoviridae as well (35). Phage T7 managed to replicate in the cultures in exponential and stationary phases, with 1.18% and 2.17% of A-to-Z substitution, respectively, in the DNA isolated from the progeny; phage T4 yielded a lower final 2-aminoadenine incorporation rate, at 0.17%. New phages were still infectious; however, the PFU count was two (T7) and one (T4) orders of magnitude lower than in a control with phages infecting non-induced bacteria. This may result either from lower progeny yield from bacteria with a disturbed dNTP pool, or from lower infectiousness of phages with significant 2-aminoadenine incorporation.
[0255] The datZ, mazZ and purZ genes operably linked to appropriate regulatory sequences for their expression in E. coli were introduced into the genome of lambda phage using standard recombination techniques. E. coli culture was infected with the recombinant phage and phage particles produced during phage lytic cycle were harvested. Then, the content of Z in the newly synthesized recombinant phage genome was quantified using mass spectrometry as disclosed above for plasmids.
Example 20: Discussion of Examples 12-19
Molecular basis for substrate selectivity in PurZ vs AdSS
[0256] With Dali (36) we found every AdSS family member available in the PDB. Using PROMALS3D (37) we extracted the existing structural information in order to construct a reliable structurally-informed sequence multialignment for both AdSS and PurZ (Fig. 24). 36 residues are strictly conserved, while 26 further residues have very little variability, together making up 17.3% of S-2L PurZ total length. These two classes cluster in the catalytic pocket and the surrounding layer, respectively (Fig. 25A). We note that several conserved residues seemingly important for the tertiary structure of all family members (S-2L P256, F283, D18) do maintain the same physical position of their sidechain in PurZ, but are subjected to sequence rearrangements. Finally, 16 residues are unique to PurZ, but are otherwise strictly conserved in cellular AdSS. Their placement is intermediate compared to the previous classes, although several such residues (S23, T273 and Q308) interact with the substrates.
[0257] The position of the guanidino group of S-2L PurZ R244 corresponds to φVC8 PurZ K267, whose location on the backbone is shared with the otherwise conserved Arg residue (R303 in E. coli). However, whereas in AdSS this arginine balances the partial charge of O2’ of the ribose ring of IMP at 2.5-4 Å distance (PDB IDs: 1P9B, 5I34, 5K7X) and interacts with the free aspartate substrate (38), in viruses these residues extend noticeably further, being 7.9 Å away from C2’ in S-2L and 8.5 Å in φVC8 (PDB ID: 6FKO), precluding any possible stabilization of the ribonucleotide. Moreover, residue V241, mentioned above as important for ribose discrimination, is also present in standard AdSS family representatives. However, in phages’ PurZ an insertion deforms the loop that contains it, pulling it slightly closer to the C2' ribose atom - from 4.3-5.3 Å as seen in AdSS (PDB IDs: 1P9B, 2DGN, 2GCQ, 4M9D, 5I34, 5K7X) to 3.7 Å (S-2L) and 3.6 Å (φVC8).
[0258] Lastly, there are two large deletions specific to PurZ and AdSS of P. horikoshii involving a helix-loop-helix motif and a strand-strand-helix-strand structural element. They are both solvent-exposed and do not appear to intervene in the catalytic activity or protein dimerization. In contrast, S-2L PurZ has a unique additional C-terminal helix α11, that partially compensates for the second deletion (Fig. 25B). Using the structure-informed sequence alignment, we constructed a phylogenetic tree (Fig. 26) that supports the hypothesis on PurZ’s archaeal origin.
S-2L MazZ as a representative of MazZ-1 and similarity with MazZ-2
[0259] With the exception of E34, all residues of S-2L MazZ coordinating catalytic ions are strictly conserved across all MazZ-1 homologues (Fig. 27). Most of the amino acids participating in nucleotide pocket formation tend to be conserved as well, although infrequent departures from the consensus are possible: only the residues I16, N20 and A92 placed around the nucleobase are unique to S-2L, which suggests alternative binding modes of the nucleotide for similar proteins. Importantly, residues I56 and D60 are kept by all MazZ-1 homologues, suggesting preserved guanine specificity.
[0260] As for MazZ-2, their distinct sequence hardly overlaps with MazZ-1 enzymes. However, 4 negatively charged residues are strictly conserved, their position corresponding to E35, E38, E50 and D53 of S-2L MazZ (Fig. 28). Finally, a counterpart of R83 important for the two-metal-ion mechanism is preserved in every MazZ enzyme from both isoforms.
[0261] Using Dali (36), we identified 15 PDB structures of MazG-like and HisE enzymes with common MazG-HisE fold. However, as the protein chains are very divergent, even the structure-guided sequence multialignment is not sensitive enough to unravel relations between S-2L MazZ and these representatives with a satisfactory level of confidence; Dali’s structural clustering performed poorly as well. BLAST searches indicate that the closest non-viral homologues of S-2L MazZ are in great majority bacterial, mainly from the phyla Terrabacteria and Proteobacteria.
2-aminoadenine metabolic pathway and ZTGC-DNA
[0262] Two phosphohydrolases and one synthetase constitute the Z-cluster conserved in phages S-2L, φVC8 and their close relatives. Together with promiscuous cellular enzymes, they constitute the full 2-aminoadenine metabolic pathway that transforms an original dNTP pool, removing dATP and creating dZTP from dGTP. The kinetics of the latter has to be fine-tuned by viruses, as dGTP is also a crucial substrate for the synthesis of ZTGC-DNA. We note that in in vitro assays S-2L PurZ is overall less active than the homologous E. coli AdSS (Fig. 18), which may contribute to this effect.
[0263] We proved that the expression of S-2L synthetic genes in E. coli leads to substantial replacement of adenine with 2-aminoadenine in the DNA of the bacteria and infecting phages. Pre- and post-replicative modification pathways of genomic thymidine in bacteriophages have already been successfully transplanted to E. coli (39) or found to be active in its lysates (40). Here, however, the additional amino group on the Watson-Crick edge intervenes directly in the intrinsic property of the DNA, profoundly improving hybridization between two nucleic acid strands.
[0264] Importantly, DNA polymerases of E. coli and phages T7 and T4 allow for incorporation of 2-aminoadenine, which was partially documented before (41, 42). In addition, we show here that the Klenow fragment of E. coli Pol I, a family A DNA polymerase participating in plasmid replication while not being the main replicative polymerase (43), does not discriminate between A and Z whatsoever (Fig. 29). We expect that this relaxed specificity holds true for most DNA polymerases, as a simple exclusion mechanism of the additional 2-amino group in a purine would hamper guanine incorporation.
[0265] We believe that the toxic effect of 2-aminoadenine in ATGC-DNA organisms such as E. coli stems from complex regulation of their genomes. To initiate gene transcription an RNA polymerase probes the minor groove of an AT-rich element (44), being directly sensitive to the presence of a supplementary 2-amino group there. Furthermore, uncoupling of the two strands needed for replication relies on the presence of other AT-rich sequences with lower melting point (45). Nonetheless, the melting point of a Z:T homopolymer was found to be intermediate between the A:T and G:C ones (46). Attuning the cellular machinery for the presence of 2-aminoadenine shall thus be in principle achievable, as it is possible for the less complex phages. In the future, a pure ZTGC-DNA synthetic organism would allow for a thermoresistant, more stable storage of genetic information that could also be faithfully retrieved.
[0266] Lastly, the structural information on PurZ from S-2L and φVC8 could be used to re-engineer its active site. By changing its preferential substrate to GMP, the 2-aminoadenine pathway could in principle be altered towards the synthesis of ZTP and, given a compatible RNA polymerase, ultimately to synthetic ZTGC-RNA.
Figure imgf000073_0001
Figure imgf000074_0001
[0267] Table 1. Diffraction data collection and Model Refinement statistics. Numbers in parenthesis refer to the highest-resolution shell.
Figure imgf000074_0002
[0268] Table 2. Position of replication-related protein genes on S-2L reference genome (AX955019)
Figure imgf000075_0001
[0269] Table 3. Oligonucleotide sequences
Figure imgf000075_0002
Figure imgf000076_0001
[0270] Table 4. Diffraction data collection and Model Refinement statistics. Numbers in parenthesis refer to the highest-resolution shell.
Figure imgf000077_0001
[0271] Table 5. Position of replication-related protein genes on the new, high-coverage S-2L genome sequence (MW334946). Genes datZ, mazZ and purZ are highly compact, overlapping at their very ends.
Example 21: References Cited in Examples 12-20
[0273] 1. E. V. Koonin, Comparative genomics, minimal gene-sets and the last universal common ancestor. Nature Reviews Microbiology 1, 127-136 (2003).
[0274] 2. P. Forterre, The origin of viruses and their possible roles in major evolutionary transitions. Virus Research 117, 5-16 (2006).
[0275] 3. G. R. Wyatt, S. S. Cohen, The bases of the nucleic acids of some bacterial and animal viruses: the occurrence of 5-hydroxymethylcytosine. Biochemical Journal 55, 774-782 (1953).
[0276] 4. J. J. Thiaville, et al., Novel genomic island modifies DNA with 7-deazaguanine derivatives. PNAS 113, E1452-E1459 (2016).
[0277] 5. R. Tsai, I. R. Correa, M. Y. Xu, S. Xu, Restriction and modification of deoxyarchaeosine (dG+)-containing phage 9g DNA. Scientific Reports 7, 8348 (2017).
[0278] 6. M. D. Kirnos, I. Y. Khudyakov, N. I. Alexandrushkina, B. F. Vanyushin, 2-Aminoadenine is an adenine substituting for a base in S-2L cyanophage DNA. Nature 270, 369 (1977).
[0279] 7. I. Ya. Khudyakov, M. D. Kirnos, N. I. Alexandrushkina, B. F. Vanyushin, Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine. Virology 88, 8-18 (1978).
[0280] 8. M. Cristofalo, et al., Nanomechanics of Diaminopurine-Substituted DNA. Biophysical Journal 116, 760-771 (2019).
[0281] 9. M. Szekeres, A. V. Matveyev, Cleavage and sequence recognition of 2,6-diaminopurine-containing DNA by site-specific endonucleases. FEBS Letters 222, 89-94 (1987).
[0282] 10. H. Brüssow, F. Desiere, Comparative phage genomics and the evolution of Siphoviridae: insights from dairy phages. Molecular Microbiology 39, 213-223 (2001).
[0283] 11. J. Murphy, et al., Comparative genomics and functional analysis of the 936 group of lactococcal Siphoviridae phages. Sci Rep 6, 1-13 (2016).
[0284] 12. H. Echols, H. Murialdo, Genetic map of bacteriophage lambda. Microbiology and Molecular Biology Reviews 42, 577-591 (1978).
[0285] 13. J. Ma, A. Campbell, S. Karlin, Correlations between Shine-Dalgarno Sequences and Gene Features Such as Predicted Expression Levels and Operon Structures. Journal of Bacteriology 184, 5733-5745 (2002).
[0286] 14. S. Nakagawa, Y. Niimura, K. Miura, T. Gojobori, Dynamic evolution of translation initiation mechanisms in prokaryotes. PNAS 107, 6382-6387 (2010).
[0287] 15. Y. Wei, X. Xia, Unique Shine-Dalgarno Sequences in Cyanobacteria and Chloroplasts Reveal Evolutionary Differences in Their Translation Initiation. Genome Biol Evol 11, 3194-3206 (2019).
[0288] 16. I. C. Perera, A. Grove, Molecular Mechanisms of Ligand-Mediated Attenuation of DNA Binding by MarR Family Transcriptional Regulators. Journal of Molecular Cell Biology 2, 243-254 (2010).
[0289] 17. D. L. Wheeler, et al., Database resources of the National Center for Biotechnology. Nucleic Acids Res 31, 28-33 (2003).
[0290] 18. I. Lieberman (with the technical assistance of W. H. Eto), Enzymatic Synthesis of Adenosine-5’-Phosphate from Inosine-5’-Phosphate. J. Biol. Chem. 223, 327-339 (1956).
[0291] 19. W. Wang, A. Gorrell, R. B. Honzatko, H. J. Fromm, A Study of Escherichia coli Adenylosuccinate Synthetase Association States and the Interface Residues of the Homodimer. J. Biol. Chem. 272, 7078-7084 (1997).
[0292] 20. R. Jayalakshmi, K. Sumathy, H. Balaram, Purification and Characterization of Recombinant Plasmodium falciparum Adenylosuccinate Synthetase Expressed in Escherichia coli. Protein Expression and Purification 25, 65-72 (2002).
[0293] 21. S. Mehrotra, H. Balaram, Kinetic Characterization of Adenylosuccinate Synthetase from the Thermophilic Archaea Methanocaldococcus jannaschii. Biochemistry 46, 12821-12832 (2007).
[0294] 22. R. B. Honzatko, H. J. Fromm, Structure-Function Studies of Adenylosuccinate Synthetase from Escherichia coli. Archives of Biochemistry and Biophysics 370, 1-8 (1999).
[0295] 23. B. W. Poland, et al., Crystal structure of adenylosuccinate synthetase from Escherichia coli. Evidence for convergent evolution of GTP-binding domains. J. Biol. Chem. 268, 25334-25342 (1993).
[0296] 24. O. V. Moroz, et al., Dimeric dUTPases, HisE, and MazG belong to a New Superfamily of all-α NTP Pyrophosphohydrolases with Potential “House-cleaning” Functions. Journal of Molecular Biology 347, 243-255 (2005).
[0297] 25. M. Gross, I. Marianovsky, G. Glaser, MazG - a regulator of programmed cell death in Escherichia coli. Molecular Microbiology 59, 590-601 (2006).
[0298] 26. A. M. D. Gonçalves, D. de Sanctis, S. M. McSweeney, Structural and Functional Insights into DR2231 Protein, the MazG-like Nucleoside Triphosphate Pyrophosphohydrolase from Deinococcus radiodurans. J. Biol. Chem. 286, 30691-30705 (2011).
[0299] 27. D. W. E. Smith, B. N. Ames, Phosphoribosyladenosine Monophosphate, an Intermediate in Histidine Biosynthesis. J. Biol. Chem. 240, 3056-3063 (1965).
[0300] 28. L. Chiariotti, P. Alifano, M. S. Carlomagno, C. B. Bruni, Nucleotide sequence of the Escherichia coli hisD gene and of the Escherichia coli and Salmonella typhimurium hisIE region. Mol Gen Genet 203, 382-388 (1986).
[0301] 29. O. V. Moroz, et al., The Crystal Structure of a Complex of Campylobacter jejuni dUTPase with Substrate Analogue Sheds Light on the Mechanism and Suggests the “Basic Module” for Dimeric d(C/U)TPases. Journal of Molecular Biology 342, 1583-1597 (2004).
[0302] 30. C. S. Mota, A. M. D. Gonçalves, D. de Sanctis, Deinococcus radiodurans DR2231 is a two-metal-ion mechanism hydrolase with exclusive activity on dUTP. The FEBS Journal 283, 4274-4290 (2016).
[0303] 31. F. Javid-Majd, D. Yang, T. R. Ioerger, J. C. Sacchettini, The 1.25 Å resolution structure of phosphoribosyl-ATP pyrophosphohydrolase from Mycobacterium tuberculosis. Acta Cryst D 64, 627-635 (2008).
[0304] 32. S. Lee, et al., Crystal Structure of Escherichia coli MazG, the Regulator of Nutritional Stress Response. J. Biol. Chem. 283, 15232-15240 (2008).
[0305] 33. E. E. Kim, H. W. Wyckoff, Reaction mechanism of alkaline phosphatase based on crystal structures: Two-metal ion catalysis. Journal of Molecular Biology 218, 449-464 (1991).
[0306] 34. A. Solis-Sanchez, et al., Genetic characterization of φVC8 lytic phage for Vibrio cholerae O1. Virology Journal 13, 47 (2016).
[0307] 35. J. Maniloff, H.-W. Ackermann, Taxonomy of bacterial viruses: establishment of tailed virus genera and the other Caudovirales. Arch. Virol. 143, 2051-2063 (1998).
[0308] 36. L. Holm, Benchmarking fold detection by DaliLite v.5. Bioinformatics https://doi.org/10.1093/bioinformatics/btz536 (December 10, 2019).
[0309] 37. J. Pei, B.-H. Kim, N. V. Grishin, PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res 36, 2295-2300 (2008).
[0310] 38. W. Wang, B. W. Poland, R. B. Honzatko, H. J. Fromm, Identification of Arginine Residues in the Putative L-Aspartate Binding Site of Escherichia coli Adenylosuccinate Synthetase. J. Biol. Chem. 270, 13160-13163 (1995).
[0311] 39. A. P. Mehta, et al., Replacement of Thymidine by a Modified Base in the Escherichia coli Genome. J. Am. Chem. Soc. 138, 7272-7275 (2016).
[0312] 40. Y.-J. Lee, et al., Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses. PNAS 115, E3116-E3125 (2018).
[0313] 41. A. Cerami, E. Reich, D. C. Ward, I. H. Goldberg, The interaction of actinomycin with DNA: requirement for the 2-amino group of purines. PNAS 57, 1036-1042 (1967).
[0314] 42. M. W. Frey, L. C. Sowers, D. P. Millar, S. J. Benkovic, The nucleotide analog 2-aminopurine as a spectroscopic probe of nucleotide incorporation by the Klenow fragment of Escherichia coli polymerase I and bacteriophage T4 DNA polymerase. Biochemistry 34, 9185-9192 (1995).
[0315] 43. M. Camps, L. A. Loeb, When Pol I Goes into High Gear: Processive DNA Synthesis by Pol I in the Cell. Cell Cycle 3, 114-116 (2004).
[0316] 44. W. Ross, A. Ernst, R. L. Gourse, Fine structure of E. coli RNA polymerase-promoter interactions: α subunit binding to the UP element minor groove. Genes Dev. 15, 491-506 (2001).
[0317] 45. D. Bramhill, A. Kornberg, Duplex opening by dnaA protein at novel sequences in initiation of replication at the origin of the E. coli chromosome. Cell 52, 743-755 (1988).
[0318] 46. J. Sagi, E. Szakonyi, M. Vorlickova, J. Kypr, Unusual Contribution of 2-Aminoadenine to the Thermostability of DNA. Journal of Biomolecular Structure and Dynamics 13, 1035-1041 (1996).
[0319] 47. H. Li, Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094-3100 (2018).
[0320] 48. E. Garrison, G. Marth, Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio] (2012).
[0321] 49. E. Garrison, VCFLIB (2012) GitHub repository.
[0322] 50. P. Kruczkiewicz, VCF Consensus Builder (2019) Python Package Index repository.
[0323] 51. SnapGene software (Insightful Science; available at snapgene.com).
[0324] 52. L. Sauguet, P. Raia, G. Henneke, M. Delarue, Shared active site architecture between archaeal PolD and multi-subunit RNA polymerases revealed by X-ray crystallography. Nature Communications 7, 12227 (2016).
[0325] 53. P. Weber, et al., High-Throughput Crystallization Pipeline at the Crystallography Core Facility of the Institut Pasteur. Molecules 24, 4451 (2019).
[0326] 54. W. Kabsch, XDS. Acta Cryst D 66, 125-132 (2010).
[0327] 55. P. Legrand, XDS Made Easier (2017) GitHub repository.
[0328] 56. D. Liebschner, et al., Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Cryst D 75, 861-877 (2019).
[0329] 57. G. Bricogne, et al., BUSTER (Global Phasing Ltd.).
[0330] 58. P. Emsley, B. Lohkamp, W. G. Scott, K. Cowtan, Features and development of Coot. Acta Cryst D 66, 486-501 (2010).
[0331] 59. E. Krissinel, K. Henrick, Inference of Macromolecular Assemblies from Crystalline State. Journal of Molecular Biology 372, 774-797 (2007).
[0332] 60. X. Robert, P. Gouet, Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42, W320-W324 (2014).
[0333] 61. S. Kumar, G. Stecher, M. Li, C. Knyaz, K. Tamura, MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol 35, 1547-1549 (2018).
[0334] 62. E. F. Pettersen, et al., UCSF Chimera — A visualization system for exploratory research and analysis. Journal of Computational Chemistry 25, 1605-1612 (2004).
[0335] 63. The PyMOL Molecular Graphics System, Version 1.8, Schrödinger, LLC.

Claims

1. A recombinant cell or virus comprising a genome comprising a 2-aminoadenine/thymine (Z/T) to (adenine/thymine (A/T) + 2-aminoadenine/thymine (Z/T)) ratio (Z/T to (A/T + Z/T) ratio) of at least 0.05, wherein the cell or virus does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene.
2. The recombinant cell or virus of claim 1, wherein the recombinant cell or virus comprises a genome comprising a Z/T to (A/T + Z/T) ratio of at least 0.15; at least 0.25; at least 0.50; at least 0.75; or at least 0.95.
3. The recombinant cell or virus of any one of claims 1 or 2, wherein the recombinant cell or virus comprises a genome comprising a 2-aminoadenine (Z) content of at least 1%; at least 10%; or at least 20%.
4. The recombinant cell or virus of any one of claims 1 to 3, wherein the recombinant cell is a prokaryotic cell and wherein the recombinant virus is a phage; in particular wherein the recombinant cell is not a cyanobacteria or a member of the genus Vibrio and/or wherein the recombinant virus is not a cyanophage or a Vibrio phage; preferably wherein the recombinant cell is an E. coli cell and wherein the recombinant virus is an E. coli phage.
5. The recombinant cell or virus of any one of claims 1 to 4, wherein the recombinant cell or virus comprises a coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), a coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and a coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell.
6. The recombinant cell or virus of claim 5, wherein the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9) is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), wherein the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10) is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and wherein the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7); preferably wherein the recombinant cell or virus comprises a coding sequence for DatZ (SEQ ID NO: 9), a coding sequence for MazZ (SEQ ID NO: 10), and a coding sequence for PurZ (SEQ ID NO: 11).
7. The recombinant cell or virus of any one of claims 5 or 6, wherein the coding sequences are present on one or more plasmids.
8. The recombinant cell or virus of any one of claims 5 or 6, wherein the coding sequences are present on one or more chromosomes.
9. The recombinant cell of any one of claims 5 to 8, wherein the recombinant cell comprises DatZ (SEQ ID NO: 9), MazZ (SEQ ID NO: 10) and PurZ (SEQ ID NO: 11).
10. A composition comprising a plurality of recombinant cells according to any one of claims 1 to 9.
11. A method of making a nucleic acid comprising 2-aminoadenine (Z), comprising providing a recombinant cell according to any one of claims 1 to 10 and isolating the nucleic acid comprising Z from the cell; preferably wherein the isolated nucleic acid is selected from a plasmid, a chromosome, and a total cell nucleic acid preparation.
12. An isolated nucleic acid library comprising at least 50% coverage of the genome of a reference organism or virus, wherein the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.05, and wherein the reference organism does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene; preferably wherein the nucleic acid library comprises a Z/T to (A/T + Z/T) ratio of at least 0.15; at least 0.25; at least 0.50; at least 0.75; or at least 0.95; or preferably wherein the nucleic acid library comprises a Z content of at least 1%; at least 10% or at least 20%.
13. The isolated nucleic acid library of claim 12, wherein the reference organism or virus is a prokaryote or a phage; preferably wherein the reference organism or virus is E. coli or an E. coli phage; preferably wherein the reference organism is not a cyanobacteria or a member of the genus Vibrio; and/or preferably wherein the reference virus is not a cyanophage or a Vibrio phage.
14. A method of making a stabilized nucleic acid, comprising: a. providing a cell that does not naturally comprise a datZ gene, does not naturally comprise a mazZ gene, and does not naturally comprise a purZ gene; preferably wherein the cell is not a cyanobacteria or a member of the genus Vibrio; or wherein the cell is a bacterium; and b. expressing recombinant DatZ, MazZ and PurZ proteins in the cell for a period of time sufficient for incorporation of 2-aminoadenine (Z) into nucleic acid in the cell to form a stabilized nucleic acid comprising 2-aminoadenine (Z); and optionally further comprising isolating the stabilized nucleic acid from the cell.
15. The method of claim 14, wherein the stabilized nucleic acid is endogenous to the cell; wherein the stabilized nucleic acid is heterologous to the cell; or wherein the stabilized nucleic acid is a viral nucleic acid; preferably wherein the viral nucleic acid is not a cyanophage nucleic acid or a Vibrio phage nucleic acid.
16. The method of any one of claims 14 to 15, wherein the method comprises introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11), wherein the coding sequences are operatively linked to regulatory control elements for expression in the recombinant cell; preferably wherein the coding sequence for a polypeptide sequence that is at least 80% identical to DatZ (SEQ ID NO: 9) is selected from datZ (SEQ ID NO: 1) and codon optimized datZ (SEQ ID NO: 5), wherein the coding sequence for a polypeptide sequence that is at least 80% identical to MazZ (SEQ ID NO: 10) is selected from mazZ (SEQ ID NO: 2) and codon optimized mazZ (SEQ ID NO: 6), and wherein the coding sequence for a polypeptide sequence that is at least 80% identical to PurZ (SEQ ID NO: 11) is selected from purZ (SEQ ID NO: 3) and codon optimized purZ (SEQ ID NO: 7); more preferably wherein the method comprises introducing into the cell a recombinant coding sequence for DatZ (SEQ ID NO: 9), introducing into the cell a recombinant coding sequence for MazZ (SEQ ID NO: 10), and introducing into the cell a recombinant coding sequence for PurZ (SEQ ID NO: 11).
17. The method of any one of claims 14 to 16, wherein the coding sequences are present on one or more plasmids; or wherein the coding sequences are present on one or more chromosomes.
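As an illustration of the ratios recited in claims 1 to 3 and 12, the following minimal Python sketch computes them from per-base counts. It rests on two assumptions that are not stated in the claims: that each Z or A base sits in one Z:T or A:T pair, so that the Z/T to (A/T + Z/T) ratio reduces to Z / (A + Z), and that Z content means the fraction of Z among all counted bases. The function names and the example counts are hypothetical.

```python
def zt_ratio(counts):
    """Z/T to (A/T + Z/T) ratio, assuming one Z:T or A:T pair per Z or A base."""
    a = counts.get("A", 0)
    z = counts.get("Z", 0)
    if a + z == 0:
        raise ValueError("no A or Z bases counted")
    return z / (a + z)


def z_content(counts):
    """Fraction of Z among all counted bases (assumed meaning of 'Z content')."""
    total = sum(counts.get(base, 0) for base in "ATGCZ")
    if total == 0:
        raise ValueError("no bases counted")
    return counts.get("Z", 0) / total


# Hypothetical base counts, for illustration only (not data from the examples).
example = {"A": 150, "Z": 50, "T": 200, "G": 300, "C": 300}
print(zt_ratio(example))   # 0.25
print(z_content(example))  # 0.05
```

Under these assumptions the hypothetical sample would meet the 0.05, 0.15 and 0.25 ratio thresholds and the 1% Z-content threshold, but not the higher ones.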
PCT/EP2022/059320 2021-04-07 2022-04-07 2-aminoadenine modified nucleic acids, cells comprising them, and methods of producing them WO2022214617A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163172070P 2021-04-07 2021-04-07
US63/172070 2021-04-07

Publications (1)

Publication Number Publication Date
WO2022214617A1 (en) 2022-10-13

Family

ID=81346018

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/059320 WO2022214617A1 (en) 2021-04-07 2022-04-07 2-aminoadenine modified nucleic acids, cells comprising them, and methods of producing them

Country Status (1)

Country Link
WO (1) WO2022214617A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2839079A1 (en) * 2002-04-30 2003-10-31 Pasteur Institut GENOMIC BANK OF S-2L CYANOPHAGE AND PARTIAL FUNCTIONAL ANALYSIS

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2839079A1 (en) * 2002-04-30 2003-10-31 Pasteur Institut GENOMIC BANK OF S-2L CYANOPHAGE AND PARTIAL FUNCTIONAL ANALYSIS
EP1499713A2 (en) 2002-04-30 2005-01-26 Institut Pasteur Genomic library of cyanophage s-2l and functional analysis

Non-Patent Citations (110)

* Cited by examiner, † Cited by third party
Title
A. CERAMI, E. REICH, D. C. WARD, I. H. GOLDBERG: "The interaction of actinomycin with DNA: requirement for the 2-amino group of purines", PNAS, vol. 57, 1967, pages 1036 - 1042, XP008024398, DOI: 10.1073/pnas.57.4.1036
A. M. D. GONÇALVES, D. DE SANCTIS, S. M. MCSWEENEY: "Structural and Functional Insights into DR2231 Protein, the MazG-like Nucleoside Triphosphate Pyrophosphohydrolase from Deinococcus radiodurans", J. BIOL. CHEM., vol. 286, 2011, pages 30691 - 30705
A. P. MEHTA ET AL.: "Replacement of Thymidine by a Modified Base in the Escherichia coli Genome", J. AM. CHEM. SOC., vol. 138, 2016, pages 7272 - 7275
ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W., LIPMAN, D. J.: "Basic local alignment search tool", JOURNAL OF MOLECULAR BIOLOGY, vol. 215, 1990, pages 403 - 410, XP002949123, DOI: 10.1006/jmbi.1990.9999
ARAVIND, L., KOONIN, E. V.: "The HD domain defines a new superfamily of metal-dependent phosphohydrolases", TRENDS IN BIOCHEMICAL SCIENCES, vol. 23, 1998, pages 469 - 472, XP002455935, DOI: 10.1016/S0968-0004(98)01293-6
B. W. POLAND ET AL.: "Crystal structure of adenylosuccinate synthetase from Escherichia coli. Evidence for convergent evolution of GTP-binding domains", J. BIOL. CHEM., vol. 268, 1993, pages 25334 - 25342
BAILLY, C., WARING, M. J.: "The use of diaminopurine to investigate structural properties of nucleic acids and molecular recognition between ligands and DNA", NUCLEIC ACIDS RES, vol. 26, 1998, pages 4309 - 4314, XP002505777, DOI: 10.1093/nar/26.19.4309
BRAITHWAITE, D. K., ITO, J.: "Compilation, alignment, and phylogenetic relationships of DNA polymerases", NUCLEIC ACIDS RES, vol. 21, 1993, pages 787 - 802, XP002143036
BRIDWELL-RABB, J., KANG, G., ZHONG, A., LIU, H., DRENNAN, C. L.: "An HD domain phosphohydrolase active site tailored for oxetanocin-A biosynthesis", PNAS, vol. 113, 2016, pages 13750 - 13755
BRISSETT, N. C. ET AL.: "Structure of a Preternary Complex Involving a Prokaryotic NHEJ DNA Polymerase", MOLECULAR CELL, vol. 41, 2011, pages 221 - 231
C. S. MOTA, A. M. D. GONÇALVES, D. DE SANCTIS: "Deinococcus radiodurans DR2231 is a two-metal-ion mechanism hydrolase with exclusive activity on dUTP", THE FEBS JOURNAL, vol. 283, 2016, pages 4274 - 4290
CALVO, P. A. ET AL.: "The invariant glutamate of human PrimPol DxE motif is critical for its Mn2+-dependent distinctive activities", DNA REPAIR, vol. 77, 2019, pages 65 - 75, XP085652394, DOI: 10.1016/j.dnarep.2019.03.006
CITOVSKY, V., VOS, G. D., ZAMBRYSKI, P.: "Single-Stranded DNA Binding Protein Encoded by the virE Locus of Agrobacterium tumefaciens", SCIENCE, vol. 240, 1988, pages 501 - 504
CROOKS, G. E., HON, G., CHANDONIA, J.-M., BRENNER, S. E.: "WebLogo: A Sequence Logo Generator", GENOME RES., vol. 14, 2004, pages 1188 - 1190, XP055570674, DOI: 10.1101/gr.849004
D. BRAMHILL, A. KORNBERG: "Duplex opening by dnaA protein at novel sequences in initiation of replication at the origin of the E. coli chromosome", CELL, vol. 52, 1988, pages 743 - 755
D. W. E. SMITH, B. N. AMES: "Phosphoribosyladenosine Monophosphate, an Intermediate in Histidine Biosynthesis", J. BIOL. CHEM., vol. 240, 1965, pages 3056 - 3063
DÍAZ-TALAVERA, A. ET AL.: "A cancer-associated point mutation disables the steric gate of human PrimPol", SCI REP, vol. 9, 2019, pages 1 - 13
DOKMANIC, I., SIKIC, M., TOMIC, S.: "Metals in proteins: correlation between the metal-ion type, coordination number and the amino-acid residues involved in the coordination", ACTA CRYSTALLOGRAPHICA SECTION D, vol. 64, 2008, pages 257 - 263
DOMÍNGUEZ, O. ET AL.: "DNA polymerase mu (Pol μ), homologous to TdT, could act as a DNA mutator in eukaryotic cells", THE EMBO JOURNAL, vol. 19, 2000, pages 1731 - 1742, XP002144772, DOI: 10.1093/emboj/19.7.1731
E. KRISSINEL, K. HENRICK: "Inference of Macromolecular Assemblies from Crystalline State", JOURNAL OF MOLECULAR BIOLOGY, vol. 372, 2007, pages 774 - 797, XP022220069, DOI: 10.1016/j.jmb.2007.05.022
E. V. KOONIN: "Comparative genomics, minimal gene-sets and the last universal common ancestor", NATURE REVIEWS MICROBIOLOGY, vol. 1, 2003, pages 127 - 136
F. JAVID-MAJD, D. YANG, T. R. IOERGER, J. C. SACCHETTINI: "The 1.25 Å resolution structure of phosphoribosyl-ATP pyrophosphohydrolase from Mycobacterium tuberculosis", ACTA CRYST D, vol. 64, 2008, pages 627 - 635
GEIBEL, S., BANCHENKO, S., ENGEL, M., LANKA, E., SAENGER, W.: "Structure and function of primase RepB' encoded by broad-host-range plasmid RSF1010 that replicates exclusively in leading-strand mode", PNAS, vol. 106, 2009, pages 7810 - 7815
GILL, S. ET AL.: "A highly divergent archaeo-eukaryotic primase from the Thermococcus nautilus plasmid, pTN2", NUCLEIC ACIDS RES, vol. 42, 2014, pages 3707 - 3719, XP055605680, DOI: 10.1093/nar/gkt1385
GUILLIAM, T. A., KEEN, B. A., BRISSETT, N. C., DOHERTY, A. J.: "Primase-polymerases are a functionally diverse superfamily of replication and repair enzymes", NUCLEIC ACIDS RESEARCH, vol. 43, 2015, pages 6651 - 6664, XP055541662, DOI: 10.1093/nar/gkv625
GUO, H. ET AL.: "Crystal structures of phage NrS-1 N300-dNTPs-Mg2+ complex provide molecular mechanisms for substrate specificity", BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, vol. 515, 2019, pages 551 - 557, XP085725569, DOI: 10.1016/j.bbrc.2019.05.162
GUPTA, R.: "Halobacterium volcanii tRNAs. Identification of 41 tRNAs covering all amino acids, and the sequences of 33 class I tRNAs", J. BIOL. CHEM., vol. 259, 1984, pages 9461 - 9471
H. BRÜSSOW, F. DESIERE: "Comparative phage genomics and the evolution of Siphoviridae: insights from dairy phages", MOLECULAR MICROBIOLOGY, vol. 39, 2001, pages 213 - 223
H. ECHOLS, H. MURIALDO: "Genetic map of bacteriophage lambda", MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, vol. 42, 1978, pages 577 - 591
GOMMERS-AMPT, J. H., BORST, P.: "Hypermodified bases in DNA", THE FASEB JOURNAL, vol. 9, 1995, pages 1034 - 1042
H. LI: "Minimap2: pairwise alignment for nucleotide sequences", BIOINFORMATICS, vol. 34, 2018, pages 3094 - 3100
HOLM, L.: "Benchmarking fold detection by DaliLite v.5", BIOINFORMATICS, 10 December 2019 (2019-12-10), Retrieved from the Internet <URL:https://doi.org/10.1093/bioinformatics/btz536>
HOLZER, S. ET AL.: "Structural Basis for Inhibition of Human Primase by Arabinofuranosyl Nucleoside Analogues Fludarabine and Vidarabine", ACS CHEM. BIOL., vol. 14, 2019, pages 1904 - 1912
HUANG, J., MACKERELL, A. D.: "CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 34, 2013, pages 2135 - 2145
HUMPHREY, W., DALKE, A., SCHULTEN, K.: "VMD: Visual molecular dynamics", JOURNAL OF MOLECULAR GRAPHICS, vol. 14, 1996, pages 33 - 38, XP055140690, DOI: 10.1016/0263-7855(96)00018-5
HUTINET, G. ET AL.: "7-Deazaguanine modifications protect phage DNA from host restriction systems", NAT COMMUN, vol. 10, 2019, pages 1 - 12
I. C. PERERA, A. GROVE: "Molecular Mechanisms of Ligand-Mediated Attenuation of DNA Binding by MarR Family Transcriptional Regulators", JOURNAL OF MOLECULAR CELL BIOLOGY, vol. 2, 2010, pages 243 - 254
I. LIEBERMAN (with the technical assistance of W. H. ETO): "Enzymatic Synthesis of Adenosine-5'-Phosphate from Inosine-5'-Phosphate", J. BIOL. CHEM., vol. 223, 1956, pages 327 - 339
I. YA. KHUDYAKOV, M. D. KIRNOS, N. I. ALEXANDRUSHKINA, B. F. VANYUSHIN: "Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine", VIROLOGY, vol. 88, 1978, pages 8 - 18
IYER, L. M., KOONIN, E. V., LEIPE, D. D., ARAVIND, L.: "Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members", NUCLEIC ACIDS RES, vol. 33, 2005, pages 3875 - 3896
IYER, L. M., ZHANG, D., MAXWELL BURROUGHS, A., ARAVIND, L.: "Computational identification of novel biochemical systems involved in oxidation, glycosylation and other complex modifications of bases in DNA", NUCLEIC ACIDS RES, vol. 41, 2013, pages 7635 - 7655
J. J. THIAVILLE ET AL.: "Novel genomic island modifies DNA with 7-deazaguanine derivatives", PNAS, vol. 113, 2016, pages E1452 - E1459, XP055460178, DOI: 10.1073/pnas.1518570113
J. MA, A. CAMPBELL, S. KARLIN: "Correlations between Shine-Dalgarno Sequences and Gene Features Such as Predicted Expression Levels and Operon Structures", JOURNAL OF BACTERIOLOGY, vol. 184, 2002, pages 5733 - 5745
J. MANILOFF, H.-W. ACKERMANN: "Taxonomy of bacterial viruses: establishment of tailed virus genera and the other Caudovirales", ARCH. VIROL., vol. 143, 1998, pages 2051 - 2063, XP037226643, DOI: 10.1007/s007050050442
J. MURPHY ET AL.: "Comparative genomics and functional analysis of the 936 group of lactococcal Siphoviridae phages", SCI REP, vol. 6, 2016, pages 1 - 13
J. SAGI, E. SZAKONYI, M. VORLICKOVA, J. KYPR: "Unusual Contribution of 2-Aminoadenine to the Thermostability of DNA", JOURNAL OF BIOMOLECULAR STRUCTURE AND DYNAMICS, vol. 13, 1996, pages 1035 - 1041
JEUDY, S. ET AL.: "The DNA methylation landscape of giant viruses", NATURE COMMUNICATIONS, vol. 11, 2020, pages 2657, XP055892464, DOI: 10.1038/s41467-020-16414-2
JONES, D. T., COZZETTO, D.: "DISOPRED3: precise disordered region predictions with annotated protein-binding activity", BIOINFORMATICS, vol. 31, 2015, pages 857 - 863
KAZLAUSKAS, D. ET AL.: "Novel Families of Archaeo-Eukaryotic Primases Associated with Mobile Genetic Elements of Bacteria and Archaea", JOURNAL OF MOLECULAR BIOLOGY, vol. 430, 2018, pages 737 - 750
KILKENNY, M. L., LONGO, M. A., PERERA, R. L., PELLEGRINI, L.: "Structures of human primase reveal design of nucleotide elongation site and mode of Pol α tethering", PNAS, vol. 110, 2013, pages 15961 - 15966
KIM, E. E., WYCKOFF, H. W.: "Reaction mechanism of alkaline phosphatase based on crystal structures: Two-metal ion catalysis", JOURNAL OF MOLECULAR BIOLOGY, vol. 218, 1991, pages 449 - 464, XP024019524, DOI: 10.1016/0022-2836(91)90724-K
KIRNOS, M. D., KHUDYAKOV, I. Y., ALEXANDRUSHKINA, N. I., VANYUSHIN, B. F.: "2-Aminoadenine is an adenine substituting for a base in S-2L cyanophage DNA", NATURE, vol. 270, 1977, pages 369, XP002239457, DOI: 10.1038/270369a0
KOERNER, J. F., SMITH, M. S., BUCHANAN, J. M.: "Deoxycytidine Triphosphatase, an Enzyme Induced by Bacteriophage Infection", J. BIOL. CHEM., vol. 235, 1960, pages 2691 - 2697
KULIKOV, E. E. ET AL.: "Genomic Sequencing and Biological Characteristics of a Novel Escherichia Coli Bacteriophage 9g, a Putative Representative of a New Siphoviridae Genus", VIRUSES, vol. 6, 2014, pages 5077 - 5092
KUMAR, S., STECHER, G., LI, M., KNYAZ, C., TAMURA, K.: "MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms", MOL BIOL EVOL, vol. 35, 2018, pages 1547 - 1549
L. CHIARIOTTI, P. ALIFANO, M. S. CARLOMAGNO, C. B. BRUNI: "Nucleotide sequence of the Escherichia coli hisD gene and of the Escherichia coli and Salmonella typhimurium hisIE region", MOL GEN GENET, vol. 203, 1986, pages 382 - 388
LEE, Y.-J. ET AL.: "Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses", PNAS, vol. 115, 2018, pages E3116 - E3125
LIEBSCHNER, D. ET AL.: "Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix", ACTA CRYST D, vol. 75, 2019, pages 861 - 877
LIU, B. ET AL.: "A primase subunit essential for efficient primer synthesis by an archaeal eukaryotic-type primase", NAT COMMUN, vol. 6, 2015, pages 1 - 11
M. CAMPS, L. A. LOEB: "When Pol I Goes into High Gear: Processive DNA Synthesis by Pol I in the Cell", CELL CYCLE, vol. 3, 2004, pages 114 - 116
M. CRISTOFALO ET AL.: "Nanomechanics of Diaminopurine-Substituted DNA", BIOPHYSICAL JOURNAL, vol. 116, 2019, pages 760 - 771
M. GROSS, I. MARIANOVSKY, G. GLASER: "MazG - a regulator of programmed cell death in Escherichia coli", MOLECULAR MICROBIOLOGY, vol. 59, 2006, pages 590 - 601
M. W. FREY, L. C. SOWERS, D. P. MILLAR, S. J. BENKOVIC: "The nucleotide analog 2-aminopurine as a spectroscopic probe of nucleotide incorporation by the Klenow fragment of Escherichia coli polymerase I and bacteriophage T4 DNA polymerase", BIOCHEMISTRY, vol. 34, 1995, pages 9185 - 9192, XP002200597, DOI: 10.1021/bi00028a031
MADEIRA, F. ET AL.: "The EMBL-EBI search and sequence analysis tools APIs in 2019", NUCLEIC ACIDS RES, vol. 47, 2019, pages W636 - W641
MEDICI R ET AL: "Microbial synthesis of 2,6-diaminopurine nucleosides", JOURNAL OF MOLECULAR CATALYSIS B: ENZYMATIC, vol. 39, no. 1-4, 2 May 2006 (2006-05-02), pages 40 - 44, XP028015804, ISSN: 1381-1177, [retrieved on 20060502], DOI: 10.1016/J.MOLCATB.2006.01.024 *
MONTTINEN, H. A. M., RAVANTTI, J. J., PORANEN, M. M.: "Common Structural Core of Three-Dozen Residues Reveals Intersuperfamily Relationships", MOL BIOL EVOL, vol. 33, 2016, pages 1697 - 1710
NGAZOA-KAKOU, S. ET AL.: "Complete Genome Sequence of Escherichia coli Siphophage BRET", MICROBIOL RESOUR ANNOUNC, vol. 8, 2019, pages e01644 - 18
NITTINGER, E., SCHNEIDER, N., LANGE, G., RAREY, M.: "Evidence of Water Molecules-A Statistical Evaluation of Water Molecules Based on Electron Density", J. CHEM. INF. MODEL., vol. 55, 2015, pages 771 - 783
O. V. MOROZ ET AL.: "The Crystal Structure of a Complex of Campylobacter jejuni dUTPase with Substrate Analogue Sheds Light on the Mechanism and Suggests the 'Basic Module' for Dimeric d(C/U)TPases", JOURNAL OF MOLECULAR BIOLOGY, vol. 342, 2004, pages 1583 - 1597, XP004844937, DOI: 10.1016/j.jmb.2004.07.050
O. V. MOROZ ET AL.: "Dimeric dUTPases, HisE, and MazG belong to a New Superfamily of all-α NTP Pyrophosphohydrolases with Potential 'House-cleaning' Functions", JOURNAL OF MOLECULAR BIOLOGY, vol. 347, 2005, pages 243 - 255, XP004762507, DOI: 10.1016/j.jmb.2005.01.030
OUTTEN, C. E., O'HALLORAN, T. V.: "Femtomolar Sensitivity of Metalloregulatory Proteins Controlling Zinc Homeostasis", SCIENCE, vol. 292, 2001, pages 2488 - 2492
P. EMSLEY, B. LOHKAMP, W. G. SCOTT, K. COWTAN: "Features and development of Coot", ACTA CRYST D, vol. 66, 2010, pages 486 - 501
P. FORTERRE: "The origin of viruses and their possible roles in major evolutionary transitions", VIRUS RESEARCH, vol. 117, 2006, pages 5 - 16, XP024957046, DOI: 10.1016/j.virusres.2006.01.010
PEI, J., KIM, B.-H., GRISHIN, N. V.: "PROMALS3D: a tool for multiple protein sequence and structure alignments", NUCLEIC ACIDS RES, vol. 36, 2008, pages 2295 - 2300
PETER WEIGELE ET AL: "Biosynthesis and Function of Modified Bases in Bacteria and Their Viruses", CHEMICAL REVIEWS, vol. 116, no. 20, 26 October 2016 (2016-10-26), US, pages 12655 - 12687, XP055324327, ISSN: 0009-2665, DOI: 10.1021/acs.chemrev.6b00114 *
PETTERSEN, E. F. ET AL.: "UCSF Chimera-A visualization system for exploratory research and analysis", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 25, 2004, pages 1605 - 1612
PHILLIPS, J. C. ET AL.: "Scalable molecular dynamics with NAMD", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 26, 2005, pages 1781 - 1802, XP055484133, DOI: 10.1002/jcc.20289
PROUDFOOT, M. ET AL.: "General Enzymatic Screens Identify Three New Nucleotidases in Escherichia coli: Biochemical Characterization of SurE, YfbR, and YjjG", J. BIOL. CHEM., vol. 279, 2004, pages 54687 - 54694
R. B. HONZATKO, H. J. FROMM: "Structure-Function Studies of Adenylosuccinate Synthetase from Escherichia coli", ARCHIVES OF BIOCHEMISTRY AND BIOPHYSICS, vol. 370, 1999, pages 1 - 8
R. JAYALAKSHMI, K. SUMATHY, H. BALARAM: "Purification and Characterization of Recombinant Plasmodium falciparum Adenylosuccinate Synthetase Expressed in Escherichia coli", PROTEIN EXPRESSION AND PURIFICATION, vol. 25, 2002, pages 65 - 72
RAIA, P., DELARUE, M., SAUGUET, L.: "An updated structural classification of replicative DNA polymerases", BIOCHEMICAL SOCIETY TRANSACTIONS, vol. 47, 2019, pages 239 - 249
RAJESHWARI, K., RAJASHEKHAR, M.: "Biochemical Composition of Seven Species of Cyanobacteria Isolated from Different Aquatic Habitats of Western Ghats, Southern India", BRAZILIAN ARCHIVES OF BIOLOGY AND TECHNOLOGY, vol. 54, 2011, pages 849 - 857
RECHKOBLIT, O. ET AL.: "Structure and mechanism of human PrimPol, a DNA polymerase with primase activity", SCIENCE ADVANCES, vol. 2, 2016, pages e1601317
ROBERT, X., GOUET, P.: "Deciphering key features in protein structures with the new ENDscript server", NUCLEIC ACIDS RES, vol. 42, 2014, pages W320 - W324
S. LEE ET AL.: "Crystal Structure of Escherichia coli MazG, the Regulator of Nutritional Stress Response", J. BIOL. CHEM., vol. 283, 2008, pages 15232 - 15240
S. MEHROTRA, H. BALARAM: "Kinetic Characterization of Adenylosuccinate Synthetase from the Thermophilic Archaea Methanocaldococcus jannaschii", BIOCHEMISTRY, vol. 46, 2007, pages 12821 - 12832
S. NAKAGAWA, Y. NIIMURA, K. MIURA, T. GOJOBORI: "Dynamic evolution of translation initiation mechanisms in prokaryotes", PNAS, vol. 107, 2010, pages 6382 - 6387
SÁGI J. ET AL: "Unusual Contribution of 2-Aminoadenine to the Thermostability of DNA", JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, vol. 13, no. 6, 1 June 1996 (1996-06-01), US, pages 1035 - 1041, XP055938805, ISSN: 0739-1102, Retrieved from the Internet <URL:http://dx.doi.org/10.1080/07391102.1996.10508918> DOI: 10.1080/07391102.1996.10508918 *
SANTHOSH, C., MISHRA, P. C.: "Electronic spectra of 2-aminopurine and 2,6-diaminopurine: phototautomerism and fluorescence reabsorption", SPECTROCHIMICA ACTA PART A: MOLECULAR SPECTROSCOPY, vol. 47, 1991, pages 1685 - 1693
SAUGUET, L., RAIA, P., HENNEKE, G., DELARUE, M.: "Shared active site architecture between archaeal PolD and multi-subunit RNA polymerases revealed by X-ray crystallography", NATURE COMMUNICATIONS, vol. 7, 2016, pages 12227, XP055805589, DOI: 10.1038/ncomms12227
SHELDRICK, G. M.: "Experimental phasing with SHELXC/D/E: combining chain tracing with density modification", ACTA CRYST D, vol. 66, 2010, pages 479 - 485
SOLIS-SANCHEZ, A. ET AL.: "Genetic characterization of ØVC8 lytic phage for Vibrio cholerae O1", VIROLOGY JOURNAL, vol. 13, 2016, pages 47
STEITZ, T. A., SMERDON, S. J., JAGER, J., JOYCE, C. M.: "A unified polymerase mechanism for nonhomologous DNA and RNA polymerases", SCIENCE, vol. 266, 1994, pages 2022 - 2025
SZEKERES, M., MATVEYEV, A. V.: "Cleavage and sequence recognition of 2,6-diaminopurine-containing DNA by site-specific endonucleases", FEBS LETTERS, vol. 222, 1987, pages 89 - 94, XP025605725, DOI: 10.1016/0014-5793(87)80197-7
VANOMMESLAEGHE, K. ET AL.: "CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 31, 2010, pages 671 - 690
W. ROSS, A. ERNST, R. L. GOURSE: "Fine structure of E. coli RNA polymerase-promoter interactions: α subunit binding to the UP element minor groove", GENES DEV, vol. 15, 2001, pages 491 - 506
W. WANG, A. GORRELL, R. B. HONZATKO, H. J. FROMM: "A Study of Escherichia coli Adenylosuccinate Synthetase Association States and the Interface Residues of the Homodimer", J. BIOL. CHEM., vol. 272, 1997, pages 7078 - 7084
W. WANG, B. W. POLAND, R. B. HONZATKO, H. J. FROMM: "Identification of Arginine Residues in the Putative L-Aspartate Binding Site of Escherichia coli Adenylosuccinate Synthetase", J. BIOL. CHEM., vol. 270, 1995, pages 13160 - 13163
WEBER, P. ET AL.: "High-Throughput Crystallization Pipeline at the Crystallography Core Facility of the Institut Pasteur", MOLECULES, vol. 24, 2019, pages 4451
WEIGELE, P., RALEIGH, E. A.: "Biosynthesis and Function of Modified Bases in Bacteria and Their Viruses", CHEM. REV., vol. 116, 2016, pages 12655 - 12687, XP055324327, DOI: 10.1021/acs.chemrev.6b00114
WHEELER, D. L. ET AL.: "Database resources of the National Center for Biotechnology", NUCLEIC ACIDS RES, vol. 31, 2003, pages 28 - 33, XP002978124, DOI: 10.1093/nar/gkg033
WOINSKA, M., GRABOWSKY, S., DOMINIAK, P. M., WOZNIAK, K., JAYATILAKA, D.: "Hydrogen atoms can be located accurately and precisely by x-ray crystallography", SCIENCE ADVANCES, vol. 2, 2016, pages e1600192
WYATT, G. R., COHEN, S. S.: "The bases of the nucleic acids of some bacterial and animal viruses: the occurrence of 5-hydroxymethylcytosine", BIOCHEMICAL JOURNAL, vol. 55, 1953, pages 774 - 782
Y. WEI, X. XIA: "Unique Shine-Dalgarno Sequences in Cyanobacteria and Chloroplasts Reveal Evolutionary Differences in Their Translation Initiation", GENOME BIOL EVOL, vol. 11, 2019, pages 3194 - 3206
Y.-J. LEE ET AL.: "Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses", PNAS, vol. 115, 2018, pages E3116 - E3125
YAN, J., HOLZER, S., PELLEGRINI, L., BELL, S. D.: "An archaeal primase functions as a nanoscale caliper to define primer length", PNAS, vol. 115, 2018, pages 6697 - 6702
ZHU, B. ET AL.: "Deep-sea vent phage DNA polymerase specifically initiates DNA synthesis in the absence of primers", PNAS, vol. 114, 2017, pages E2310 - E2318
ZHU, H. ET AL.: "Atomic structure and nonhomologous end-joining function of the polymerase component of bacterial DNA ligase D", PNAS, vol. 103, 2006, pages 1711 - 1716
ZIMMERMAN, M. D., PROUDFOOT, M., YAKUNIN, A., MINOR, W.: "Structural Insight into the Mechanism of Substrate Specificity and Catalytic Activity of an HD-Domain Phosphohydrolase: The 5'-Deoxyribonucleotidase YfbR from Escherichia coli", JOURNAL OF MOLECULAR BIOLOGY, vol. 378, 2008, pages 215 - 226, XP022607419, DOI: 10.1016/j.jmb.2008.02.036
ZIMMERMANN, L. ET AL.: "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core", JOURNAL OF MOLECULAR BIOLOGY, vol. 430, 2018, pages 2237 - 2243


Legal Events

Code; Title; Description
121; Ep: the epo has been informed by wipo that ep was designated in this application; Ref document number: 22717839, Country of ref document: EP, Kind code of ref document: A1
NENP; Non-entry into the national phase; Ref country code: DE
122; Ep: pct application non-entry in european phase; Ref document number: 22717839, Country of ref document: EP, Kind code of ref document: A1