WO2023156139A1 - AID-based cytosine base editor system for ex vivo antibody diversification

AID-based cytosine base editor system for ex vivo antibody diversification

Info

Publication number
WO2023156139A1
Authority
WO
WIPO (PCT)
Prior art keywords
aid
sequence
mega
cell
antibody
Application number
PCT/EP2023/051453
Other languages
French (fr)
Inventor
Richard CHAHWAN
Julian WEISCHEDEL
Onur Boyman
Ufuk KARAKUS
Original Assignee
Universität Zürich
Application filed by Universität Zürich
Publication of WO2023156139A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K16/00 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04005 Cytidine deaminase (3.5.4.5)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2795/00 Bacteriophages
    • C12N2795/00011 Details
    • C12N2795/10011 Details dsDNA Bacteriophages
    • C12N2795/10022 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes

Definitions

  • the present invention relates to a method for ex vivo antibody diversification, making use of a protein-RNA complex comprising an activation-induced cytidine deaminase (AID), a nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9), transcription activator VP64; and one or more single-guide RNAs (sgRNA).
  • AID activation-induced cytidine deaminase
  • dCas9 nuclease dead Cas9
  • nCas9 nickase Cas9
  • sgRNA single-guide RNAs
  • Since its discovery, AID has been shown to be a multifunctional mutator protein. It is still not completely understood how AID alone can initiate both genomic and epigenomic modifications. Base editors have highlighted the great potential of DNA deaminases in genomic engineering, but even AID-based editors focus only on C-to-T mutations within a limited context. Herein, the inventors describe the first instance of a modular AID-based editor that recapitulates the full spectrum of genomic and epigenomic activity. The inventors’ novel multifunctional “Swiss army knife”-like toolbox will help to improve targeted genomic and epigenomic editing.
  • B cells rely on such mutations to diversify their immunoglobulin (Ig) binding and effector repertoire. This ensures adequate protection against the vast plethora of pathogenic threats.
  • the antibody diversification process is divided into affinity maturation by somatic hypermutation (SHM) and Ig isotype change through class switch recombination (CSR), two mechanisms that still have not been completely recapitulated ex vivo.
  • SHM somatic hypermutation
  • CSR class switch recombination
  • AID is the main driver of SHM and CSR (3). It belongs to the mutagenic Apolipoprotein B mRNA Editing Catalytic Polypeptide-like (APOBEC) family.
  • By deaminating cytosine (C) to uracil (U) in single-stranded DNA (ssDNA), AID creates a uracil-guanosine (U:G) mismatch. Through either replication or error-prone DNA repair, the mismatch is resolved into a fixed C-to-thymine (T) transition mutation.
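The deamination and fixation steps described above can be sketched as a toy string model. This is only an illustration: the function names `deaminate` and `fix_by_replication` are hypothetical, and the model ignores error-prone repair, which can produce outcomes other than C-to-T.

```python
def deaminate(strand: str, position: int) -> str:
    """Convert the cytosine at a 0-based position to uracil (the U:G mismatch stage)."""
    if strand[position] != "C":
        raise ValueError("only cytosine can be deaminated in this toy model")
    return strand[:position] + "U" + strand[position + 1:]


def fix_by_replication(strand: str) -> str:
    """Replication reads U as T, fixing the lesion as a C-to-T transition."""
    return strand.replace("U", "T")


intermediate = deaminate("AGCTA", 2)
print(intermediate)                      # AGUTA
print(fix_by_replication(intermediate))  # AGTTA
```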
  • AID is a multifunctional mutator protein that induces a total of three distinct genomic and epigenomic effects.
  • the RNA-guided Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9) system originates from the prokaryotic adaptive immune system against exogenous double-stranded DNA (dsDNA).
  • dsDNA double stranded DNA
  • CBEs Cytosine Base Editors
  • the inventors describe a novel human AID-focused modular CBE that besides its genomic editing function allows programmable epigenomic editing through C deamination.
  • the Modular Epigenomic and Genomic AID base editor system (MEGA) takes advantage of AID’s multifunctional characteristic to induce targeted C-to-T mutations, DSBs, and 5mC demethylation.
  • MEGA Modular Epigenomic and Genomic AID base editor system
  • the invention provides a new and better equipped programmable “genomic Swiss army knife”. Its use will provide the opportunity to fully translate AID activity ex vivo for genome editing and cytosine methylation editing at high resolution.
  • the objective of the present invention is to provide means and methods to provide a system for genomic and/or epigenomic editing. This objective is attained by the subject-matter of the independent claims of the present specification, with further advantageous embodiments described in the dependent claims, examples, figures and general description of this specification.
  • a first aspect of the invention relates to a method for ex vivo antibody diversification, comprising the steps: a. providing a plurality of mammalian cells, wherein each cell expresses an antibody of interest on the cell surface; b. introducing an expression system encoding a protein-RNA complex into the cell, wherein the protein-RNA complex comprises i. activation-induced cytidine deaminase (AID); ii. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); iii. transcription activator VP64; and iv.
  • crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest, and wherein the protein components i-iii are covalently linked
  • a further aspect of the invention relates to a protein-RNA complex
  • a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
  • a further aspect of the invention relates to an expression system encoding the protein-RNA complex (on several vectors) according to the aspect above.
  • a further aspect of the invention relates to a method for genomic and/or epigenomic editing, comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to the aspect above into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to-be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
  • an article “comprising” components A, B, and C can consist of (i.e., contain only) components A, B, and C, or can contain not only components A, B, and C but also one or more other components.
  • “comprises” and similar forms thereof, and grammatical equivalents thereof include disclosure of embodiments of “consisting essentially of” or “consisting of.”
  • references to “about” a value or parameter herein include (and describe) variations that are directed to that value or parameter per se. For example, a description referring to “about X” includes a description of “X.”
  • VP64 in the context of the present specification relates to a synthetically designed tetrameric repeat of the minimal activation domain of herpes simplex virus VP16 (Beerli et al. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14628-33).
  • PAM in the context of the present specification relates to protospacer adjacent motif which is a 2-6-base pair DNA sequence immediately following the DNA sequence targeted by a Cas9 nuclease.
  • the canonical PAM is the sequence 5'-NGG-3', where “N” is any nucleobase followed by two guanine (“G”) nucleobases.
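As a minimal sketch of the PAM definition above, the following scans one DNA strand for every canonical 5'-NGG-3' motif. The function name `find_ngg_pams` and the example sequence are illustrative assumptions, and only the strand passed in is searched (the reverse complement would need a second call).

```python
import re


def find_ngg_pams(seq: str) -> list[int]:
    """Return 0-based start indices of every 5'-NGG-3' motif, overlaps included."""
    # A zero-width lookahead lets overlapping motifs (e.g. in "GGG") all be reported.
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq.upper())]


print(find_ngg_pams("ATGGCGGGT"))  # [1, 4, 5]
```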
  • nuclease dead Cas9 in the context of the present specification relates to a catalytically inactive variant of Cas9.
  • nickase Cas9 in the context of the present specification relates to a partially inactive enzyme Cas9 which can only cleave the DNA strand that is complementary to the gRNA.
  • sequences similar or homologous are also part of the invention.
  • the sequence identity at the amino acid level can be about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher.
  • the sequence identity can be about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher.
  • substantial identity exists when the nucleic acid segments will hybridize under selective hybridization conditions (e.g., very high stringency hybridization conditions), to the complement of the strand.
  • the nucleic acids may be present in whole cells, in a cell lysate, or in a partially purified or substantially pure form.
  • sequence identity and percentage of sequence identity refer to a single quantitative parameter representing the result of a sequence comparison determined by comparing two aligned sequences position by position.
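The position-by-position comparison can be illustrated with a short sketch. It assumes the two sequences are already aligned (gaps written as "-") and, as a simplifying assumption not taken from the specification, divides the number of identical columns by the alignment length; tools such as BLAST may define the denominator differently. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```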
  • Methods for alignment of sequences for comparison are well-known in the art. Alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the global alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Nat. Acad. Sci.
  • sequence identity values refer to the value obtained using the BLAST suite of programs (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) using the above identified default parameters for protein and nucleic acid comparison, respectively.
  • polypeptide in the context of the present specification relates to a molecule consisting of 50 or more amino acids that form a linear chain wherein the amino acids are connected by peptide bonds.
  • the amino acid sequence of a polypeptide may represent the amino acid sequence of a whole (as found physiologically) protein or fragments thereof.
  • polypeptides and protein are used interchangeably herein and include proteins and fragments thereof. Polypeptides are disclosed herein as amino acid residue sequences.
  • peptide in the context of the present specification relates to a molecule consisting of up to 50 amino acids, in particular 8 to 30 amino acids, more particularly 8 to 15 amino acids, that form a linear chain wherein the amino acids are connected by peptide bonds.
  • Amino acid residue sequences are given from amino to carboxyl terminus.
  • Capital letters for sequence positions refer to L-amino acids in the one-letter code (Stryer, Biochemistry, 3rd ed. p. 21).
  • Lower case letters for amino acid sequence positions refer to the corresponding D- or (2R)- amino acids. Sequences are written left to right in the direction from the amino to the carboxy terminus.
  • amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
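The code table above maps directly onto a lookup dictionary. The sketch below converts a list of three-letter codes into a one-letter string; the names `THREE_TO_ONE` and `to_one_letter` are illustrative only.

```python
# Three-letter to one-letter codes for the 20 standard amino acids listed above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: list[str]) -> str:
    """Translate three-letter residue codes (amino to carboxyl terminus) to one-letter code."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


print(to_one_letter(["Gly", "Ile", "Gln", "Val"]))  # GIQV
```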
  • amino acid linker refers to a polypeptide of variable length that is used to connect two polypeptides in order to generate a single chain polypeptide.
  • linkers useful for practicing the invention specified herein are oligopeptide chains consisting of 1, 2, 3, 4, 5, 10, 20, 30, 40 or 50 amino acids.
  • a non-limiting example of an amino acid linker is a monomer or di-, tri- or tetramer of a tetraglycine-serine peptide linker.
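For illustration, the monomer to tetramer of a tetraglycine-serine linker can be written out in one-letter code, taking the repeat unit to be GGGGS (four glycines followed by one serine). The function name `gly_ser_linker` is a hypothetical helper, not part of the specification.

```python
GLY4_SER_UNIT = "GGGGS"  # one tetraglycine-serine repeat unit


def gly_ser_linker(repeats: int) -> str:
    """Return a (GGGGS)n linker for n = 1 (monomer) to 4 (tetramer)."""
    if repeats not in (1, 2, 3, 4):
        raise ValueError("this sketch covers monomer to tetramer only")
    return GLY4_SER_UNIT * repeats


print(gly_ser_linker(2))  # GGGGSGGGGS
```

A tetramer has 20 residues, which falls within the 1 to 50 amino acid linker lengths listed above.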
  • gene refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated.
  • ORF open reading frame
  • a polynucleotide sequence can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
  • transgene in the context of the present specification relates to a gene or genetic material that has been transferred from one organism to another.
  • the term may also refer to transfer of the natural or physiologically intact variant of a genetic sequence into tissue of a patient where it is missing. It may further refer to transfer of a natural encoded sequence the expression of which is driven by a promoter absent or silenced in the targeted tissue.
  • a recombinant in the context of the present specification relates to a nucleic acid, which is the product of one or several steps of cloning, restriction and/or ligation and which is different from the naturally occurring nucleic acid.
  • a recombinant virus particle comprises a recombinant nucleic acid.
  • gene expression or expression may refer to either of, or both of, the processes - and products thereof - of generation of nucleic acids (RNA) or the generation of a peptide or polypeptide, also referred to as transcription and translation, respectively, or any of the intermediate processes that regulate the processing of genetic information to yield polypeptide products.
  • RNA nucleic acids
  • the term gene expression may also be applied to the transcription and processing of a RNA gene product, for example a regulatory RNA or a structural (e.g. ribosomal) RNA. If an expressed polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
  • nucleic acid expression vector in the context of the present specification relates to a plasmid, a viral genome or an RNA, which is used to transfect (in case of a plasmid or an RNA) or transduce (in case of a viral genome) a target cell with a certain gene of interest, or (in the case of an RNA construct being transfected) to translate the corresponding protein of interest from a transfected mRNA.
  • the gene of interest is under control of a promoter sequence and the promoter sequence is operational inside the target cell, thus, the gene of interest is transcribed either constitutively or in response to a stimulus or dependent on the cell’s status.
  • the viral genome is packaged into a capsid to become a viral vector, which is able to transduce the target cell.
  • expression system in the context of the present specification relates to a nucleic acid sequence encoding the complex of the invention, wherein the nucleic acid sequence is comprised in a nucleic acid expression vector, or inside the genome of a cell.
  • an antibody refers to a molecule capable of specific binding to another molecule or target with high affinity / a Kd ≤ 10⁻⁷ mol/L (particularly ≤ 10⁻⁹ mol/L).
  • an antibody refers to an immunoglobulin, an antibody, or an antibody sequence.
  • the antibody may be of any species.
  • the antibody / antibody sequence of interest can be of any isotype (e.g.
  • the term antibody further encompasses a humanized camelid antibody. In certain embodiments, the term antibody similarly encompasses an scFv fragment.
  • the antibody includes, but is not limited to immunoglobulin type G (IgG), type A (IgA), type D (IgD), type E (IgE) or type M (IgM), any antigen-binding fragment or single chains thereof and related or derived constructs.
  • An antibody may be a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds.
  • Each heavy chain is comprised of a heavy chain variable region (VH) and a heavy chain constant region (CH).
  • the heavy chain constant region of IgG is comprised of three domains, CH1, CH2 and CH3.
  • Each light chain is comprised of a light chain variable region (abbreviated herein as VL) and a light chain constant region (CL).
  • the light chain constant region is comprised of one domain, CL.
  • the variable regions of the heavy and light chains contain a binding domain that interacts with an antigen.
  • the constant regions of the antibodies may mediate the binding of the immunoglobulin to host tissues or factors, including various cells of the immune system (e.g., effector cells) and the first component of the classical complement system.
  • the term encompasses a so-called nanobody or single domain antibody, an antibody fragment consisting of a single monomeric variable antibody domain.
  • the source of the antibody sequence can be either physiological (from an immunised person/animal or a hybridoma cell) or non-physiological (e.g. in vitro display libraries).
  • the first aspect of the invention relates to a method for ex vivo antibody diversification (affinity maturation and/or isotype switching), comprising the steps: a. A plurality of mammalian cells is provided. Each mammalian cell of the plurality expresses an antibody of interest on the cell surface.
  • the antibody of interest is an antibody as defined in the terms and definitions.
  • b. An expression system is introduced into the cell. This expression system encodes a protein-RNA complex as defined right below.
  • the protein-RNA complex is capable of editing a genetic DNA sequence and/or an epigenetic modification of the DNA sequence.
  • the DNA sequence encodes the antibody of interest, or is a sub-sequence of the encoding DNA sequence.
  • c. After allowing the protein-RNA complex to edit the DNA sequence and/or its modifications, a cell is selected which expresses an antibody with a feature different from the original antibody of interest. This feature may be increased or decreased affinity and/or altered effector function. d. The selected cell is expanded in cell culture.
  • the protein-RNA complex comprises: a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest.
  • step c is performed via FACS.
  • the sub-sequence of a DNA sequence encoding the antibody of interest relates to the sequence part which should be altered in order to diversify the antibody sequence.
  • the ex vivo antibody diversification (affinity maturation and/or isotype switching) pipeline uses a mammalian cell line that stably expresses the antibody of interest on its surface.
  • a human-derived B cell line with endogenous antibody expression could also be used as a starting cell line.
  • B cells with a naive antibody sequence could be employed.
  • a potential cell might be human-derived RAMOS or Raji cells.
  • the cell needs to be of mammalian origin (e.g. human-derived HEK293 or hamster-derived CHO) to ensure correct protein folding and post-translational modifications.
  • the pool of mutated/non-mutated cells is stained with a tagged form of the target antigen and subjected to Fluorescence Activated Cell Sorting. Cells with the desired binding phenotype and/or effector functions can be sorted out of the mixed pool.
  • genomic DNA is isolated and sequenced by a single-molecule long-read deep sequencing method (e.g. PacBio Sequencing).
  • editing of the antibody of interest happens within the variable domain of the heavy and light chain.
  • CDRs are targeted (as they mediate direct interaction with the antigen).
  • CDR3 regions are targeted.
  • the antibody of interest includes a tag sequence.
  • a tag sequence can be any sequence conferring an additional feature to the antibody of interest.
  • a non-limiting example for a tag sequence is an affinity tag sequence used in protein purification.
  • the antibody of interest is fused to other molecules/particles.
  • the antibody sequence includes sites, residues, sequences, and/or motifs for post-translational modifications. In certain embodiments, these sites, residues, sequences, and/or motifs for post-translational modification are modified by the method of the invention.
  • the antibody sequence of interest is modified and/or optimised and/or codon optimised.
  • WRC(Y) AID hotspots
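A WRC(Y) hotspot scan can be sketched with the standard IUPAC nucleotide codes (W = A or T, R = A or G, Y = C or T). The function name `find_wrc_hotspots`, the `require_y` switch, and the example sequence are assumptions made for illustration; only the strand passed in is searched.

```python
import re


def find_wrc_hotspots(seq: str, require_y: bool = False) -> list[int]:
    """Return 0-based indices of the target cytosine in each WRC (or WRCY) motif."""
    pattern = r"(?=[AT][AG]C[CT])" if require_y else r"(?=[AT][AG]C)"
    # The motif starts two bases before the cytosine, hence the +2 offset.
    return [m.start() + 2 for m in re.finditer(pattern, seq.upper())]


print(find_wrc_hotspots("TTAGCTAACG"))                  # [4, 8]
print(find_wrc_hotspots("TTAGCTAACG", require_y=True))  # [4]
```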
  • the diversification of the antibody sequence is independent of other cell types (e.g. helper cells) or molecular (co-)factors such as cytokines, chemokines, etc.
  • the mammalian cell can be a B cell or a non-B cell.
  • the cell used can be wild type, from a diseased sample, and/or genetically modified/engineered.
  • the cell has an endogenous immunoglobulin locus.
  • the cell is engineered to express/contain the antibody sequence of interest.
  • An exogenous antibody sequence can be either stably integrated into the cell genome or be ectopically present.
  • the engineered cell can have integrated either one or multiple gene copies of the antibody sequence into its genome.
  • An engineered cell can have one or more different antibody sequences integrated.
  • the mammalian cell can relate to a pool of cells, wherein this pool of cells consists either of monoclonal cells (all expressing the same antibody/having the same antibody sequence integrated) or of polyclonal cells (each cell expressing a different antibody/having a different antibody sequence).
  • the following embodiments relate to a protein-RNA complex itself, and also to the protein-RNA complex employed in the method of antibody diversification.
  • One aspect of the invention relates to a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
  • One alternative aspect of the invention relates to a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
  • One Base Editor molecule associates with one single-guide RNA molecule.
  • VP64 is a transcriptional activator which leads to transcription bubble formation. AID requires active transcription for both somatic hypermutation and class switch recombination in vivo. Providing a more physiological context could enhance AID’s ex vivo activity in a base editor architecture. An enhanced spectrum of AID-dependent events (single base substitutions, DSBs and demethylation) was observed when the base editor was combined with VP64.
  • the target-specific crRNA part may be joined to the tracrRNA scaffold part without a linker.
  • the protein components a-c (AID and VP64 and nCas9 or dCas9) are covalently linked. In certain embodiments, the protein components a-c (AID and VP64 and nCas9 or dCas9) are expressed as a continuous peptide chain.
  • the system of the invention is designed with the protein components AID, nCas9 or dCas9, and VP64 covalently linked to ensure the mutagenic/epi-mutagenic activity is restricted/limited to a genomic site that is defined by the gRNA. Without the linkage, unwanted off-target effects are more likely.
  • the protein-RNA complex comprises dCas9.
  • the dCas9 comprises amino acid substitutions D10A, D839A, H840A and N863A (with the numbering referring to Cas9 of UniProt-ID Q99ZW2).
  • the nCas9 comprises amino acid substitution D10A.
  • AID is wild-type full-length human AID.
  • AID is a hyperactive human AID variant (AID*A) lacking a C-terminal Nuclear Export Signal (NES) and comprising amino acid substitutions K10E, T82I, and E156G (with the numbering referring to human wt AID of SEQ ID NO 001).
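Substitution labels such as K10E or D10A read as "wild-type residue, 1-based position, new residue". The sketch below applies such labels to a sequence and checks that the stated wild-type residue is actually present. The helper `apply_substitutions` and the 10-residue toy sequence are hypothetical; the toy sequence does not stand in for SEQ ID NO 001 or any other sequence of the specification.

```python
import re


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply point substitutions written as <wild-type><1-based position><new residue>."""
    residues = list(sequence)
    for label in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"cannot parse substitution {label!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


toy_sequence = "MDSLLMNRRK"  # toy example only
print(apply_substitutions(toy_sequence, ["K10E"]))  # MDSLLMNRRE
```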
  • NES C-terminal Nuclear Export Signal
  • the dCas9 is an enzymatically-inactive variant of Cas9 from Streptococcus. In certain embodiments, the dCas9 is an enzymatically-inactive variant of Cas9 from Streptococcus pyogenes.
  • AID and dCas9 are covalently linked via a linker. In certain embodiments, AID and dCas9 are covalently linked via an XTEN-linker. In certain embodiments, AID and dCas9 are covalently linked via an XTEN-linker of SEQ ID NO 012. In certain embodiments, the covalently linked protein components are linked via their protein backbone, and thus, form a continuous polypeptide chain.
  • the XTEN-linker consists of 16 amino acids and is a rather flexible linker. Other linkers, especially longer and/or more flexible linkers, could also be beneficial. However, with nCas9, no linker or short linkers can also be used, because nickase Cas9 can partially compensate for the XTEN-linker between AID*A and Cas9; constructs without a linker were still able to mutate the genome. Also, a nickase Cas9 could potentially increase the editing window size.
  • the protein-RNA complex additionally comprises uracil-DNA glycosylase inhibitor (UGI).
  • UGI uracil-DNA glycosylase inhibitor
  • the UGI is from the Bacillus subtilis bacteriophage PBS1 or PBS2. In certain embodiments, the UGI is from PBS1.
  • molecules that can further enhance/modulate editing activity are included as independent co-factors.
  • the inventors’ data showed that having UGI co-expressed (and not covalently linked to the base editor) can improve genomic base editing purity and specificity.
  • genomic editing window size defines the length of DNA where a base editor can successfully mutate.
  • Known cytosine base editors tend to have very narrow editing windows (approx. 5 nucleotides).
  • the construct of the invention has an editing window size of up to 20 nucleotides.
  • the construct of the invention has a narrower/more precise editing window than known TET-based modulators. Within a window of between 16 and 56 nucleotides, depending on cell type, 5mC as well as 5hmC demethylation is observed.
  • the editing window spans approx. 20 nucleotides. Editing happens between position 9 and 29 with the PAM sequence being at position 0. When the gRNA is complementary to the + strand the editing can also happen 5’ of it. When the gRNA is complementary to the - strand the editing can also happen 3’ of it.
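The window coordinates can be made concrete with a small sketch. The counting convention is an assumption, since the text does not fix it: here position 0 is the first PAM base and positions 9 to 29 are counted in the 5' direction along the strand that carries the PAM. The function name `cytosines_in_window` and the toy sequence are hypothetical.

```python
def cytosines_in_window(seq: str, pam_index: int, near: int = 9, far: int = 29) -> list[int]:
    """List 0-based indices of cytosines lying `near` to `far` bases 5' of the PAM start."""
    last = pam_index - near
    if last < 0:
        return []  # the sequence is too short to contain any part of the window
    first = max(0, pam_index - far)
    return [i for i in range(first, last + 1) if seq[i].upper() == "C"]


# Toy 40-mer: cytosines just outside (5, 27) and just inside (6, 26) the window.
toy = ["A"] * 40
for index in (5, 6, 26, 27):
    toy[index] = "C"
toy[35:38] = "TGG"  # PAM placed at index 35
print(cytosines_in_window("".join(toy), pam_index=35))  # [6, 26]
```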
  • wild-type full-length human AID comprises or essentially consists of the sequence SEQ ID NO 001.
  • AID*A comprises or essentially consists of the sequence SEQ ID NO 003.
  • dCas9 comprises or essentially consists of the sequence SEQ ID NO 005.
  • nCas9 comprises or essentially consists of the sequence SEQ ID NO 007.
  • VP64 comprises or essentially consists of the sequence SEQ ID NO 009.
  • the tracrRNA part comprises or essentially consists of the sequence SEQ ID NO 011.
  • UGI comprises or essentially consists of the sequence SEQ ID NO 014.
  • One aspect of the invention relates to an expression system encoding the protein-RNA complex (on several vectors) according to the first aspect.
  • the expression construct (plasmid vector) only encodes the base editor, a separate expression construct contains the gRNA sequence, and optionally a third expression construct can be used which encodes UGI.
  • One aspect of the invention relates to a method for genomic and/or epigenomic editing, comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to the second aspect into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to-be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
  • the crRNA defines the position of the protospacer region.
  • a single base substitution and/or a DNA double strand break is introduced in the sequence-to-be-edited, and AID is human AID*A.
  • a 5-methylcytosine and/or 5-hydroxymethylcytosine demethylation is introduced in the sequence-to-be-edited, and AID is full-length human AID.
  • AID-dependent 5mC/5hmC demethylation is a multistep procedure. It is initiated by the active deamination of 5mC/5hmC by AID, resulting in a T:G or 5hmU:G mismatch, respectively. Through the involvement of additional downstream enzymes such as TDG (Thymine-DNA glycosylase), the T/5hmU nucleotides are removed from the genome. Eventually, the BER (Base excision repair) pathway fills the resulting abasic site with an unmodified C. Thus, AID-dependent deamination leads to epigenetic 5mC/5hmC demethylation.
  • TDG Thymine-DNA glycosylase
  • the invention further encompasses the following items:
  • a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9), particularly dCas9; c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
  • AID activation-induced cytidine deaminase
  • dCas9 nuclease dead Cas9
  • nCas9 nickase Cas9
  • VP64 transcription activator
  • sgRNA single-guide RNA
  • dCas9 comprises amino acid substitutions D10A, D839A, H840A and N863A.
  • AID is a hyperactive human AID variant (AID*A) lacking a C-terminal Nuclear Export Signal (NES) and comprising amino acid substitutions K10E, T82I, and E156G.
  • AID and dCas9 are covalently linked via a linker, particularly via an XTEN-linker, more particularly an XTEN linker of SEQ ID NO 012.
  • the protein-RNA complex additionally comprises uracil-DNA glycosylase inhibitor (UGI), particularly UGI non-covalently associated with the protein-RNA complex.
  • UGI uracil-DNA glycosylase inhibitor
  • a. AID comprises or essentially consists of the sequence SEQ ID NO 001; and/or
  • b. AID*A comprises or essentially consists of the sequence SEQ ID NO 003; and/or
  • c. dCas9 comprises or essentially consists of the sequence SEQ ID NO 005; and/or
  • d. nCas9 comprises or essentially consists of the sequence SEQ ID NO 007; and/or
  • e. VP64 comprises or essentially consists of the sequence SEQ ID NO 009; and/or
  • f. the tracrRNA part comprises or essentially consists of the sequence SEQ ID NO 011; and/or
  • g. UGI comprises or essentially consists of the sequence SEQ ID NO 014.
  • a method for genomic and/or epigenomic editing comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to item 11 into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to- be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
  • a method for ex vivo antibody diversification comprising the steps: a. providing a mammalian cell comprising expression of an antibody of interest on the cell surface; b. introducing an expression system according to item 11 into the cell, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest; c. selecting a cell expressing an antibody with increased or decreased affinity and/or altered effector functions, particularly via FACS; d. expanding the selected cell in cell culture.
  • Fig. 1 shows proof-of-function of the new MEGA System.
  • C) Representative FACS histograms show the distribution of GFP-negative and -positive cells depending on the indicated condition. Changes in the GFP-negative cell population of transfected HEK293T-GFP cells were normalised to the GFP-negative population of non-transfected HEK293T-GFP cells. D) Loss in GFP signal was normalised and is shown as fold increase of the GFP-negative population over non-transfected control HEK293T-GFP cells. MEGA led to a significant increase in GFP-negative cells. It is shown without or in combination with one out of four different GFP-targeting gRNAs, respectively. Each data point represents an independent experiment (n 3) and the standard deviation is shown.
  • Fig. 2 shows deamination occurs preferentially at AID hotspots within protospacer.
  • Heatmaps show the base editing frequency on a single base level in comparison to the reference sequence. All loci were edited by MEGA-2 together with UGI. gRNAs targeting the antisense strand lead to G-to-A mutations whereas gRNAs complementary to the sense strand result in C-to-T mutations. The protospacer and PAM sequence are highlighted. Dots indicate specific sequence motifs within the quantification window. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs.
  • GFP loci G1-G4 were sequenced by PacBio long-read sequencing.
  • Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology.
  • A Adenosine
  • C Cytosine
  • G Guanosine
  • T Thymine
  • OHS Overlapping Hotspot
  • Fig. 3 shows UGI enhances target mutation.
  • B) Total Indel frequency with and without UGI is summarized for each locus. Standard deviation is given for loci ATP1a1 and TP53BP1. Three independent replicates of each target site were sent for deep sequencing. GFP loci G1-G4 were sequenced by PacBio long-read sequencing. Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology.
  • Fig. 4 shows MEGA configuration impacts mutagenic activity.
  • MEGA-4 lacks both the transcription activator VP64 and the XTEN-Linker.
  • MEGA-3 connects AID*A through an XTEN-Linker to dCas9. Compared to MEGA-2, construct 4 uses human wild type (WT) AID.
  • Fig. 5 shows MEGA induces broad mutation pattern in murine variable heavy chain domain.
  • (Lower Panel) WT S. pyogenes Cas9 with four gRNAs. CDR1 - 3 and gRNAs are highlighted.
  • VH amplicons were sequenced by Illumina technology. CDR (Complementarity Determining Region), VH (Variable Heavy Chain Domain). The SEQ ID NO. and the sequence shown in this figure are stated in the table below.
  • Fig. 6 shows MEGA-1 has low mutagenic but high epigenetic activity.
  • A) MEGA enables targeted demethylation of 5mC’s. Deaminated 5mC’s are recognized as T’s and are eventually replaced enzymatically with a non-methylated C.
  • Fig. 7 shows comparison PacBio vs. Illumina Sequencing. Comparison between
  • Fig. 8 shows base editing window of MEGA-2.
  • the target window had an approximate size of 20 nucleotides. Mutation frequency peaked between position 14 and 17.
  • Fig. 10 shows UGI improved base editing purity.
  • C-to-A/G and G-to-C/T mutation frequency is shown for six loci, respectively.
  • Base editing purity with and without UGI is compared.
  • the respective wild type sequence is given.
  • G4 UGI reduces non-target base substitutions.
  • Both off-target C-to-A & C-to-G as well as G-to-C & G-to-T mutations occurred with lower frequency.
  • Protospacer region, PAM sequence and C’s/G’s are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs.
  • Nucleotide numbering corresponds to the position relative to the PAM sequence at position 0. Standard deviation is shown for loci ATP1a1 and TP53BP1. Three independent replicates of each target site were sent for deep sequencing. GFP loci G1-G4 were sequenced by PacBio long-read sequencing. Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology.
  • Fig. 11 shows mutagenic activity of the MEGA System.
  • Non-target base editing frequency is shown for each MEGA version at gRNA position G3 and G4.
  • MEGA configuration did influence base editing purity. The highest off-target mutations were seen for MEGA-2.
  • Fig. 12 shows targeted mutagenesis in variable heavy domain.
  • Heatmaps represent targeted single base mutations around gRNA position V2, V3 and V4. Protospacer region, PAM sequence and C’s/G’s are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs; black dots: Polymerase Eta motifs.
  • Quantification window around gRNA position V2 only shows specific C-to-T editing at AID hotspot motifs.
  • FIG. 13 shows experimental outline of 5mC demethylation analysis.
  • Human HEK293A or mouse 3T3 cells were transfected with MEGA-1 and MyoD-targeting gRNAs. Genomic DNA was extracted and used either for bisulfite treatment and sequencing or Illumina deep sequencing.
  • Fig. 14 shows MEGA-based ex vivo affinity maturation pipeline. Schematic representation of the screening pipeline for MEGA-dependent antibody engineering/ex vivo affinity maturation.
  • Fig. 15 shows UFKA22-displaying cell line for MEGA mutagenesis.
  • D Representative comparison between mutated and non-mutated UFKA22-displaying cell lines.
  • Fig. 16 shows UFKA22 mutagenesis.
  • Cell line S11 ,3#-P underwent three mutation rounds with MEGA-2 and a mix of ten gRNAs targeting both variable heavy and light chain.
  • Cell Line S6.1#- underwent one mutation round with MEGA-2 and four gRNAs only targeting the variable light chain domain.
  • Cell line S12.1#- also underwent only one mutation round with MEGA 1 but with four gRNAs targeting the variable heavy chain domain.
  • Fig. 17 shows that UFKA22 variants have reduced but not diminished affinity to IL-2.
  • Table 1 shows gRNA List.
  • Table 2 shows PacBio sequencing primers.
  • Table 3 shows Illumina sequencing primers.
  • Table 4 shows qPCR primers.
  • Example 1 Engineering and proof-of-function of the MEGA base editing system
  • MEGA-2 includes hyperactive human AID*A as deaminase moiety (Fig 1 A). Hyperactive AID*A was reported to have higher deamination activity than human wild type (WT) AID.
  • the inventors fused AID*A with a flexible XTEN-Linker to the N-terminus of the S. pyogenes-derived nuclease dead Cas9 (dCas9).
  • VP64 is a potent minimal transcription activator leading to transcription bubble formation and subsequently ssDNA exposure.
  • By adding this to the C-terminus of the construct, the inventors aimed to increase substrate accessibility. Inhibiting the endogenous uracil DNA glycosylase with the bacteriophage-derived uracil glycosylase inhibitor (UGI) improves C-to-T base editing. Hence, many CBEs have included UGI within their constructs.
  • the inventors complemented the MEGA System with UGI as an independent co-factor, which has also been shown to enhance mutagenesis. To demonstrate the activity of the system, a GFP disruption assay was performed using HEK293T-GFP cells, which constitutively express GFP.
  • the inventors performed deep sequencing to identify the mutation signature of MEGA-2 at six different genomic sites.
  • the inventors included two endogenous genes.
  • the Na+/K+ ATPase ATP1a1 gene was targeted with a previously published gRNA, herein referred to as gRNA A.
  • gRNA A a previously published gRNA, herein referred to as gRNA A.
  • For TP53BP1, the inventors designed gRNA B.
  • Targeted amplicon sequencing was done for each locus, respectively.
  • All four GFP loci underwent single molecule long-read PacBio sequencing. ATP1a1 and TP53BP1 were sequenced by Illumina technology. Both approaches gave comparable results (Fig 7).
  • MEGA-2 exclusively mutated C’s on the non-targeted DNA strand (Fig 2).
  • Example 4 MEGA configuration impacts mutagenesis
  • MEGA-2 efficiently mutated single bases (Fig 3A). However, high Indel frequencies pointed towards the induction of DNA DSBs (Fig 3B). Three additional configurations were generated to better understand the mutagenic mode-of-action of the Cas9-AID fusion.
  • the modular setup allowed the inventors to change the deaminase moiety and to remove or retain the linker as well as the transcription activator (Fig 4A).
  • the inventors tested base editing activity by GFP disruption assay.
  • HEK293T-GFP cells were transfected with both gRNAs G3 and G4, UGI and one of four MEGA variants. Indeed, MEGA architectural conformations affected editing activity. All constructs led to a significant increase in the GFP-negative cell population (Fig 11 A).
  • MEGA-2 showed the highest overall substitution frequency of 31.46 %. With 21.88 %, MEGA-3 was less efficient. Without the XTEN-Linker MEGA-3 had a total substitution frequency of 15.92 %. The least efficient construct, with an overall substitution frequency of 6.14 %, was MEGA-1 (Fig 4C). As visualised in the sequence histograms (Fig 4B), the construct architecture also influenced overall Indel formation. MEGA-2 induced not only the most efficient base editing but also the highest Indel frequency of 46.34 %. Compared to that, MEGA-4 and 3 had only minor Indel rates of 1.59 % and 3.21 %, respectively. No deletions were detectable with MEGA-1.
  • the murine IgM-positive B cell lymphoma cell line CH12-F3 is unable to undergo SHM. So far it has not been reported that its immunoglobulin variable domain can be mutated by exogenous means.
  • the inventors’ AID-based MEGA System retained a similar mode-of-action as physiological AID. The inventors were interested to see whether or not it can mutate the variable heavy chain domain in a SHM-like fashion.
  • CH12-F3 cells were electroporated with MEGA-2 together with UGI and four CDR-targeting gRNAs (Fig 5A). Subsequently, the VH domain was sequenced by Illumina technology. At the four gRNA locations the inventors succeeded in creating single base mutations that were distinguishable from background noise.
  • Example 6 MEGA-1 has low mutagenic but high epigenomic editing through active Cytosine demethylation
  • a methylated AID hotspot within the MyoD DMR5 Enhancer region of murine 3T3 cells that has been shown to be critical for gene expression (Fig 6A).
  • the inventors used a previously published MyoD-specific gRNA (gRNA MyoD) located in close proximity to the methylated AID hotspot (Fig 6B). Methylation status was analysed 48 h post-transfection by sequencing bisulfite-treated DNA where unmethylated C’s are read as T’s and 5mC’s as C’s (Fig 13).
  • Example 7 MEGA-based ex vivo affinity maturation pipeline
  • a cell line which displays the mAb of interest on its cell surface is transfected with plasmids encoding MEGA-2 and gRNAs targeting the variable heavy/light chain domain, respectively (Fig 14).
  • plasmids encoding MEGA-2 and gRNAs targeting the variable heavy/light chain domain, respectively (Fig 14).
  • Upon one to three rounds of transfection a mixed pool of cells with different binding characteristics emerges.
  • cells with the desired binding affinity (higher or lower) can be isolated from the mixed pool through fluorescence activated cell sorting.
  • the mutation spectrum is then determined by deep sequencing with a single molecule long read sequencing method.
  • Example 8 UFKA22-displaying cell line for MEGA mutagenesis
  • the mutagenesis pipeline requires the antibody of interest to be stably expressed on the surface of a mammalian cell.
  • HEK293T-GFP cells, which constitutively express GFP, were transfected with a plasmid encoding the heavy and light chain of UFKA22, a humanized anti-human IL-2-specific mAb.
  • a trans-membrane domain was added for surface-display. 86.1 % of the HEK293TG-mUFKA22 cell line expressed the full antibody on its surface (Fig 15 A). Heavy and light chain were displayed in an equal ratio. 82.3 % of the cells were able to bind recombinant human IL-2 (Fig 15 B).
  • the mutagenesis pipeline depends on gRNAs that localise MEGA-2 to the variable heavy and light chain domain.
  • eleven gRNAs were designed to target predominantly the CDR regions of UFKA22 (Fig 15 C).
  • One round of mutagenesis with MEGA-2 and all eleven gRNAs showed a change in IL-2 binding compared to non-mutated cells (Fig 15 D).
  • a small population of cells showed a reduced affinity to bind IL-2.
  • Example 10 UFKA22 variants show reduced but not diminished affinity to IL-2
  • MEGA architecture impacts AID-dependent mutagenesis
  • AID requires active transcription where stalling or pausing of the RNA polymerase II (RNAPII) results in premature transcription termination and ssDNA exposure. Therefore, the inventors hypothesised that AID’s ex vivo activity would be enhanced with a physiological substrate environment through transcription bubble formation.
  • the MEGA system induces synthetic SHM ex vivo
  • the inventors induced different levels of Indels within the variable domain (Fig 5B). During physiological SHM this happens as well to further broaden the Ig repertoire. Standard WT Cas9 was not able to generate a comparable mutagenic diversity (Fig 5B). The inventors are the first who successfully diversified an endogenous B cell Ig variable domain with a CBE. The inventors’ MEGA system mimicked the complex SHM signature ex vivo.
  • Epigenomic editing aims to modify specific DNA methylation sites to change gene expression. Targeted changes in promoter/enhancer methylation lead to either gene activation or silencing. Active 5mC and 5hmC demethylation is mainly catalysed by members of the ten-eleven translocation methylcytosine dioxygenase (TET) enzyme family. Thus, current programmable epigenomic editors link dCas9 to a TET moiety.
  • TET ten-eleven translocation methylcytosine dioxygenase
  • AID can also actively demethylate selected genomic loci. In contrast to TET-induced oxidation, AID deaminates 5mC/5hmC, thereby changing it to a T. Eventually, TDG or methyl-binding domain glycosylase 4 (MBD4) recognise the mismatch and replace it with an unmethylated C. So far, AID has only been considered for genome but not epigenome engineering. The inventors’ demethylation experiments demonstrated that the modular MEGA system can induce MyoD gene expression by targeted 5mC demethylation (Fig 6B - D). In a recently published work, MyoD expression was induced by a dCas9-TET1 fusion protein.
  • the MyoD DMR5 Enhancer was targeted by four different gRNAs spanning a region over 200 nucleotides.
  • the inventors achieved a comparable increase in MyoD expression as in the previous work.
  • the inventors’ finding that demethylation of certain CpGs is enough to induce gene expression is in line with previous work.
  • Whereas TET-fusion constructs have a broad and unspecific demethylation activity, the inventors’ MEGA system was able to edit the essential 5mC site in a narrow editing window of 16 nucleotides to induce gene expression.
  • the absence of genomic 5mC-to-T mutations with MEGA-1 confirmed that the deamination activity exclusively affected the epigenome but not the genome.
  • MEGA-1 has very limited mutagenic activity.
  • MEGA-2 did not have any mutagenic activity, even though it is the variant with the strongest deamination phenotype.
  • highly methylated genomic regions represent a challenging target for base editors. It has been shown that editing efficacy depends on the CBE type. Eventually, the inventors’ system can be useful to understand the role of specific CpG clusters through its precise demethylation window.
  • MEGA-2 was assembled by introducing AID*A-XTEN-Linker at the N-terminus of dCas9-VP64 in the backbone vector Cas9m4VP64 (Addgene #47319) through two-step ligation.
  • AID*A was amplified from the pGH335_MS2-AID*A-Hygro plasmid (Addgene #85406) with primers including the XTEN-Linker at the C-terminus.
  • For MEGA-1, the same cloning strategy was used but with human full-length wild type AID.
  • MEGA-3 was constructed in a one-step ligation process whereby AID*A-XTEN was introduced into the Cas9m2 vector (Addgene #47317).
  • MEGA-4 was cloned in the same way as MEGA-3.
  • the AID*A fragment without XTEN was ligated into the hCas9_D10A (Addgene #41816) backbone.
  • gRNAs targeting the GFP, TP53BP1 and CH12-F3 IgH variable domain locus as well as the UFKA22 variable heavy/light chain domain locus were designed by manual curation with Benchling or using CRISPRdirect (Y. Naito et al., Bioinformatics 31, 1120-1123 (2015)).
  • CRISPRdirect Y. Naito et al., Bioinformatics 31, 1120-1123 (2015).
  • For ATP1a1 and mouse MyoD, published gRNAs were used (D. Agudelo, et al., Nature Methods 14, 615-620 (2017); X. S. Liu, et al., Cell 167, 233-247.e17 (2016)).
  • Cloning of gRNA expressing vectors was done as described previously (P. Mali, et al., Science 339, 823-826 (2013)).
  • HEK293A, HEK293T-GFP and 3T3 cells were grown in DMEM (Sarstedt) supplemented with 10 % FBS (Sarstedt), 1 % penicillin/streptomycin + L-Glutamine (Gibco), 1 % Sodium Pyruvate (Gibco) and 50 µM β-mercaptoethanol (Gibco).
  • Growth media of HEK293TG-mUFKA22 and HEK293TG-mUFKA22 variants in addition included 300 µg/ml Hygromycin B (Corning).
  • CH12-F3 mouse erythroleukemia B cells were grown in RPMI 1640 (Sarstedt) supplemented with 10 % FBS (Sarstedt), 1 % penicillin/streptomycin + L-Glutamine (Gibco), 1 % Sodium Pyruvate (Gibco), 5 % NCTC-109 (Gibco) and 50 µM β-mercaptoethanol (Gibco).
  • CH12-F3 cells were electroporated using the Gene Pulser Xcell Eukaryotic System (BioRad).
  • BioRad Gene Pulser Xcell Eukaryotic System
  • 5 x 10^6 cells were resuspended in Opti-MEM medium (Gibco) together with 3 µg DNA (1000 ng MEGA plasmid and 500 ng of each respective gRNA plasmid) in 0.4 cm gap cuvettes (BioRad). Electroporation was done with 30 ms pulse and square wave setting.
  • the SF Cell Line 4D-Nucleofector™ X Kit S (Lonza) was used with program EN-158.
  • 1 x 10^6 cells were resuspended with 750 ng MEGA plasmid and 1000 ng mouse MyoD gRNA plasmid.
  • GFP fluorescence was detected using a BD FACSCalibur or BD Accuri C6 plus flow cytometer. Live cells were gated based on FSC-A/SSC-A morphology. GFP-negative HEK293A and GFP-positive HEK293T-GFP cells were used as negative and positive control, respectively. Loss in GFP signal was compared to non-transfected HEK293T-GFP cells using FlowJo V6.
  • IL-2 staining: HEK293TG-mUFKA22 WT and variant cells were incubated with fixable live/dead ZOMBIE Violet stain followed by anti-human IgG Fc APC and biotinylated IL-2 (IL2-Biotin). To detect IL-2 the cells were subsequently incubated with PE-conjugated streptavidin. Cells were analysed with BD Fortessa.
  • PacBio long-read single molecule sequencing of the GFP locus and UFKA22 Variable Heavy/Light Chain Domain locus
  • Illumina amplicon library preparation was done with the Nextera XT DNA Library Preparation Kit (Illumina). PAD sequences for Nextera adaptors were added to the locus specific primers and subsequently each locus was amplified (Table 2). Primers are listed in table 3. Three independent reactions were performed and pooled for each sample to avoid PCR bias. After PCR clean-up, Nextera indices were added to the amplicons through a second PCR reaction. PCR product quantity and quality were assessed by Qubit Fluorometer (Thermo Fisher Scientific) and TapeStation (Agilent). Samples were sequenced by the Crudencing Service with an Illumina HiSeq Sequencer.
  • Bisulfite conversion of DNA from 5 x 10^4 cells was performed using the EZ DNA methylation kit (Zymo Research).
  • the genomic DNA was incubated with CT conversion reagent at 98 °C for 8 mins, then 14 cycles of 95 °C for 15 seconds and 64 °C for 15 mins. DNA was cleaned and eluted following the manufacturer’s instructions.
  • the target region of the DMR5 MyoD enhancer region was amplified by PCR at 95 °C for 12 mins, then 40 cycles of 95 °C for 90 seconds, 58 °C for 90 seconds, 72 °C for 45 seconds. A final elongation step of 10 min was included in all reactions.
  • PCR products were analysed by gel electrophoresis and products purified with MinElute (Qiagen). Three separate PCRs were performed on each sample to control for PCR bias in the subsequent analysis. PCR products were pooled from individual samples, cloned into a TA vector and sequenced by Sanger sequencing. Only unique sequences (as determined by either unique CpG methylation pattern or unique non-conversion of non-CpG cytosines) are shown, and all sequences had a conversion rate >99 %.
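The bisulfite read-out described above (unmethylated C's read as T's, 5mC's read as C's, and a conversion rate judged from non-CpG cytosines) can be illustrated with a minimal Python sketch. It assumes that a read has already been aligned gap-free to the top strand of the unconverted reference and that non-CpG cytosines are unmethylated. The function name and the toy sequences are hypothetical and are not taken from the application.

```python
def bisulfite_summary(reference, read):
    """Call CpG methylation and the non-CpG conversion rate for one read.

    Assumes `reference` (unconverted top strand) and `read` (bisulfite
    read) are equal-length, gap-free aligned strings.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    reference, read = reference.upper(), read.upper()

    cpg_calls = {}      # position -> "methylated" / "unmethylated"
    converted = 0       # non-CpG cytosines read as T
    informative = 0     # non-CpG cytosines read as C or T

    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        is_cpg = i + 1 < len(reference) and reference[i + 1] == "G"
        if is_cpg:
            if read[i] == "C":
                cpg_calls[i] = "methylated"      # protected from conversion
            elif read[i] == "T":
                cpg_calls[i] = "unmethylated"    # converted C -> U -> T
        elif read[i] in ("C", "T"):
            informative += 1
            if read[i] == "T":
                converted += 1

    rate = converted / informative if informative else None
    return cpg_calls, rate
```

A read would pass the filter mentioned above only if the returned rate exceeds 0.99; the CpG calls of such reads then give the methylation pattern per clone.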


Abstract

The present invention relates to a method for ex vivo antibody diversification, making use of a protein-RNA complex comprising an activation-induced cytidine deaminase (AID), a nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9), transcription activator VP64; and one or more single-guide RNAs (sgRNA).

Description

AID-based cytosine base editor system for ex vivo antibody diversification
This application claims the right of priority of European Patent Application EP22157081.5 filed 16 February 2022, incorporated by reference herein.
The present invention relates to a method for ex vivo antibody diversification, making use of a protein-RNA complex comprising an activation-induced cytidine deaminase (AID), a nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9), transcription activator VP64; and one or more single-guide RNAs (sgRNA).
Background of the Invention
Since its discovery, AID has been shown to be a multifunctional mutator protein. It is still not completely understood how AID alone can initiate both genomic and epigenomic modifications. Base editors have highlighted the great potential of DNA deaminases in genomic engineering. But even AID-based editors only focus on C-to-T mutations within a limited context. Herein, the inventors describe the first instance of a modular AID-based editor that recapitulates the full spectrum of genomic and epigenomic activity. The inventors’ novel multifunctional “Swiss army knife”-like toolbox will help to improve targeted genomic and epigenomic editing.
Active genomic and epigenomic modifications play a crucial role in the vertebrate adaptive immune system. B cells rely on such mutations to diversify their immunoglobulin (Ig) binding and effector repertoire. This ensures adequate protection against the vast plethora of pathogenic threats. The antibody diversification process is divided into affinity maturation by somatic hypermutation (SHM) and Ig isotype change through class switch recombination (CSR), two mechanisms that have still not been completely recapitulated ex vivo. The B cell-specific mutator enzyme activation-induced deaminase (AID) is the main driver of SHM and CSR (3). It belongs to the mutagenic Apolipoprotein B mRNA Editing Catalytic Polypeptide-like (APOBEC) family. By deaminating cytosine (C) to uracil (U) in single stranded DNA (ssDNA), AID creates a uracil:guanosine (G) mismatch. Either through replication or error-prone DNA repair, the mismatch is resolved into a fixed C-to-thymine (T) transition mutation. AID preferentially targets C’s within WRC hotspots and WGCW overlapping hotspots (OHS) (W = adenosine (A) or T, R = A or G, Y = C or T), in contrast to SYC motifs (S = G or C), which are inert coldspots that do not undergo mutations. During SHM, single point mutations occur primarily within the complementarity-determining regions (CDRs) of the antibody variable heavy (VH) and light (VL) chain. In CSR, however, the same mutagenic mechanisms ultimately lead to DNA double strand breaks (DSBs) in the Ig switch regions. Not only does AID lead to genomic changes, but it has also been shown to have epigenetic activity. AID-dependent deamination of 5-methylcytosine (5mC) as well as 5-hydroxymethylcytosine (5hmC) creates a T:G or 5hmU:G mismatch, respectively. Involvement of the thymine DNA glycosylase (TDG) induces removal of the T or 5hmU. Base excision repair (BER) eventually fills the abasic site with a non-methylated C. Hence, AID is a multifunctional mutator protein that induces in total three distinct genomic and epigenomic effects.
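The motif definitions in the preceding paragraph (WRC hotspots, WGCW overlapping hotspots and SYC coldspots) can be made concrete with a short Python sketch that labels every C in a sequence. This is only an illustration under stated assumptions: it inspects a single strand (G's would have to be classified on the reverse complement), the function name and the label strings are hypothetical, and the example sequence is invented.

```python
def classify_cytosines(seq):
    """Label each C in a single-strand DNA sequence by AID motif class.

    Motifs follow the definitions in the text:
      WRC  -> hotspot            (W = A/T, R = A/G)
      WGCW -> overlapping hotspot (OHS)
      SYC  -> coldspot           (S = G/C, Y = C/T)
    The classified C is always the C written in the motif name.
    """
    seq = seq.upper()
    W, R, S, Y = set("AT"), set("AG"), set("GC"), set("CT")
    labels = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        up2 = seq[i - 2] if i >= 2 else ""
        up1 = seq[i - 1] if i >= 1 else ""
        down1 = seq[i + 1] if i + 1 < len(seq) else ""
        # WGCW is a special case of WRC, so it has to be tested first.
        if up2 in W and up1 == "G" and down1 in W:
            labels[i] = "OHS"
        elif up2 in W and up1 in R:
            labels[i] = "hotspot"
        elif up2 in S and up1 in Y:
            labels[i] = "coldspot"
        else:
            labels[i] = "other"
    return labels


if __name__ == "__main__":
    # Invented example sequence, 0-based positions.
    print(classify_cytosines("AGCTTACGCCA"))
```

The four labels correspond to the dot categories used in the figure legends (OHS, hotspot, coldspot, and C's that do not belong to a specific AID motif).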
The RNA-guided Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9) system originates from the prokaryotic adaptive immune system against exogenous double stranded DNA (dsDNA). The ability to introduce precise DSBs at specific single guide RNA (sgRNA) targeted loci revolutionised the field of targeted genome engineering. Despite being a promising tool, predicting and controlling the editing outcome remains challenging. In most cases Cas9-induced DSBs are resolved by error-prone non-homologous end joining (NHEJ), leading to insertions and deletions (Indels) at the break site. As this happens in a random manner, it creates the risk of unwanted genomic outcomes. By merging the prokaryotic and the vertebrate adaptive immune system, a more precise, efficient and safer class of genome engineering tools emerged. Cytosine Base Editors (CBEs) combine the catalytic activity of a vertebrate cytosine deaminase with the gene targeting activity of a nuclease-deficient prokaryotic sgRNA-dependent Cas9. This allows targeted single nucleotide C-to-T mutations with low frequencies of unwanted DSBs.
Until now, base editor development focused exclusively on the mutagenesis of single nucleotides. Even though there are base editors containing AID or orthologs of it (e.g. TAM, Target-AID or CRISPR-X), they are not able to exploit AID’s full functional potential. Moreover, programmable and active epigenomic editing has only been demonstrated to a limited level by merging the ten-eleven translocation methylcytosine dioxygenase (TET) with dCas9, which indirectly induces the effects of demethylation through 5mC oxidation but not active demethylation as proposed herein. The ability of CBEs to mediate epigenomic editing, however, has not been demonstrated yet. Here, the inventors describe a novel human AID-focused modular CBE that, besides its genomic editing function, allows programmable epigenomic editing through C deamination. Overall, the inventors’ Modular Epigenomic and Genomic AID base editor system (MEGA) takes advantage of AID’s multifunctional characteristic to induce targeted C-to-T mutations, DSBs, and 5mC demethylation. The respective effects are defined by distinct MEGA configurations. So far, this functional variability has not been achieved with a single system. Hence, the invention provides a new and better equipped programmable “genomic Swiss army knife”. Its use will provide the opportunity to fully translate AID activity ex vivo for genome editing and cytosine methylation editing at high resolution.
Based on the above-mentioned state of the art, the objective of the present invention is to provide means and methods for genomic and/or epigenomic editing. This objective is attained by the subject-matter of the independent claims of the present specification, with further advantageous embodiments described in the dependent claims, examples, figures and general description of this specification.
Summary of the Invention
A first aspect of the invention relates to a method for ex vivo antibody diversification, comprising the steps: a. providing a plurality of mammalian cells, wherein each cell expresses an antibody of interest on the cell surface; b. introducing an expression system encoding a protein-RNA complex into the cell, wherein the protein-RNA complex comprises i. activation-induced cytidine deaminase (AID); ii. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); iii. transcription activator VP64; and iv. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest, and wherein the protein components i-iii are covalently linked; c. selecting a cell expressing an antibody with increased or decreased affinity and/or altered effector functions, particularly via FACS; d. expanding the selected cell in cell culture.
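Step b requires a crRNA part that is complementary to a sub-sequence of the antibody-encoding DNA. As a rough illustration of how candidate target sites in such a sequence could be enumerated, the following Python sketch scans both strands for protospacers that are directly followed by a PAM. The 20-nucleotide spacer length and the NGG PAM are the values commonly used for S. pyogenes Cas9 and are assumptions here, as are the function names; the sketch is not the design procedure used by the inventors.

```python
def revcomp(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq, spacer_len=20):
    """List candidate protospacers followed by an NGG PAM on both strands.

    Returns tuples (strand, start, protospacer, pam). For the "-" strand,
    `start` is a 0-based index into the reverse complement of `seq`.
    spacer_len=20 and the NGG PAM are assumptions (typical for SpCas9).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # The last valid start leaves room for the spacer plus 3-nt PAM.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":
                hits.append((strand, i, s[i:i + spacer_len], pam))
    return hits
```

Candidates returned by such a scan would still need to be filtered, for example for overlap with the CDRs and for the presence of AID hotspot motifs within the protospacer.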
A further aspect of the invention relates to a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
A further aspect of the invention relates to an expression system encoding the protein-RNA complex (on several vectors) according to the aspect above.
A further aspect of the invention relates to a method for genomic and/or epigenomic editing, comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to the aspect above into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to-be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
Terms and definitions
For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control. The terms “comprising,” “having,” “containing,” and “including,” and other similar forms, and grammatical equivalents thereof, as used herein, are intended to be equivalent in meaning and to be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. For example, an article “comprising” components A, B, and C can consist of (i.e., contain only) components A, B, and C, or can contain not only components A, B, and C but also one or more other components. As such, it is intended and understood that “comprises” and similar forms thereof, and grammatical equivalents thereof, include disclosure of embodiments of “consisting essentially of” or “consisting of.”
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
As used herein, including in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic, and biochemical methods (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed. (2012) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular Biology (2002) 5th Ed, John Wiley & Sons, Inc.) and chemical methods.
The term VP64 in the context of the present specification relates to a synthetically designed tetrameric repeat of the minimal activation domain of herpes simplex virus VP16 (Beerli et al. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25): 14628-33).
The term PAM in the context of the present specification relates to protospacer adjacent motif which is a 2-6-base pair DNA sequence immediately following the DNA sequence targeted by a Cas9 nuclease. The canonical PAM is the sequence 5'-NGG-3', where "N" is any nucleobase followed by two guanine ("G") nucleobases.
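The PAM definition above can be illustrated with a minimal sketch that scans one DNA strand for the canonical 5'-NGG-3' motif and returns the sequence immediately preceding each PAM. The function names and the 20-nucleotide protospacer length are assumptions made for illustration and are not taken from this specification.

```python
import re


def find_ngg_pams(seq):
    """Return 0-based start indices of every 5'-NGG-3' motif on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping motifs (e.g. in "AGGG") are all reported.
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq)]


def protospacers(seq, length=20):
    """Return (protospacer, pam) pairs for each PAM preceded by `length` nucleotides.

    The default length of 20 is an illustrative assumption.
    """
    seq = seq.upper()
    pairs = []
    for i in find_ngg_pams(seq):
        if i >= length:
            # The targeted sequence lies immediately 5' of the PAM.
            pairs.append((seq[i - length:i], seq[i:i + 3]))
    return pairs
```

Only the strand passed in is scanned; a caller interested in both strands would also pass the reverse complement.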
The term nuclease dead Cas9 in the context of the present specification relates to a catalytically inactive variant of Cas9. The term nickase Cas9 in the context of the present specification relates to a partially inactive enzyme Cas9 which can only cleave the DNA strand that is complementary to the gRNA.
Sequences
Sequences similar or homologous (e.g., at least about 70% sequence identity) to the sequences disclosed herein are also part of the invention. In some embodiments, the sequence identity at the amino acid level can be about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. At the nucleic acid level, the sequence identity can be about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. Alternatively, substantial identity exists when the nucleic acid segments will hybridize under selective hybridization conditions (e.g., very high stringency hybridization conditions), to the complement of the strand. The nucleic acids may be present in whole cells, in a cell lysate, or in a partially purified or substantially pure form.
In the context of the present specification, the terms sequence identity and percentage of sequence identity refer to a single quantitative parameter representing the result of a sequence comparison determined by comparing two aligned sequences position by position. Methods for alignment of sequences for comparison are well-known in the art. Alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the global alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Nat. Acad. Sci. 85:2444 (1988) or by computerized implementations of these algorithms, including, but not limited to: CLUSTAL, GAP, BESTFIT, BLAST, FASTA and TFASTA. Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (http://blast.ncbi.nlm.nih.gov/).
One example for comparison of amino acid sequences is the BLASTP algorithm that uses the default settings: Expect threshold: 10; Word size: 3; Max matches in a query range: 0; Matrix: BLOSUM62; Gap Costs: Existence 11, Extension 1; Compositional adjustments: Conditional compositional score matrix adjustment. One such example for comparison of nucleic acid sequences is the BLASTN algorithm that uses the default settings: Expect threshold: 10; Word size: 28; Max matches in a query range: 0; Match/Mismatch Scores: 1, -2; Gap costs: Linear. Unless stated otherwise, sequence identity values provided herein refer to the value obtained using the BLAST suite of programs (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) using the above identified default parameters for protein and nucleic acid comparison, respectively.
Reference to identical sequences without specification of a percentage value implies 100% identical sequences (i.e. the same sequence).
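The position-by-position comparison of two aligned sequences described above can be sketched as follows. This is not the BLAST implementation referred to in this specification; it assumes the two sequences have already been aligned (for example by one of the listed programs), that gaps are written as "-", and that identity is expressed relative to the number of alignment columns. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumption: the inputs are pre-aligned strings of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT", "ACGA")` evaluates to 75.0, and two identical sequences give 100.0.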
General Biochemistry: Peptides, Amino Acid Sequences
The term polypeptide in the context of the present specification relates to a molecule consisting of 50 or more amino acids that form a linear chain wherein the amino acids are connected by peptide bonds. The amino acid sequence of a polypeptide may represent the amino acid sequence of a whole (as found physiologically) protein or fragments thereof. The terms "polypeptide" and "protein" are used interchangeably herein and include proteins and fragments thereof. Polypeptides are disclosed herein as amino acid residue sequences.
The term peptide in the context of the present specification relates to a molecule consisting of up to 50 amino acids, in particular 8 to 30 amino acids, more particularly 8 to 15 amino acids, that form a linear chain wherein the amino acids are connected by peptide bonds.
Amino acid residue sequences are given from amino to carboxyl terminus. Capital letters for sequence positions refer to L-amino acids in the one-letter code (Stryer, Biochemistry, 3rd ed. p. 21). Lower case letters for amino acid sequence positions refer to the corresponding D- or (2R)-amino acids. Sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
In the context of the present specification, the term amino acid linker refers to a polypeptide of variable length that is used to connect two polypeptides in order to generate a single chain polypeptide. Exemplary embodiments of linkers useful for practicing the invention specified herein are oligopeptide chains consisting of 1, 2, 3, 4, 5, 10, 20, 30, 40 or 50 amino acids. A non-limiting example of an amino acid linker is a monomer or di-, tri- or tetramer of a tetraglycine-serine peptide linker.
General Molecular Biology: Nucleic Acid Sequences, Expression
The term gene refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated. A polynucleotide sequence can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
The term transgene in the context of the present specification relates to a gene or genetic material that has been transferred from one organism to another. In the present context, the term may also refer to transfer of the natural or physiologically intact variant of a genetic sequence into tissue of a patient where it is missing. It may further refer to transfer of a natural encoded sequence the expression of which is driven by a promoter absent or silenced in the targeted tissue.
The term recombinant in the context of the present specification relates to a nucleic acid, which is the product of one or several steps of cloning, restriction and/or ligation and which is different from the naturally occurring nucleic acid. A recombinant virus particle comprises a recombinant nucleic acid.
The terms gene expression or expression, or alternatively the term gene product, may refer to either of, or both of, the processes - and products thereof - of generation of nucleic acids (RNA) or the generation of a peptide or polypeptide, also referred to as transcription and translation, respectively, or any of the intermediate processes that regulate the processing of genetic information to yield polypeptide products. The term gene expression may also be applied to the transcription and processing of an RNA gene product, for example a regulatory RNA or a structural (e.g. ribosomal) RNA. If an expressed polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. Expression may be assayed both on the level of transcription and translation, in other words mRNA and/or protein product.
The term nucleic acid expression vector in the context of the present specification relates to a plasmid, a viral genome or an RNA, which is used to transfect (in case of a plasmid or an RNA) or transduce (in case of a viral genome) a target cell with a certain gene of interest, or - in the case of an RNA construct being transfected - to translate the corresponding protein of interest from a transfected mRNA. For vectors operating on the level of transcription and subsequent translation, the gene of interest is under control of a promoter sequence and the promoter sequence is operational inside the target cell; thus, the gene of interest is transcribed either constitutively or in response to a stimulus or dependent on the cell’s status. In certain embodiments, the viral genome is packaged into a capsid to become a viral vector, which is able to transduce the target cell.
The term expression system in the context of the present specification relates to a nucleic acid sequence encoding the complex of the invention, wherein the nucleic acid sequence is comprised in a nucleic acid expression vector, or inside the genome of a cell.
In the context of the present specification, the term antibody refers to a molecule capable of specific binding to another molecule or target with high affinity / a Kd < 10⁻⁷ mol/L (particularly < 10⁻⁹ mol/L). In certain embodiments, an antibody refers to an immunoglobulin, an antibody, or an antibody sequence. The antibody may be of any species. The antibody / antibody sequence of interest can be of any isotype (e.g. human heavy chain = IgA (IgA1 & IgA2), IgD, IgG (IgG1, IgG2, IgG3, IgG4), IgE and IgM; human light chain = kappa and lambda; mouse heavy chain = IgA, IgD, IgG (IgG1, IgG2a-c, IgG3), IgE and IgM; mouse light chain = kappa and lambda). In certain embodiments, the term antibody further encompasses a humanized camelid antibody. In certain embodiments, the term antibody similarly encompasses an scFv fragment.
The antibody includes, but is not limited to immunoglobulin type G (IgG), type A (IgA), type D (IgD), type E (IgE) or type M (IgM), any antigen-binding fragment or single chains thereof and related or derived constructs. An antibody may be a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds. Each heavy chain is comprised of a heavy chain variable region (VH) and a heavy chain constant region (CH). The heavy chain constant region of IgG is comprised of three domains, CH1, CH2 and CH3. Each light chain is comprised of a light chain variable region (abbreviated herein as VL) and a light chain constant region (CL). The light chain constant region is comprised of one domain, CL. The variable regions of the heavy and light chains contain a binding domain that interacts with an antigen. The constant regions of the antibodies may mediate the binding of the immunoglobulin to host tissues or factors, including various cells of the immune system (e.g., effector cells) and the first component of the classical complement system. Similarly, the term encompasses a so-called nanobody or single domain antibody, an antibody fragment consisting of a single monomeric variable antibody domain.
The source of the antibody sequence can be either physiological (from an immunised person/animal or a hybridoma cell) or non-physiological (e.g. in vitro display libraries).
Detailed Description of the Invention
The first aspect of the invention relates to a method for ex vivo antibody diversification (affinity maturation and/or isotype switching), comprising the steps: a. A plurality of mammalian cells is provided. Each mammalian cell of the plurality expresses an antibody of interest on the cell surface. The antibody of interest is an antibody as defined in the terms and definitions. b. An expression system is introduced into the cell. This expression system encodes a protein-RNA complex as defined right below. The protein-RNA complex is capable of editing a genetic DNA sequence and/or an epigenetic modification of the DNA sequence. The DNA sequence encodes the antibody of interest, or is a sub-sequence of the encoding DNA sequence. c. After allowing the protein-RNA complex to edit the DNA sequence and/or its modifications, a cell is selected which expresses an antibody with a feature different from the original antibody of interest. This feature may be increased or decreased affinity and/or altered effector function. d. The selected cell is expanded in cell culture.
The protein-RNA complex comprises: a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest.
All aspects and embodiments below describing the protein-RNA complex are applicable to the complex of the method of the first aspect. In certain embodiments, the selection of step c is performed via FACS.
The sub-sequence of a DNA sequence encoding the antibody of interest relates to the sequence part which should be altered in order to diversify the antibody sequence.
The ex vivo antibody diversification (affinity maturation and/or isotype switching) pipeline uses a mammalian cell line that stably expresses the antibody of interest on its surface. A human-derived B cell line with endogenous antibody expression could also be used as a starting cell line. For de novo generation of monoclonal antibodies, B cells with a naive antibody sequence could be employed. Potential cells are human-derived RAMOS or Raji cells.
The beneficial features that a cell needs to have for ex vivo affinity maturation:
- the cell needs to be of mammalian origin (e.g. human-derived HEK293 or hamster-derived CHO) to ensure correct protein folding and post-translational modifications
- stably expressing the antibody on the cell surface
- be well transfectable
- have a functional DNA repair mechanism to install the mutations caused by the base editor.
The steps of the mutation pipeline are as follows:
- After introducing the base editor and gRNAs (more than one) into the antibody-displaying cells, they are incubated for 2 - 3 days. This process might be repeated 1 or 2 more times/rounds.
- Then the pool of mutated/non-mutated cells is stained with a tagged form of the target antigen and subjected to Fluorescence Activated Cell Sorting (FACS). Cells with the desired binding phenotype and/or effector functions can be sorted out of the mixed pool.
- Cells with either increased or decreased affinity and/or altered effector functions are then expanded in culture
- To identify the mutations caused by the treatment the genomic DNA is isolated and sequenced by a cutting-edge single molecule long read deep sequencing method (e.g. PacBio Sequencing).
- After analysing the deep sequencing reads potential antibody gene variants are identified and can be ordered/produced as soluble proteins for further in vitro/in vivo characterisation.
In certain embodiments, editing of the antibody of interest happens within the variable domain of the heavy and light chain. This includes the complementarity-determining regions (CDR1-3) and framework regions (FWR1-4). In certain embodiments, CDRs are targeted (as they mediate direct interaction with the antigen). In certain embodiments, CDR3 regions are targeted.
In certain embodiments, the antibody of interest includes a tag sequence. A tag sequence can be any sequence conferring an additional feature to the antibody of interest. A non-limiting example for a tag sequence is an affinity tag sequence used in protein purification. In certain embodiments, the antibody of interest is fused to other molecules/particles. In certain embodiments, the antibody sequence includes sites, residues, sequences, and/or motifs for post-translational modifications. In certain embodiments, these sites, residues, sequences, and/or motifs for post-translational modification are modified by the method of the invention.
In certain embodiments, the antibody sequence of interest is modified and/or optimised and/or codon optimised. As a non-limiting example, more AID hotspot (WRC(Y)) sequences are introduced to improve the diversification process.
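To judge how many AID hotspots a given antibody coding sequence already contains, the WRC(Y) motif can be located with a short scan. The sketch below uses the standard IUPAC nucleotide codes (W = A or T, R = A or G, Y = C or T) and reports the position of the cytosine in each WRC match on the given strand, together with a flag indicating whether a Y follows. The function name is hypothetical and the scan is only an illustration, not the inventors' optimisation procedure.

```python
import re


def find_wrc_hotspots(seq):
    """Return (c_index, followed_by_y) for each WRC motif on the given strand.

    c_index is the 0-based position of the cytosine within the motif;
    followed_by_y is True when the next base is C or T (i.e. a WRCY match).
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping motifs are all reported.
    for m in re.finditer(r"(?=[AT][AG]C)", seq):
        c_index = m.start() + 2
        next_base = seq[c_index + 1:c_index + 2]  # empty string at the sequence end
        hits.append((c_index, next_base in ("C", "T")))
    return hits
```

Hotspots on the opposite strand can be found by running the same function on the reverse complement of the sequence.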
The diversification of the antibody sequence is independent of other cell types (e.g. helper cells) or molecular (co-)factors such as cytokines, chemokines, etc.
The mammalian cell can be a B cell or a non-B cell. The cell used can be wild type, from a diseased sample, and/or genetically modified/engineered. In certain embodiments, the cell has an endogenous immunoglobulin locus. In certain embodiments, the cell is engineered to express/contain the antibody sequence of interest. An exogenous antibody sequence can be either stably integrated into the cell genome or be ectopically present. The engineered cell can have integrated either one or multiple gene copies of the antibody sequence into its genome. An engineered cell can have one or more different antibody sequences integrated. The heavy and light chains of the different antibodies can assemble in different combinations, meaning that the heavy chain of the first antibody can also pair with the light chain of the second antibody. Heavy and light chain sequences of the antibody of interest can integrate at the same locus or different loci in the genome.
The mammalian cell can relate to a pool of cells, wherein this pool of cells consists either of monoclonal cells (all expressing the same antibody/having the same antibody sequence integrated) or of polyclonal cells (each cell expresses a different antibody/has a different antibody sequence).
The following embodiments relate to a protein-RNA complex itself, and also to the protein-RNA complex employed in the method of antibody diversification.
One aspect of the invention relates to a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
One alternative aspect of the invention relates to a protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nickase Cas9 (nCas9); c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
One Base Editor molecule associates with one single-guide RNA molecule.
VP64 is a transcriptional activator which leads to transcription bubble formation. AID requires active transcription for both somatic hypermutation and class switch recombination in vivo. Providing a more physiological context could enhance AID’s ex vivo activity in a base editor architecture. An enhanced spectrum of AID-dependent events (single base substitutions, DSBs & demethylation) was observed when the base editor was combined with VP64.
The target-specific crRNA part may be joined to the tracrRNA scaffold part without a linker.
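Joining the crRNA part directly to the tracrRNA scaffold can be sketched as a simple concatenation. The scaffold below is SEQ ID NO 011 as listed in this specification, written as DNA with the extraction spaces removed. The function name and the example spacer are hypothetical placeholders and do not correspond to any guide used by the inventors.

```python
# SEQ ID NO 011 (tracrRNA part) as listed in this specification, DNA form.
TRACR_SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTG"
    "CTTTTTTT"
)


def build_sgrna_template(spacer):
    """Return the DNA template of an sgRNA: crRNA spacer followed directly by the scaffold.

    No linker is placed between the two parts.
    """
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty DNA sequence of A, C, G, T")
    return spacer + TRACR_SCAFFOLD


# Hypothetical 20-nt spacer, used only to show the call.
example = build_sgrna_template("GACGTTACCGGATTCAGCTA")
```

The sketch only assembles the sequence; selecting a spacer next to a suitable PAM is a separate step.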
In certain embodiments, the protein components a-c (AID and VP64 and nCas or dCas) are covalently linked. In certain embodiments, the protein components a-c (AID and VP64 and nCas or dCas) are expressed as a continuous peptide chain.
The system of the invention is designed with the protein components AID, nCas9 or dCas9, and VP64 covalently linked to ensure the mutagenic/epi-mutagenic activity is restricted/limited to a genomic site that is defined by the gRNA. Without the linkage, unwanted off-target effects are more likely.
For base editors in general it is of importance that the Cas moiety does not lead to double-strand breaks. In terms of altering the PAM-recognition of Cas9, the inventors only used Cas9 enzymes which recognize the standard 5’-NGG-3’ PAM motif. There are Cas9 variants reported that have a relaxed/altered PAM-recognition, meaning they also recognize motifs other than 5’-NGG-3’ (e.g. 5’-NG-3’).
In certain embodiments, the protein-RNA complex comprises dCas9.
In certain embodiments, the dCas9 comprises amino acid substitutions D10A, D839A, H840A and N863A (with the numbering referring to Cas9 of UniProt-ID Q99ZW2).
In certain embodiments, the nCas9 comprises amino acid substitution D10A.
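The substitutions named above can be cross-checked against the sequence listing of this specification by comparing the nCas9 and dCas9 sequences position by position. The sketch below does this for a short fragment copied from SEQ ID NO 007 (nCas9) and SEQ ID NO 005 (dCas9). The function name is hypothetical, and the start position 826 of the fragment is an assumption based on the Q99ZW2 numbering. Both listed sequences already carry A at position 10, so only the substitutions that distinguish dCas9 from nCas9 appear in the result.

```python
def list_substitutions(reference, variant, start=1):
    """List substitutions between two equal-length sequences in 'D839A' notation.

    `start` is the residue number assigned to the first character.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{ref}{start + i}{var}"
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Fragments copied from the sequence listing; numbering assumed to begin at 826.
NCAS9_FRAGMENT = "QELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDN"
DCAS9_FRAGMENT = "QELDINRLSDYDVAAIVPQSFLKDDSIDNKVLTRSDKARGKSDN"

print(list_substitutions(NCAS9_FRAGMENT, DCAS9_FRAGMENT, start=826))
# ['D839A', 'H840A', 'N863A']
```

The same function can be applied to full-length sequences with the default start of 1.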
In certain embodiments, AID is wild-type full-length human AID.
In certain embodiments, AID is a hyperactive human AID variant (AID*A) lacking a C-terminal Nuclear Export Signal (NES) and comprising amino acid substitutions K10E, T82I, and E156G (with the numbering referring to human wt AID of SEQ ID NO 001).
In certain embodiments, the dCas9 is an enzymatically-inactive variant of Cas9 from Streptococcus. In certain embodiments, the dCas9 is an enzymatically-inactive variant of Cas9 from Streptococcus pyogenes.
In certain embodiments, AID and dCas9 are covalently linked via a linker. In certain embodiments, AID and dCas9 are covalently linked via an XTEN-linker. In certain embodiments, AID and dCas9 are covalently linked via an XTEN-linker of SEQ ID NO 012. In certain embodiments, the covalently linked protein components are linked via their protein backbone, and thus, form a continuous polypeptide chain.
The XTEN-Linker consists of 16 amino acids and is a rather flexible linker. Other linkers, especially longer and/or more flexible linkers, could also be beneficial. However, with nCas9, no linker or short linkers can also be used, because nickase Cas9 can partially compensate for the XTEN-Linker between AID*A and Cas9; constructs without linker were still able to mutate the genome. Also, a nickase Cas9 could potentially increase the editing window size.
In certain embodiments, the protein-RNA complex additionally comprises uracil-DNA glycosylase inhibitor (UGI). In certain embodiments, UGI is non-covalently associated with the protein-RNA complex.
In certain embodiments, the UGI is from the Bacillus subtilis bacteriophage PBS1 or PBS2. In certain embodiments, the UGI is from PBS1.
In certain embodiments, molecules that can further enhance/modulate editing activity are included as independent co-factors. The inventors’ data showed that having UGI co-expressed (and not covalently linked to the base editor) can improve genomic base editing purity and specificity.
Another aspect that is defined by the inventors’ construct architecture is the genomic editing window size. It defines the length of DNA where a base editor can successfully mutate. Known cytosine base editors tend to have very narrow editing windows (approx. 5 nucleotides). The construct of the invention has an editing window size of up to 20 nucleotides.
For epigenomic editing, the construct of the invention has a more narrow/precise editing window than known TET-based modulators. In a window of between 16 and 56 nucleotides depending on cell type, 5mC as well as 5hmC demethylation is observed.
For genomic editing, the inventors observed an editing window of approx. 20 nucleotides. Editing happens between position 9 and 29 with the PAM sequence being at position 0. When the gRNA is complementary to the + strand the editing can also happen 5’ of it. When the gRNA is complementary to the - strand the editing can also happen 3’ of it.
In certain embodiments, wild-type full-length human AID comprises or essentially consists of the sequence SEQ ID NO 001. In certain embodiments, AID*A comprises or essentially consists of the sequence SEQ ID NO 003. In certain embodiments, dCas9 comprises or essentially consists of the sequence SEQ ID NO 005. In certain embodiments, nCas9 comprises or essentially consists of the sequence SEQ ID NO 007. In certain embodiments, VP64 comprises or essentially consists of the sequence SEQ ID NO 009. In certain embodiments, the tracrRNA part comprises or essentially consists of the sequence SEQ ID NO 011. In certain embodiments, UGI comprises or essentially consists of the sequence SEQ ID NO 014.
One aspect of the invention relates to an expression system encoding the protein-RNA complex (on several vectors) according to the first aspect.
In certain embodiments, the expression construct (plasmid vector) only encodes the base editor, a separate expression construct contains the gRNA sequence, and optionally a third expression construct can be used which encodes UGI.
One aspect of the invention relates to a method for genomic and/or epigenomic editing, comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to the second aspect into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to-be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
The crRNA defines the position of the protospacer region.
In certain embodiments, a single base substitution and/or a DNA double strand break is introduced in the sequence-to-be-edited, and AID is human AID*A.
In certain embodiments, a 5-methylcytosine and/or 5-hydroxymethylcytosine demethylation is introduced in the sequence-to-be-edited, and AID is full-length human AID.
AID-dependent 5mC/5hmC demethylation is a multistep procedure. It is initiated by the active deamination of 5mC/5hmC by AID resulting in a T:G or 5hmU:G mismatch, respectively. Through the involvement of additional downstream enzymes such as TDG (Thymine-DNA glycosylase) the T/5hmU nucleotides are removed from the genome. Eventually, the BER (Base excision repair) pathway fills the resulting abasic site with an unmodified C. Thus, AID-dependent deamination leads to epigenetic 5mC/5hmC demethylation.
Sequences
AID wt:
Amino Acid Sequence (SEQ ID NO 001):
MPKKKRKVDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGL
Nucleotide Sequence (5’-3’) (SEQ ID NO 002)
AID*A:
Amino Acid Sequence (SEQ ID NO 003):
MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRC
YRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT
FVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRT
Nucleotide Sequence (5’-3’) (SEQ ID NO 004)
dCas9:
Amino Acid Sequence (SEQ ID NO 005):
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR
KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKAD
LRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENL
IAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI
LLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKF
IKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPY
YVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT
KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKI
IKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS
GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKV
MGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVD
QELDINRLSDYDVAAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL
TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV
REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI
TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD
WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPK
YSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE
FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQ
SITGLYETRIDLSQLGGD
Nucleotide Sequence (5’-3’) (SEQ ID NO 006)
nCas9:
Amino Acid Sequence (SEQ ID NO 007):
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR
KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKAD
LRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENL
IAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI
LLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKF
IKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPY
YVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT
KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKI
IKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS
GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKV
MGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVD
QELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL
TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV
REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI
TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD
WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPK
YSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE
FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQ
SITGLYETRIDLSQLGGD
Nucleotide Sequence (5’-3’) (SEQ ID NO 008)
VP64:
Amino Acid Sequence (SEQ ID NO 009):
DALDDFDLDMLGSDALDDFDLDMLGSDALD
Nucleotide Sequence (5’-3’) (SEQ ID NO 010)
tracrRNA part:
Nucleotide Sequence (5’-3’) (SEQ ID NO 011):
GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT
XTEN linker:
Amino Acid Sequence (SEQ ID NO 012):
SGSETPGTSESATPES
Nucleotide Sequence (5’-3’) (SEQ ID NO 013)
UGI:
Amino Acid Sequence (SEQ ID NO 014):
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDS
NGENKIKML
Nucleotide Sequence (5’-3’) (SEQ ID NO 015)
Complete protein complex:
MEGA-2 Amino Acid Sequence (SEQ ID NO 016):
MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRC
YRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT
FVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTSGSETPGTSESATPESALEDRTLATMDKKY SIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC
YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIY LALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPIL EKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPL ARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYV TEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK PENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDI NRLSDYDVAAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAER GGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANG EIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKK YGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE
LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRV ILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGL YETRIDLSQLGGDSRADPKKKRKVEASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALD DFDLDMLINSR
MEGA-2 Nucleotide Sequence (5’-3’) (SEQ ID NO 017)
MEGA-4 Amino Acid Sequence (SEQ ID NO 018):
MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRC YRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT FVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTSGSETPGTSESATPESALEDRTLATMPKKK RKVGRGMDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVD STDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKS RRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAK NLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQ EEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKIL TFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFT VYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTY
HDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLING IRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI LKEHPVENTQLQNEKLYLYYLQNG RDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ RKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKD FQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMN FFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKL IARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDL
IIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEI
IEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLD ATLIHQSITGLYETRIDLSQLGGDSRADPKKKRKV
MEGA-4 Nucleotide Sequence (5’-3’) (SEQ ID NO 019)
MEGA3 Amino Acid Sequence (SEQ ID NO 020):
MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRC
YRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT FVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTSGSETPGTSESATPESALEDRTLATMDKKY SIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIY LALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP
GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPIL
EKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPL
ARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYV TEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK PENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDI
NRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAER GGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANG EIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKK YGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE
LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRV ILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGL YETRIDLSQLGGDSRADPKKKRKV
MEGA3 Nucleotide Sequence (5’-3’) (SEQ ID NO 021)
MEGA-1 Amino Acid Sequence (SEQ ID NO 022):
MPKKKRKVDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDW
DLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKD YFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGLSGSETPGTSESATPESALED RTLATMPKKKRKVGRGMDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPT I YHLRKKLVDSTDKADLRLI YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAK AILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD
QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGY AGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSI PHQIHLGELHAILRRQEDFYPFLK DNREKIEKI LTFRI PYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS FIERMTNFDKNLPNEKVLP KHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVE I SGVE DRFNASLGTYHDLLKI IKDKDFLDNEENEDI LEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW GRLSRKLINGIRDKQSGKTI LDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGS PAIK KGI LQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI LKEHPVENTQLQNE KLYLYYLQNGRDMYVDQELDINRLSDYDVAAIVPQS FLKDDS IDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWR QLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQI LDSRMNTKYDENDKLIREVKVITLK SKLVSDFRKDFQFYKVRE INNYHHAHDAYLNAVVGTALIKKYPKLESE FVYGDYKVYDVRKMIAKSEQE I GKATA KYFFYSNIMNFFKTE ITLANGE IRKRPLIETNGETGE IVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES I LPKRNSDKLIARKKDWDPKKYGGFDS PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSS FEKNPIDFLEA KGYKEVKKDLI IKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGS PEDNEQKQLFV EQHKHYLDE I IEQI SE FSKRVI LADANLDKVLSAYNKHRDKPIREQAENI I HLFTLTNLGAPAAFKYFDTTIDRK RYTSTKEVLDATLIHQS ITGLYETRIDLSQLGGDSRADPKKKRKVEASGSGRADALDDFDLDMLGSDALDDFDLD MLGSDALDDFDLDMLGSDALDDFDLDMLINSR
MEGA-1 Nucleotide Sequence (5’-3’) (SEQ ID NO 023)
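The dCas9 (SEQ ID NO 005) and nCas9 (SEQ ID NO 007) amino acid listings above can be compared row by row. The following Python sketch is illustrative only and is not part of the sequence listing. It compares the first 49 residues of one row of each listing; the function name `substitutions` is hypothetical, and the starting residue number 826 rests on the assumption that both listings wrap at 75 residues per line.

```python
# Excerpts copied from the nCas9 (SEQ ID NO 007) and dCas9 (SEQ ID NO 005) listings.
# Assumption: both listings wrap at 75 residues per line, so this row starts at residue 826.
ROW_START = 826
NCAS9_ROW = "QELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEE"
DCAS9_ROW = "QELDINRLSDYDVAAIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEE"


def substitutions(reference: str, variant: str, start: int) -> list[str]:
    """Return substitutions such as 'D839A' between two aligned rows of equal length."""
    if len(reference) != len(variant):
        raise ValueError("rows must be aligned and of equal length")
    return [
        f"{ref}{start + offset}{var}"
        for offset, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Residues at which the dCas9 row differs from the nCas9 row.
dcas9_only = substitutions(NCAS9_ROW, DCAS9_ROW, ROW_START)
print(dcas9_only)  # prints ['D839A', 'H840A', 'N863A']
```

Under the stated wrapping assumption, the differences found in this row correspond to three of the four dCas9 substitutions named in item 3 below.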
The invention further encompasses the following items:
Items:
1. A protein-RNA complex comprising a. activation-induced cytidine deaminase (AID); b. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9), particularly dCas9; c. transcription activator VP64; and d. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part.
2. The protein-RNA complex according to item 1, wherein the protein components a-c are covalently linked.
3. The protein-RNA complex according to any one of the preceding items, wherein said dCas9 comprises amino acid substitutions D10A, D839A, H840A and N863A.
4. The protein-RNA complex according to any one of the preceding items 1 to 3, wherein AID is wild-type full-length human AID.
5. The protein-RNA complex according to any one of the preceding items 1 to 3, wherein AID is a hyperactive human AID variant (AID*A) lacking a C-terminal Nuclear Export Signal (NES) and comprising amino acid substitutions K10E, T82I, and E156G.
6. The protein-RNA complex according to any one of the preceding items, wherein said dCas9 is derived from Streptococcus, particularly from Streptococcus pyogenes.
7. The protein-RNA complex according to any one of the preceding items, wherein AID and dCas9 are covalently linked via a linker, particularly via an XTEN-linker, more particularly an XTEN linker of SEQ ID NO 012.
8. The protein-RNA complex according to any one of the preceding items, wherein the protein-RNA complex additionally comprises uracil-DNA glycosylase inhibitor (UGI), particularly UGI non-covalently associated with the protein-RNA complex.
9. The protein-RNA complex according to item 8, wherein the UGI is derived from the Bacillus subtilis bacteriophage PBS1 or PBS2, particularly the UGI is derived from PBS1.
10. The protein-RNA complex according to any one of the preceding items, wherein a. wild-type full-length human AID comprises or essentially consists of the sequence SEQ ID NO 001; and/or b. AID*A comprises or essentially consists of the sequence SEQ ID NO 003; and/or c. dCas9 comprises or essentially consists of the sequence SEQ ID NO 005; and/or d. nCas9 comprises or essentially consists of the sequence SEQ ID NO 007; and/or e. VP64 comprises or essentially consists of the sequence SEQ ID NO 009; and/or f. the tracrRNA part comprises or essentially consists of the sequence SEQ ID NO 011; and/or g. UGI comprises or essentially consists of the sequence SEQ ID NO 014.
11. An expression system encoding the protein-RNA complex according to any one of the preceding items.
12. A method for genomic and/or epigenomic editing, comprising the steps: a. providing a target cell comprising a sequence-to-be-edited; b. introducing an expression system according to item 11 into the target cell, wherein the crRNA part is complementary to a sub-sequence of the sequence-to-be-edited; c. keeping the target cell under conditions that allow for expression of the expression construct.
13. The method according to item 12, wherein a single base substitution and/or a DNA double strand break is introduced in the sequence-to-be-edited, and wherein AID is human AID*A.
14. The method according to item 12, wherein a 5-methylcytosine demethylation and/or 5-hydroxymethylcytosine demethylation is introduced in the sequence-to-be-edited, and wherein AID is full-length human AID.
15. A method for ex vivo antibody diversification, comprising the steps: a. providing a mammalian cell comprising expression of an antibody of interest on the cell surface; b. introducing an expression system according to item 11 into the cell, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest; c. selecting a cell expressing an antibody with increased or decreased affinity and/or altered effector functions, particularly via FACS; d. expanding the selected cell in cell culture.
Wherever alternatives for single separable features such as, for example, a protein complex component or a method step are laid out herein as “embodiments”, it is to be understood that such alternatives may be combined freely to form discrete embodiments of the invention disclosed herein. Thus, any of the alternative embodiments for a protein complex component may be combined with any of the alternative embodiments of a method step mentioned herein.
The invention is further illustrated by the following examples and figures, from which further embodiments and advantages can be drawn. These examples are meant to illustrate the invention but not to limit its scope.
Description of the Figures
Fig. 1 shows proof-of-function of the new MEGA System. A) Schematic representation of MEGA-2. Hyperactive AID*A is fused with an XTEN-Linker to the N-terminus of dCas9. The transcriptional activator VP64 is attached to the C-terminus of dCas9. UGI is included as an independent co-factor. A locus-specific gRNA is complexed with the dCas9 moiety. B) (Upper panel) GFP-specific gRNAs were designed to either span or flank sequences where targeted C/G-to-T/A mutations create in-frame premature stop codons. (Lower panel) GFP disruption assay. Successful base editing should lead to loss of GFP fluorescence. C) Representative FACS histograms show the distribution of GFP-negative and GFP-positive cells depending on the indicated condition. Changes in the GFP-negative cell population of transfected HEK293T-GFP cells were normalised to the GFP-negative population of non-transfected HEK293T-GFP cells. D) Loss in GFP signal was normalised and shown as fold increase of the GFP-negative population over non-transfected control HEK293T-GFP cells. MEGA led to a significant increase in GFP-negative cells. It is shown without or in combination with one out of four different GFP-targeting gRNAs, respectively. Each data point represents an independent experiment (n=3) and the standard deviation is shown. For statistical analysis an unpaired t-test was done (* = p < 0.05; ** = p < 0.001). AID (Activation Induced Deaminase), dCas9 (nuclease dead Cas9), UGI (Uracil Glycosylase Inhibitor), NT (Non-Treated), FACS (Fluorescence Activated Cell Sorting).
Fig. 2 shows that deamination occurs preferentially at AID hotspots within the protospacer. Heatmaps show the base editing frequency on a single base level in comparison to the reference sequence. All loci were edited by MEGA-2 together with UGI. gRNAs targeting the antisense strand lead to G-to-A mutations whereas gRNAs complementary to the sense strand result in C-to-T mutations. The protospacer and PAM sequence are highlighted. Dots indicate specific sequence motifs within the quantification window. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs. C’s and G’s that lead to premature stop codons by deamination are marked with black stars. Three independent replicates of each target site were sent for deep sequencing. GFP loci G1 - G4 were sequenced by PacBio long-read sequencing. Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology. A (Adenosine), C (Cytosine), G (Guanosine), T (Thymine), OHS (Overlapping Hotspot). The SEQ ID NOs and the sequences shown in this figure are stated in the table below.
Figure imgf000022_0001
Fig. 3 shows that UGI enhances target mutation. A) C/G-to-T/A mutation frequency is shown for six loci, respectively. Base editing with and without UGI is compared. For each locus the respective wild type sequence is given. Protospacer region, PAM sequence and C’s/G’s are highlighted as in Fig 2. Nucleotide numbering corresponds to their position relative to PAM sequence at position 0. Standard deviation is shown for loci ATP1a1 and TP53BP1. B) Total Indel frequency with and without UGI is summarized for each locus. Standard deviation is given for loci ATP1a1 and TP53BP1. Three independent replicates of each target site were sent for deep sequencing. GFP loci G1 - G4 were sequenced by PacBio long-read sequencing. Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology.
Fig. 4 shows that MEGA configuration impacts mutagenic activity. A) Schematic illustration of all four MEGA configurations. Its modular property allows the key components AID, linker and transcription activator VP64 to be changed. Independent co-factor UGI is shown as well. MEGA-4 lacks both the transcription activator VP64 and the XTEN-Linker. MEGA3 connects AID*A through an XTEN-Linker to dCas9. Compared to MEGA-2, construct 4 uses human wild type (WT) AID. B) Sequencing histograms of the GFP amplicon visualise the mutagenic outcome of each MEGA construct. Insertions, deletions and substitutions are shown. Deletions and substitutions occur mainly at the gRNA position (black arrows with light grey background). C) Overall Indel and substitution frequency of all four MEGA constructs. Three independent repeats were done for each condition. GFP amplicons were sequenced by PacBio sequencing.
Fig. 5 shows that MEGA induces a broad mutation pattern in the murine variable heavy chain domain. A) Plasmids encoding MEGA-2, UGI and four different CDR-targeting gRNAs were electroporated into the murine B cell line CH12-F3. B) Sequencing histogram of the variable heavy chain domain. Insertions, deletions and substitutions are shown. (Upper Panel) MEGA-2 together with UGI and four gRNAs. (Lower Panel) WT S. pyogenes Cas9 with four gRNAs. CDR1 - 3 and gRNAs are highlighted. C) Representative heatmap from gRNA locus V1 shows the base editing frequency on a single base level. The protospacer region and PAM sequence are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs; black dots: Polymerase Eta motifs. Three independent repeats were done for each condition. VH amplicons were sequenced by Illumina technology. CDR (Complementarity Determining Region), VH (Variable Heavy Chain Domain). The SEQ ID NOs and the sequences shown in this figure are stated in the table below.
Figure imgf000023_0001
Fig. 6 shows that MEGA-1 has low mutagenic but high epigenetic activity. A) MEGA enables targeted demethylation of 5mC’s. Deaminated 5mC’s are recognized as T’s and will eventually be enzymatically replaced with a non-methylated C. B) Schematic representation of the amplified mouse MyoD enhancer region. Red represents a CpG dinucleotide within an AID hotspot. The black arrow represents the location of the gRNA MyoD. Pie charts underneath indicate the percentage of clones with methylated CpG (shown as dark slice). Nucleotide numbering is in relation to the PAM sequence at position 0. C) Graphs show the percentage of genomic C-to-T mutations by Illumina sequencing at the methylated CpG positions and sequencing following bisulfite treatment in 3T3 cells. X axis labels indicate position with the PAM sequence being at position 0. The experiment was performed once. D) Relative MyoD gene expression normalized to housekeeping gene Actin B.
Fig. 7 shows a comparison of PacBio vs. Illumina sequencing. Comparison between PacBio long-read sequencing and Illumina sequencing of the GFP locus G2. The protospacer and PAM sequence are highlighted. Nucleotide numbering corresponds to their position relative to PAM sequence at position 0. The SEQ ID NO. and the sequence shown in this figure are stated in the table below.
Figure imgf000024_0001
Fig. 8 shows the base editing window of MEGA-2. In relation to the PAM sequence at position 0, mutations happened predominantly in, but were not limited to, the second half of the protospacer region. The target window had an approximate size of 20 nucleotides. Mutation frequency peaked between position 14 and 17.
Fig. 9 shows no enhanced loss of GFP in presence of UGI. UGI did not enhance the increase in GFP-negative cells. Each data point represents an independent experiment (n=3) and the standard deviation is shown.
Fig. 10 shows that UGI improved base editing purity. C-to-A/G and G-to-C/T mutation frequency is shown for six loci, respectively. Base editing purity with and without UGI is compared. For each locus the respective wild type sequence is given. Except for GFP locus G4, UGI reduces non-target base substitutions. Both off-target C-to-A & C-to-G as well as G-to-C & G-to-T mutations occurred with lower frequency. Protospacer region, PAM sequence and C’s/G’s are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs. Nucleotide numbering corresponds to their position relative to PAM sequence at position 0. Standard deviation is shown for loci ATP1a1 and TP53BP1. Three independent replicates of each target site were sent for deep sequencing. GFP loci G1 - G4 were sequenced by PacBio long-read sequencing. Gene loci ATP1a1 and TP53BP1 were sequenced by Illumina technology.
Fig. 11 shows mutagenic activity of the MEGA System. A) The mutagenic activity of each MEGA version together with UGI was tested by a GFP disruption assay. GFP loci G3 and G4 are targeted simultaneously. The increase in the GFP-negative cell population was normalized to non-transfected GFP-negative HEK293T-GFP cells. Despite differing efficiency, all MEGA configurations caused a significant increase in GFP-negative cells. B) HEK293T-GFP cells were transfected with WT SpCas9 and gRNAs G3 and G4. The sequencing histogram shows a large deletion event spanning both gRNA locations. No single nucleotide substitutions were detectable. C) Comparison of targeted C-to-T mutation frequency at GFP locus G3 and G4. Protospacer region, PAM sequence and C’s/G’s are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs. Nucleotide numbering corresponds to their position relative to PAM sequence at position 0. MEGA configuration did impact the on-target C-to-T mutation frequency, especially at gRNA position G3. MEGA-2 had the highest editing efficacy. For gRNA position G4 the difference was less pronounced. MEGA3 showed editing activity comparable to that of MEGA-2. D) Non-target base editing frequency is shown for each MEGA version at gRNA position G3 and G4. MEGA configuration did influence base editing purity. The highest off-target mutations were seen for MEGA-2. Three independent replicates of each condition were sent for PacBio deep sequencing. Standard deviation is shown. For statistical analysis an unpaired t-test was done (* = p < 0.05, ** = p < 0.001, *** = p < 0.0001 & **** = p < 0.00001).
Fig. 12 shows targeted mutagenesis in variable heavy domain. Heatmaps represent targeted single base mutations around gRNA position V2, V3 and V4. Protospacer region, PAM sequence and C’s/G’s are highlighted. Dark grey dots: AID OHS; dots without colour filling: AID hotspots; grey dots: AID coldspots; light grey dots: C/G that do not belong to specific AID motifs; black dots: Polymerase Eta motifs. (Upper panel) Quantification window around gRNA position V2 only shows specific C-to-T editing at AID hotspot motifs. (Middle and lower panels) At gRNA positions V3 and V4 a highly diffuse mutation pattern was observed. It did not follow an AID-like mutation signature. Three independent replicates were sent for Illumina deep sequencing.
Figure imgf000025_0001
Fig. 13 shows the experimental outline of 5mC demethylation analysis. Human HEK293A or mouse 3T3 cells were transfected with MEGA-1 and MyoD-targeting gRNAs. Genomic DNA was extracted and used either for bisulfite treatment and sequencing or Illumina deep sequencing.
Fig. 14 shows MEGA-based ex vivo affinity maturation pipeline. Schematic representation of the screening pipeline for MEGA-dependent antibody engineering/ex vivo affinity maturation.
Fig. 15 shows the UFKA22-displaying cell line for MEGA mutagenesis. A) Comparison in IgG Fc and IgK expression between HEK293T-GFP and HEK293TG-mUFKA22 cells. B) Comparison in IL-2 binding between HEK293T-GFP and HEK293TG-mUFKA22 cells. C) Schematic representation of gRNA locations at the UFKA22 variable heavy and light chain domain, respectively. D) Representative comparison between mutated and non-mutated UFKA22-displaying cell lines.
Fig. 16 shows UFKA22 mutagenesis. A) Three different mutated UFKA22 cell lines were generated through MEGA-2-dependent ex vivo affinity maturation. Cell line S11 ,3#-P underwent three mutation rounds with MEGA-2 and a mix of ten gRNAs targeting both variable heavy and light chain. Cell Line S6.1#- underwent one mutation round with MEGA-2 and four gRNAs only targeting the variable light chain domain. Cell line S12.1#- also underwent only one mutation round with MEGA 1 but with four gRNAs targeting the variable heavy chain domain. B) Mutation spectrum of the respective cell lines analysed by single molecule long read sequencing. C) Amino acid sequence of the most frequent variants for each cell line respectively. SEQ ID NO. and sequences shown in this figure are stated in the table below.
Figure imgf000026_0001
Figure imgf000027_0001
Fig. 17 shows that UFKA22 variants have reduced but not abolished affinity for IL-2. A) Visualisation, on the crystal structure of the UFKA22 variable heavy and light chain domains in complex with IL-2, of the mutations selected for further testing. B) In vitro IL-2 binding assay shows different levels of reduced IL-2 affinity.
Table 1 shows gRNA List.
Table 2 shows PacBio sequencing primers.
Table 3 shows Illumina sequencing primers.
Table 4 shows qPCR primers.
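The gRNA design principle described for Fig. 1B, namely targeting positions where a C/G-to-T/A mutation creates an in-frame premature stop codon, can be illustrated with a short Python sketch. The sketch is a simplified illustration and not the inventors' design procedure: the function name `stop_creating_edits` and the example reading frame are hypothetical, and PAM availability and the editing window are not considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# C-to-T on the coding strand, or G-to-A on the coding strand
# (the latter corresponds to C deamination on the opposite strand).
EDITS = {"C": "T", "G": "A"}


def stop_creating_edits(orf: str) -> list[tuple[int, str, str, str]]:
    """List single C>T or G>A edits that convert an in-frame codon into a stop codon.

    Each hit is (0-based position in the ORF, edit, original codon, edited codon).
    """
    orf = orf.upper()
    hits = []
    for start in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon, nothing to create
        for offset, base in enumerate(codon):
            if base not in EDITS:
                continue
            edited = codon[:offset] + EDITS[base] + codon[offset + 1:]
            if edited in STOP_CODONS:
                hits.append((start + offset, f"{base}>{EDITS[base]}", codon, edited))
    return hits


example_orf = "ATGCAGTGGCGAAAA"  # hypothetical reading frame, not taken from GFP
for position, edit, codon, edited in stop_creating_edits(example_orf):
    print(position, edit, codon, "->", edited)
```

In this hypothetical frame the codons CAG and CGA become stop codons through C-to-T edits, and TGG becomes a stop codon through either of its two G-to-A edits.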
Examples
Example 1: Engineering and proof-of-function of the MEGA base editing system
To establish a modular base editing system, the inventors selectively exploited the benefits of prokaryotic and eukaryotic adaptive immunity by merging the advantages of the Cas9 and AID gene editors, respectively. MEGA-2 includes hyperactive human AID*A as deaminase moiety (Fig 1A). Hyperactive AID*A was reported to have higher deamination activity than human wild type (WT) AID. The inventors fused AID*A with a flexible XTEN-Linker to the N-terminus of the S. pyogenes-derived nuclease dead Cas9 (dCas9). VP64 is a potent minimal transcription activator leading to transcription bubble formation and subsequently ssDNA exposure. By adding VP64 to the C-terminus of the construct, the inventors aimed to increase substrate accessibility. Inhibiting the endogenous uracil DNA glycosylase with the bacteriophage-derived uracil glycosylase inhibitor (UGI) improves C-to-T base editing. Hence, many CBEs have included UGI within their constructs. The inventors complemented the MEGA System with UGI as an independent co-factor, which has also been shown to enhance mutagenesis. To demonstrate the activity of the system, a GFP disruption assay was performed using HEK293T-GFP cells, which constitutively express GFP. Either through CRISPRdirect or manual curation, four GFP-specific gRNAs were designed to target different loci where C/G-to-T/A mutations create a premature stop codon (Fig 1B). After transfection of HEK293T-GFP cells, fluorescence loss was measured by flow cytometry. With each gRNA, the GFP-negative cell population increased. Without gRNA such a shift was not observed (Fig 1C). Depending on the targeted locus, the GFP-low cell population increased significantly, by 2.32 ± 0.28-fold to 4.24 ± 1.43-fold compared to non-transfected control (Fig 1D).
Example 2: Deamination occurs preferentially at AID hotspots within the protospacer
Next, the inventors performed deep sequencing to identify the mutation signature of MEGA-2 at six different genomic sites. In addition to the four GFP loci the inventors included two endogenous genes. The Na+/K+ ATPase ATP1a1 gene was targeted with a previously published gRNA herein referred to as gRNA A. For TP53BP1 the inventors designed gRNA B. Targeted amplicon sequencing was done for each locus, respectively. All four GFP loci underwent single molecule long-read PacBio sequencing. ATP1a1 and TP53BP1 were sequenced by Illumina technology. Both approaches gave comparable results (Fig 7). MEGA-2 exclusively mutated C’s on the non-targeted DNA strand (Fig 2). In general, editing efficacy varied between the different loci. Most deamination events happened within the protospacer region where the base editor unwinds the DNA. For GFP loci G1 and G2 as well as for TP53BP1, however, single nucleotide mutations also occurred beyond the protospacer (Fig 2). The inventors observed that MEGA-2 retained physiological AID sequence preferences. Hotspot and OHS motifs were preferentially deaminated across all tested sites. For TP53BP1 the OHS outside the protospacer was targeted as well. Interestingly, MEGA-2 was able to mutate coldspot motifs at GFP loci G1 and G2. When hotspots/OHS were present, e.g. at GFP locus G4, mutations at coldspots occurred less efficiently. The inventors also detected deamination of C’s/G’s that did not belong to specific motifs. Most often this was seen when they were in close proximity to AID hotspots. In addition to specific sequence motifs, base editing efficacy also depended on the target nucleotide position. MEGA-2 showed a broad editing window of approx. 20 nucleotides. Deamination events happened most efficiently between nucleotide positions 12 and 19 with the PAM sequence being at position 0 (Fig 8). The allele frequency showed that mostly single nucleotide substitution events happened. Simultaneous mutations of more than one nucleotide happened less frequently. For the ATP1a1 locus with its three OHS, however, multiple editing events occurred with the highest frequency.
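The motif classes referred to in this example can be illustrated with a minimal Python sketch. The motif definitions used here are not stated in the present text and are assumptions based on the commonly used conventions: hotspot WRC, coldspot SYC and overlapping hotspot (OHS) WGCW, with W = A/T, R = A/G, S = C/G and Y = C/T, the deaminated C being the C of the motif. The function name `classify_cytosines` and the example sequence are hypothetical.

```python
# IUPAC classes used in the assumed motif definitions.
W, R, S, Y = set("AT"), set("AG"), set("CG"), set("CT")


def classify_cytosines(seq: str) -> dict[int, str]:
    """Assign each C (0-based index) to 'OHS', 'hotspot', 'coldspot' or 'other'.

    Assumed motifs: OHS = WGCW, hotspot = WRC, coldspot = SYC.
    """
    seq = seq.upper()
    classes = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        if i < 2:
            classes[i] = "other"  # not enough upstream context to decide
            continue
        minus2, minus1 = seq[i - 2], seq[i - 1]
        plus1 = seq[i + 1] if i + 1 < len(seq) else ""
        if minus2 in W and minus1 == "G" and plus1 in W:
            classes[i] = "OHS"
        elif minus2 in W and minus1 in R:
            classes[i] = "hotspot"
        elif minus2 in S and minus1 in Y:
            classes[i] = "coldspot"
        else:
            classes[i] = "other"
    return classes


example = "TAGCTGCCATCG"  # hypothetical sequence, not one of the tested loci
print(classify_cytosines(example))
```

For the hypothetical sequence, the C at index 3 lies in AGCT and is classified as OHS, index 6 (TGC) as hotspot, index 7 (GCC) as coldspot and index 10 (ATC) as other. G’s can be classified in the same way by applying the function to the reverse complement of the sequence.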
Example 3: UGI enhances target mutations
In the phenotypic GFP disruption assays the inventors did not notice a significant additive effect of UGI (Fig 9). Yet, when the inventors compared on-target mutation frequency, UGI strongly improved overall base editing efficacy at four out of six loci (Fig 3A). For example, at GFP locus G4 position C14 (PAM sequence being at position 0) C-to-T editing increased from 1.58 % to 5.42 %. At GFP locus G3 and gene locus TP53BP1, however, UGI improved editing only slightly at three out of eight and two out of three C’s, respectively (Fig 3A). Overall, with UGI the maximum editing frequency ranged between 5.42 % and 22.69 ± 7.91 %. Besides improving on-target mutations, UGI also reduced unwanted substitution events (Fig 10). For GFP loci G1 to G3, gene locus ATP1a1 and gene locus TP53BP1, UGI led to a reduction of non-targeted base substitutions. In particular, at GFP locus G3 position C16 off-target C-to-A and C-to-G mutations dropped from 1.68 % and 4.01 % to 0.41 % and 0.97 %, respectively. Interestingly, the opposite was seen for GFP locus G4, where non-target base substitutions were already below 1 % without UGI (Fig 10). Here, UGI led to an increase rather than a decrease in events, suggesting that UGI could not inhibit UNG activity efficiently enough. While the inventors did observe a high level of Indels across all loci (Fig 3B), UGI reduced overall Indel frequency at four out of six loci (GFP G2, G3, ATP1a1 and TP53BP1). Even though UGI improved the non-targeted editing efficacy at locus G1, it did not lower the Indel rate. At locus G4 UGI even doubled the Indel frequency. This indicated that UGI activity is also influenced by the sequence context. Overall, the total Indel frequency ranged between 6.84 % and 22.7 %.
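The size of the UGI effect at the positions quoted above can be expressed as a fold change. The following Python sketch is a worked calculation using only the percentages stated in this example; the dictionary keys and the helper name `fold_change` are illustrative.

```python
# Mutation frequencies in percent, (without UGI, with UGI), as quoted in Example 3.
reported = {
    "G4 C14 C-to-T": (1.58, 5.42),
    "G3 C16 C-to-A": (1.68, 0.41),
    "G3 C16 C-to-G": (4.01, 0.97),
}


def fold_change(without_ugi: float, with_ugi: float) -> float:
    """Frequency with UGI divided by frequency without UGI."""
    return with_ugi / without_ugi


fold_changes = {name: round(fold_change(*pair), 2) for name, pair in reported.items()}
for name, value in fold_changes.items():
    print(f"{name}: {value}-fold")
```

On these figures, on-target C-to-T editing at G4 C14 rises about 3.4-fold with UGI, whereas the two unwanted substitution types at G3 C16 fall to roughly a quarter of their frequency without UGI.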
Example 4: MEGA configuration impacts mutagenesis
MEGA-2 efficiently mutated single bases (Fig 3A). However, high Indel frequencies pointed towards the induction of DNA DSBs (Fig 3B). Three additional configurations were generated to better understand the mutagenic mode-of-action of the Cas9-AID fusion. The modular setup allowed the inventors to change the deaminase moiety and to remove or retain the linker as well as the transcription activator (Fig 4A). Again, the inventors tested base editing activity by GFP disruption assay. HEK293T-GFP cells were transfected with both gRNAs G3 and G4, UGI and one of four MEGA variants. Indeed, the MEGA architecture affected editing activity. All constructs led to a significant increase in the GFP-negative cell population (Fig 11A). The highest fold-change, 4.3 ± 1.36, was seen for MEGA-2. Without the VP64 transcription activator, MEGA3 achieved a 3.16 ± 0.45-fold change, while WT AID MEGA-1 only showed a modest change of 1.11 ± 0.07-fold compared to control. Most current CBEs use S. pyogenes-derived nickase Cas9 (nCas9) instead of dCas9. The D10A point mutation within the RuvC domain allows nCas9 to cut only the DNA strand that is complementary to the gRNA. Hence, the non-edited strand will be targeted for DNA repair, whereby the deaminated C (recognized as a U) will serve as template. Eventually this enhances the frequency of successful C/G-to-T/A mutations. To assess the effect of nCas9 in the system, AID*A was directly fused to the N-terminus of nCas9 to create MEGA-4. Its activity was comparable to MEGA3 and led to a 2.43 ± 0.79-fold increase in the GFP-negative cell population. The frequency of insertions, deletions and substitutions was again determined by PacBio deep sequencing. With WT S. pyogenes Cas9 the inventors detected broad deletions and few insertions between both gRNAs (Fig 11B). The sequence histograms showed that substitutions only happened around gRNA G3 and G4 (Fig 4B). The inventors further noticed the presence of long deletions spanning the region between both gRNAs. These occurred predominantly with MEGA-2 and to a lesser degree with MEGA-4 and MEGA3. The deletion frequency peaked at the gRNA location G4.
MEGA-2 showed the highest overall substitution frequency of 31.46 %. With 21.88 % MEGA3 was less efficient. Without the XTEN-Linker MEGA-4 had a total substitution frequency of 15.92 %. The least efficient construct with an overall substitution frequency of 6.14 % was MEGA-1 (Fig 4C). As visualised in the sequence histograms (Fig 4B), the construct architecture also influenced overall Indel formation. MEGA-2 not only induced the most efficient base editing but also the highest Indel frequency of 46.34 %. By comparison, MEGA-4 and MEGA3 had only minor Indel rates of 1.59 % and 3.21 %, respectively. No deletions were detectable with MEGA-1. When comparing targeted C-to-T editing frequency at GFP locus G3 and G4, all four versions showed the same nucleotide preference (Fig 11C). The most efficient C-to-T editing was done by MEGA-2, followed by 3, 2 and 4. Interestingly, the base editing frequency at locus G4 improved when gRNA G3 and G4 were combined compared to gRNA G4 alone (Fig 3A). Construct configuration influenced base editing purity as well (Fig 11D). No unwanted editing outcomes were detected with MEGA-1. Both MEGA-4 and MEGA3 caused detectable but low non-target mutations. The highest rate of off-target bases was seen with MEGA-2.
Example 5: MEGA induces broad mutation pattern in murine Variable Heavy Chain Domain
The murine IgM-positive B cell lymphoma cell line CH12-F3 is unable to undergo SHM. So far it has not been reported that its immunoglobulin variable domain can be mutated by exogenous means. The inventors’ AID-based MEGA System retained a similar mode-of-action as physiological AID. The inventors were interested to see whether or not it can mutate the variable heavy chain domain in a SHM-like fashion. CH12-F3 cells were electroporated with MEGA-2 together with UGI and four CDR-targeting gRNAs (Fig 5A). Subsequently, the VH domain was sequenced by Illumina technology. At the four gRNA locations the inventors succeeded in creating single base mutations that were distinguishable from background noise. Deletions were also detectable at the respective sites. Overall only MEGA-2 was able to induce single base mutations while WT Cas9 only caused broad deletions (Fig 5B). Base editing predominantly happened at AID hotspots (Fig 5C and Fig 12). For locus V1 the inventors observed A-to-G mutations at 5’ WA 3’ motifs indicating the involvement of error-prone polymerase eta. For gRNA loci V3 and V4 the mutation pattern was scattered and did not follow defined sequence motifs (Fig 12).
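The motif classes mentioned above can be made concrete with a short scan. The following is a minimal sketch, assuming the commonly used definition of an AID hotspot as WRC (W = A/T, R = A/G) and of the polymerase eta motif as WA, both read 5’ to 3’ on the given strand only. The function names and the example sequence are hypothetical and are not taken from the application.

```python
import re

# IUPAC codes assumed here: W = A or T, R = A or G.
# Lookahead patterns are used so that overlapping motifs are all reported.
HOTSPOT = re.compile(r"(?=[AT][AG]C)")   # WRC, the targeted C is the third base
WA_MOTIF = re.compile(r"(?=[AT]A)")      # WA, the targeted A is the second base


def hotspot_cytosines(seq):
    """Return 0-based indices of C's that sit in a WRC context (given strand only)."""
    return [m.start() + 2 for m in HOTSPOT.finditer(seq.upper())]


def wa_adenines(seq):
    """Return 0-based indices of A's that sit in a WA context (given strand only)."""
    return [m.start() + 1 for m in WA_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    example = "TTAGCTACGGAACC"  # hypothetical sequence for illustration
    print("WRC cytosines:", hotspot_cytosines(example))
    print("WA adenines:", wa_adenines(example))
```

Scanning the reverse complement in the same way would cover motifs on the opposite strand; that step is left out to keep the sketch short.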
Example 6: MEGA-1 has low mutagenic but high epigenomic editing through active Cytosine demethylation
Besides genomic editing, AID-dependent deamination leads to 5mC demethylation. Targeted 5mC deamination results in a T-G mismatch, which is resolved by replacing the T with an unmethylated C (Fig 6A). Despite its low genomic editing activity, the inventors wanted to know whether MEGA-1 instead had epigenomic editing potential. MyoD is a well-defined master transcriptional regulator for muscle cell development. Gene expression depends on the methylation status of the MyoD DMR5 Enhancer region. To evaluate whether the inventors’ MEGA system can induce MyoD expression by deaminating and subsequently demethylating 5mC’s, the inventors targeted a methylated AID hotspot within the MyoD DMR5 Enhancer region of murine 3T3 cells that has been shown to be critical for gene expression (Fig 6A). The inventors used a previously published MyoD-specific gRNA (gRNA MyoD) located in close proximity to the methylated AID hotspot (Fig 6B). Methylation status was analysed 48 h post-transfection by sequencing bisulfite-treated DNA, where unmethylated C’s are read as T’s and 5mC’s as C’s (Fig 13). MEGA-1 and MEGA-2 led to 5mC demethylation in an editing window of 16 and 87 nucleotides, respectively (Fig 6B & Fig 6C). Of the two base editors, MEGA-1 showed the stronger activity. At the methylated AID hotspot at position C26 and the adjacent 5mC at position C21, MEGA-1 demethylated 80 % of all sequencing reads (Fig 6C). In parallel, the locus was fully sequenced using Illumina to exclude that the bisulfite-sequencing results were due to genomic 5mC-to-T mutations (Fig 13). Neither MEGA-1 nor MEGA-2 was able to mutate a 5mC (Fig 6C). Finally, the inventors detected a 2.3 ± 0.83-fold increase in MyoD expression compared to control when transfecting with MEGA-1 and gRNA MyoD (Fig 6D). When using another published gRNA (gRNA Ctrl) that binds approx. 170 nucleotides upstream of gRNA MyoD, the inventors did not detect a significant increase in expression (Fig 6D).
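The bisulfite read-out described above (unmethylated C read as T, 5mC read as C) can be illustrated with a small calling routine. This is only a sketch under simplifying assumptions: reads are taken to be already aligned to the reference without gaps, only the given strand is considered, and the function names and toy sequences are hypothetical rather than part of the inventors’ analysis.

```python
def methylation_calls(reference, read):
    """Map each reference C position to True (read as C, methylated)
    or False (read as T, unmethylated). Other read bases stay uncalled."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[pos] = True    # protected from conversion, called as 5mC
        elif read_base == "T":
            calls[pos] = False   # converted, called as unmethylated C
    return calls


def methylated_fraction(reference, reads):
    """Fraction of reads calling each reference C as methylated."""
    totals, methylated = {}, {}
    for read in reads:
        for pos, is_methylated in methylation_calls(reference, read).items():
            totals[pos] = totals.get(pos, 0) + 1
            methylated[pos] = methylated.get(pos, 0) + int(is_methylated)
    return {pos: methylated[pos] / totals[pos] for pos in totals}


if __name__ == "__main__":
    # Toy reference and reads, for illustration only.
    print(methylated_fraction("ACGTCA", ["ACGTTA", "ATGTTA"]))
```

A genomic C-to-T mutation would look identical to an unmethylated C in such a read-out, which is why the untreated locus was sequenced in parallel.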
Example 7: MEGA-based ex vivo affinity maturation pipeline
To perform ex vivo antibody diversification (affinity maturation), a cell line which displays the mAb of interest on its cell surface is transfected with plasmids encoding MEGA-2 and gRNAs targeting the variable heavy and light chain domains, respectively (Fig 14). After one to three rounds of transfection a mixed pool of cells with different binding characteristics emerges. Cells with the desired binding affinity (higher or lower) can then be isolated from the mixed pool through fluorescence activated cell sorting. The mutation spectrum is then determined by deep sequencing with a single molecule long read sequencing method.
Example 8: UFKA22-displaying cell line for MEGA mutagenesis
The mutagenesis pipeline requires the antibody of interest to be stably expressed on the surface of a mammalian cell. Hence, HEK293T-GFP cells, which constitutively express GFP, were transfected with a plasmid encoding the heavy and light chain of UFKA22, a humanized anti-human IL-2 specific mAb. For surface display a trans-membrane domain was added. 86.1 % of the HEK293TG-mUFKA22 cell line expressed the full antibody on its surface (Fig 15A). Heavy and light chain were displayed in an equal ratio. 82.3 % of the cells were able to bind recombinant human IL-2 (Fig 15B). In addition to the cell line, the mutagenesis pipeline depends on gRNAs that localise MEGA-2 to the variable heavy and light chain domain. In total eleven gRNAs were designed to target predominantly the CDR regions of UFKA22 (Fig 15C). One round of mutagenesis with MEGA-2 and all eleven gRNAs showed a change in IL-2 binding compared to non-mutated cells (Fig 15D). A small population of cells showed a reduced affinity for IL-2.
Example 9: UFKA22 mutagenesis
In total three different affinity-reduced cell lines were generated using the inventors’ ex vivo affinity maturation pipeline with MEGA-2 (Fig 16A). Cell line S11.3#-P underwent three mutation rounds, while cell lines S12.1#- and S6.1#- were only mutated once. The cell lines differed in their IL-2 binding level but had in common that their affinity was lower than that of non-mutated cells. Single molecule long read sequencing showed that different genotypes had similar phenotypes. Mutations in the variable light chain (S11.3#-P and S6.1#-) as well as in the variable heavy chain (S12.1#-) had an effect on IL-2 affinity. Overall, MEGA-2 mutagenesis led to deletions, insertions and substitution events on the genomic level (Fig 16B). Even though a mix of four to ten gRNAs was always used per transfection, mutations were not seen at every gRNA locus. For cell line S11.3#-P deletion and substitution events were only detected around gRNA GL2, while in cell line S6.1#- mutations only occurred next to gRNA GL5. In cell line S12.1#- mutagenic events happened at each gRNA position. The most frequent allele variants and their respective amino acid sequences were retrieved by analysing the deep sequencing data. Each cell line showed different sequences with characteristic mutations (Fig 16C).
Example 10: UFKA22 variants show reduced but not abolished affinity to IL-2
Based on the crystal structure of the UFKA22 variable heavy and light chain in complex with IL-2, three variants were chosen for further testing in vitro (Fig 17A). In addition to the UFKA22_C94Y/A100T double mutant, the respective single mutants (C94Y only and A100T only) were included as well. An in vitro IL-2 binding assay showed that all UFKA22 variants kept their IL-2 specificity (Fig 17B). All variants showed a reduced IL-2 affinity, reflected in higher IC50 values than non-mutated UFKA22. Among the variants the C94Y/A100T double mutant remained the best binder, even better than the single mutants alone.
Example 11: Discussion
In this work, the inventors succeeded in re-creating the full functional spectrum of human AID ex vivo. The inventors’ novel MEGA system allows targeted SHM-like single base mutations, CSR-like DSB induction and 5mC demethylation. Construct architecture and configuration strongly influenced genetic and epigenetic mutagenesis.
MEGA architecture impacts AID-dependent mutagenesis
Previous studies with non-B cell lines have shown that ectopic expression of full-length WT AID is able to induce SHM-like events. Such approaches, however, suffer from poor mutation rates. To ensure high and efficient base editing the inventors focused on the engineered hyperactive human AID*A variant. Compared to full-length protein AID*A lacks the C-terminal nuclear export signal (NES) and harbours three amino acid substitutions. Both modifications alone have been correlated with enhanced mutation frequency. Without NES AID*A stays longer in the nucleus which prolongs its mutagenic activity. Neither hyperactive AID*A nor comparable variants have been used in Cas9 fusion proteins.
When comparing MEGA-2 and MEGA-3 the inventors noticed that enhanced substrate accessibility improved base editing drastically. MEGA-3 had a more conventional CBE architecture with a deaminase, a linker and dCas9. For ssDNA exposure these CBEs rely on Cas9-dependent RNA:DNA hybridisation, known as R-loops. In B cells, however, R-loops are not essential for SHM. Instead, AID requires active transcription, where stalling or pausing of the RNA polymerase II (RNAPII) results in premature transcription termination and ssDNA exposure. Therefore, the inventors hypothesised that AID’s ex vivo activity would be enhanced by a physiological substrate environment created through transcription bubble formation. Fusion of the synthetic transcriptional activator VP64 to dCas9 has been shown to induce gene transcription. Indeed, VP64 activity allowed MEGA-2 to reach higher mutation frequencies than MEGA-3 (Fig 3, 4C & 12C). With 20 nucleotides the editing window also reflected the approximate size of a transcription bubble (Fig 8). Further studies are needed to determine whether the same transcription machinery with RNAPII and its co-factor Spt5 is involved in MEGA-2-dependent editing. Even in the most advanced configuration the activity of full-length WT AID (MEGA-1) remained low (Fig 4C). This shows that efficient base editing depends on both the deaminase and the sequence context. For future genome editor design both should be considered equally important.
Steric hindrance between deaminase and Cas has been reported to decrease or even abolish the overall editing efficacy. Interestingly, for MEGA-4, where AID*A was directly fused to nCas9, the inventors saw base editing activity comparable to that of MEGA-3. It seemed that the missing linker was partially compensated by nCas9 (Fig 4C & 12C). This showed that in the inventors’ system AID*A was not affected by steric hindrance, hence allowing a more compact base editor architecture.
The interplay of AID*A and VP64 mimics CSR induction
Besides expected single base substitutions, the inventors also observed high Indel frequencies with MEGA-2 (Fig 3B). Standard CBEs catalyse no or only few DSBs (E. M. Porto et al., Nature Reviews Drug Discovery 19, 839-859 (2020)). Most likely, the phenotypic GFP loss caused by MEGA-2 was not exclusively a result of targeted C/G-to-T/A mutations. The occurrence of Indels and potential sequence frame-shifts might have contributed strongly to the phenotype. This might explain why UGI did not have a significant effect in the GFP disruption assay (Fig 9). UGI improved on-target single base mutations, but simultaneously reduced DSB formation (Fig 3).
During CSR AID mutagenesis leads to DNA DSBs. The Ig switch regions contain many AID OHS. This alone, however, does not explain the difference between SHM and CSR. While SHM happens independently of secondary DNA structures, the opposite is seen for CSR. The high G-cluster density in the switch regions facilitates R-loop as well as G4 structure formation. It is proposed that these structures slow down RNAPII processivity. In addition, the presence of R-loops was associated with switch region-specific replication. Potentially, transcription and replication together provide AID with prolonged substrate exposure. A structural study also demonstrated that G4 structures have a higher affinity towards AID and promote its oligomerisation in vitro.
With MEGA-2 the inventors were able to recapitulate the requirements for single point mutations to be processed to DSBs. Forced transcription bubble formation through VP64 together with Cas9-dependent R-loops could have provided a CSR-like secondary DNA structure. Whether or not locus-specific replication was involved remains to be elucidated. Further work would be needed to clarify the potential DNA structure. Hyperactive AID*A was likely to compensate for the lack of high AID hotspot frequency. It cannot be excluded that deaminase oligomerisation contributed to enhanced base editing as well as DSB creation. Cryo-electron microscopy structure analysis of adenosine base editor ABE8e revealed that the deaminase moieties of two ABE8e molecules dimerise at the target site. The enhanced substrate presentation could promote a similar mechanism for MEGA-2. Interestingly, only the combination of AID*A and VP64 efficiently led to CSR-like DSB events.
The MEGA system induces synthetic SHM ex vivo
Upon binding, AID slides and jumps along the ssDNA to search for suitable target motifs. The enhanced substrate accessibility and construct flexibility might allow MEGA-2 a similar mode-of-action. The inventors’ MEGA system showed nearly the same sequence preference as AID in vivo (Fig 2). The inventors detected the highest mutation rates at hotspot and OHS motifs, with a slight preference for OHS. This confirmed similar observations of a recently published AID CBE. Coldspots and unrelated C’s, which would be omitted by physiological AID, were found to be mutated as well (Fig 2). Hence, nucleotide sequence preference was less restricted with hyperactive AID*A. Finally, the inventors wanted to show that the MEGA system not only mimics AID mechanistically, but also recreates physiological AID functions. The inventors chose the murine B cell line CH12-F3 to test the system. Despite expressing AID, its Ig variable locus is considered not to be mutable. Indeed, the inventors succeeded in creating mutations above background level. Besides C/G-to-T/A mutations the inventors detected A-to-G substitutions (Fig 5C). As they fall into 5’-WA-3’ motifs, the inventors concluded that the low-fidelity DNA polymerase eta was responsible. This is of particular interest as error-prone DNA repair represents the second phase of SHM. In addition, the inventors induced different levels of Indels within the variable domain (Fig 5B). During physiological SHM this happens as well to further broaden the Ig repertoire. Standard WT Cas9 was not able to generate a comparable mutagenic diversity (Fig 5B). The inventors are the first to have successfully diversified an endogenous B cell Ig variable domain with a CBE. The inventors’ MEGA system mimicked the complex SHM signature ex vivo.
MEGA-1 Promotes AID’s Epigenetic Function
Epigenomic editing aims to modify specific DNA methylation sites to change gene expression. Targeted changes in promoter/enhancer methylation lead either to gene activation or to silencing. Active 5mC and 5hmC demethylation is mainly catalysed by members of the ten-eleven translocation methylcytosine dioxygenase (TET) enzyme family. Thus, current programmable epigenomic editors link dCas9 to a TET moiety.
AID can also actively demethylate selected genomic loci. In contrast to TET-induced oxidation, AID deaminates 5mC/5hmC, thereby changing it to a T. Subsequently, TDG or methyl-binding domain glycosylase 4 (MBD4) recognises the mismatch and replaces it with an unmethylated C. So far, AID has only been considered for genome but not epigenome engineering. The inventors’ demethylation experiments demonstrated that the modular MEGA system can induce MyoD gene expression by targeted 5mC demethylation (Fig 6B - D). In a recently published work MyoD expression was induced by a dCas9-TET1 fusion protein. The MyoD DMR5 Enhancer was targeted by four different gRNAs spanning a region of over 200 nucleotides. By using only one gRNA together with MEGA-1 the inventors achieved an increase in MyoD expression comparable to that in the previous work. The inventors’ finding that demethylation of certain CpGs is enough to induce gene expression is in line with previous work. While TET-fusion constructs have a broad and unspecific demethylation activity, the inventors’ MEGA system was able to edit the essential 5mC site in a narrow editing window of 16 nucleotides to induce gene expression. The absence of genomic 5mC-to-T mutations with MEGA-1 confirmed that the deamination activity exclusively affected the epigenome but not the genome. It proved again that MEGA-1 has very limited mutagenic activity. Interestingly, MEGA-2 did not have any mutagenic activity, even though it is the variant with the strongest deamination phenotype. However, highly methylated genomic regions represent a challenging target for base editors. It has been shown that editing efficacy depends on the CBE type. Ultimately, the inventors’ system can be useful to understand the role of specific CpG clusters through its precise demethylation window.
Example 12: Materials and Methods
Construct Cloning
MEGA-2 was assembled by introducing AID*A-XTEN-Linker at the N-terminus of dCas9-VP64 in the backbone vector Cas9m4VP64 (Addgene #47319) through two-step ligation. AID*A was amplified from the pGH335_MS2-AID*A-Hygro plasmid (Addgene #85406) with primers including the XTEN-Linker at the C-terminus. For MEGA-1 the same cloning strategy was used but with human full-length wild type AID. MEGA-3 was constructed in a one-step ligation process whereby AID*A-XTEN was introduced into the Cas9m2 vector (Addgene #47317). MEGA-4 was cloned in the same way as MEGA-3. The AID*A fragment without XTEN was ligated into the hCas9_D10A (Addgene #41816) backbone.
gRNA plasmid constructs
Specific gRNAs targeting the GFP, TP53BP1 and CH12-F3 IgH variable domain locus as well as the UFKA22 variable heavy/light chain domain locus were designed by manual curation with Benchling or using CRISPRdirect (Y. Naito et al., Bioinformatics 31, 1120-1123 (2015)). For ATP1a1 and mouse MyoD published gRNAs were used (D. Agudelo, et al., Nature Methods 14, 615-620 (2017); X. S. Liu, et al., Cell 167, 233-247.e17 (2016)). Cloning of gRNA expressing vectors was done as described previously (P. Mali, et al., Science 339, 823-826 (2013)). In brief, 19 bp of the respective gRNA sequence were incorporated into two 60mer oligonucleotides. The two oligos were annealed and extended using Phusion polymerase (NEB®). Finally, the destination plasmid gRNA_Cloning Vector (Addgene #41824) was linearized with AflII and the 100 bp fragment was incorporated by Gibson assembly. A complete list of gRNAs is given in table 1.
Cell Culture
All cells were maintained in 10 cm dishes (Sarstedt) at 37 °C and 5 % CO2. HEK293A, HEK293T-GFP and 3T3 cells were grown in DMEM (Sarstedt) supplemented with 10 % FBS (Sarstedt), 1 % penicillin/streptomycin + L-Glutamine (Gibco), 1 % Sodium Pyruvate (Gibco) and 50 µM β-mercaptoethanol (Gibco). Growth media of HEK293TG-mUFKA22 and HEK293TG-mUFKA22 variants in addition included 300 µg/ml Hygromycin B (Corning). CH12-F3 mouse B cell lymphoma cells were grown in RPMI 1640 (Sarstedt) supplemented with 10 % FBS (Sarstedt), 1 % penicillin/streptomycin + L-Glutamine (Gibco), 1 % Sodium Pyruvate (Gibco), 5 % NCTC-109 (Gibco) and 50 µM β-mercaptoethanol (Gibco).
Lipid-based Cell Transfection
All lipid-based transfections were done with Lipofectamine™ 3000 (Invitrogen) following the manufacturer’s protocol. In brief, for GFP, ATP1a1, TP53BP1 and UFKA22 variable heavy and light chain domain base editing 0.5 × 10^6 HEK293T-GFP cells/well were seeded in 6 well plates the day before transfection. 750 ng of MEGA or WT Cas9 plasmid and 500 ng of gRNA plasmid were mixed with 5 µl P3000™ reagent and 3.75 µl Lipofectamine™ 3000 reagent. To determine the effect of UGI, 500 ng of UGI plasmid was added to the mix. For targeted demethylation 1 × 10^6 mouse 3T3 cells/well were seeded in 6 well plates. 750 ng of MEGA plasmid, 1000 ng of mouse MyoD gRNA plasmid or control gRNA plasmid and 500 ng of a GFP plasmid as transfection marker were mixed with 5 µl P3000™ reagent and 3.75 µl Lipofectamine™ 3000 reagent. After 72 hours cells were harvested and either directly analysed or stored at -80 °C for sequencing.
Cell Electroporation
To mutate the murine variable heavy chain domain, CH12-F3 cells were electroporated using the Gene Pulser Xcell Eukaryotic System (BioRad). In brief, 5 × 10^6 cells were resuspended in Opti-MEM medium (Gibco) together with 3 µg DNA (1000 ng MEGA plasmid and 500 ng of each respective gRNA plasmid) in 0.4 cm gap cuvettes (BioRad). Electroporation was done with a 30 ms pulse and the square wave setting. For electroporation of 3T3 cells the SF Cell Line 4D-Nucleofector™ X Kit S (Lonza) was used with program EN-158. 1 × 10^6 cells were resuspended with 750 ng MEGA plasmid and 1000 ng mouse MyoD gRNA plasmid.
Flow cytometry
Cells were harvested, spun down and resuspended in 1 % PFA/PBS 72 hours after transfection. GFP fluorescence was detected using a BD FACSCalibur or BD Accuri C6 plus flow cytometer. Live cells were gated based on FSC-A/SSC-A morphology. GFP-negative HEK293A and GFP-positive HEK293T-GFP cells were used as negative and positive control, respectively. Loss in GFP signal was compared to non-transfected HEK293T-GFP cells using FlowJo V6. For IL-2 staining HEK293TG-mUFKA22 WT and variant cells were incubated with fixable live/dead ZOMBIE Violet stain followed by anti-human IgG Fc APC and biotinylated IL-2 (IL2-Biotin). To detect IL-2 the cells were subsequently incubated with PE-conjugated streptavidin. Cells were analysed with BD Fortessa.
PacBio long-read single molecule sequencing of the GFP locus and UFKA22 Variable Heavy/Light Chain Domain locus
For amplicon long-read single molecule sequencing the Pacific Biosciences (PacBio) RSII sequencer with a SMRTcell was used. Genomic DNA was isolated with the GeneJet Genomic DNA extraction kit (Thermo Fisher Scientific) from frozen cells which had previously been transfected with MEGA and gRNAs. Library preparation of the target gene amplicon was done in a one-step PCR reaction. The respective primers contained a PAD sequence for ligation to the SMRTcell, a barcode sequence for multiplexing and the GFP-specific sequence. Primer sequences are listed in table 2. To avoid PCR bias three independent PCR reactions for each sample were combined. Subsequently, all barcoded amplicons were pooled in an equimolar ratio and cleaned with the GeneJet PCR Purification Kit (Thermo Fisher Scientific). The amplicon pool was quantified with a Qubit Fluorometer (Thermo Fisher Scientific) and the Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific). Product purity was measured with an automated TapeStation (Agilent) using High Sensitivity D1000 Screen Tape. Follow-up AMPure PB bead clean-up, adaptor ligation and sequencing were done by the Exeter Sequencing Service.
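Equimolar pooling of barcoded amplicons comes down to converting each mass concentration into a molar one and then taking the volume that contains the same amount of every amplicon. The sketch below assumes the usual approximation of 660 g/mol per base pair of double-stranded DNA; the sample names, concentrations and lengths are placeholders for illustration and are not values from the application.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def molarity_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/µl into nM for a given amplicon length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(amplicons, fmol_each):
    """Return the volume in µl of each amplicon that contains `fmol_each` femtomoles.

    amplicons: dict mapping a sample name to (concentration in ng/µl, length in bp).
    Since 1 nM equals 1 fmol/µl, volume = fmol / nM.
    """
    return {
        name: fmol_each / molarity_nM(conc, length)
        for name, (conc, length) in amplicons.items()
    }


if __name__ == "__main__":
    # Hypothetical barcoded amplicons, for illustration only.
    samples = {"barcode_01": (66.0, 1000), "barcode_02": (33.0, 1000)}
    for name, volume in pooling_volumes(samples, fmol_each=100).items():
        print(f"{name}: {volume:.2f} µl")
```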
Illumina sequencing of GFP, ATP1a1, TP53BP1, variable heavy chain domain and MyoD
Illumina amplicon library preparation was done with the Nextera XT DNA Library Preparation Kit (Illumina). PAD sequences for Nextera adaptors were added to the locus-specific primers and subsequently each locus was amplified (Table 2). Primers are listed in table 3. Three independent reactions were performed and pooled for each sample to avoid PCR bias. After PCR clean-up Nextera indices were added to the amplicons through a second PCR reaction. PCR product quantity and quality were assessed by Qubit Fluorometer (Thermo Fisher Scientific) and TapeStation (Agilent). Samples were sequenced by the Exeter Sequencing Service with an Illumina HiSeq Sequencer.
Bisulfite treatment and sequencing
Bisulfite conversion of DNA from 5 × 10^4 cells was performed using the EZ DNA methylation kit (Zymo Research). In brief, the genomic DNA was incubated with CT conversion reagent at 98 °C for 8 mins, followed by 14 cycles of 95 °C for 15 seconds and 64 °C for 15 mins. DNA was cleaned and eluted following the manufacturer’s instructions. Subsequently, the target region of the DMR5 MyoD enhancer region was amplified by PCR at 95 °C for 12 mins, followed by 40 cycles of 95 °C for 90 seconds, 58 °C for 90 seconds and 72 °C for 45 seconds. A final elongation step of 10 min was included in all reactions. PCR products were analysed by gel electrophoresis and purified with MinElute (Qiagen). Three separate PCRs were performed on each sample to control for PCR bias in the subsequent analysis. PCR products from individual samples were pooled, cloned into a TA vector and sequenced by Sanger sequencing. Only unique sequences (as determined by either unique CpG methylation pattern or unique non-conversion of non-CpG cytosines) are shown, and all sequences had a conversion rate >99 %.
RNA isolation and qPCR
RNA was isolated and purified using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. RNA concentration was measured with NanoDrop-ND 1000 (Thermo Fisher Scientific) and cDNA was synthesized using the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific). Quantitative Reverse Transcription-PCR (qRT-PCR) for target genes was performed using HOT FIREPol EvaGreen qPCR Mix Plus w/o ROX (Solis BioDyne) on a CFX384 Touch Real-Time PCR System (Bio-Rad). Respective primers are listed in table 4. Pre-analysis was done with the CFX Maestro Software and the relative expression values were calculated with the ΔΔCt method, normalizing the Ct values to the housekeeping gene Actin B (mouse).
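The ΔΔCt calculation named above can be written out in a few lines. This is a minimal sketch of the standard 2^(-ΔΔCt) form, assuming a doubling of product per cycle for both the target and the housekeeping gene; the function name and the Ct values in the usage example are illustrative placeholders, not measurements from the application.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta-Ct method.

    Each Ct of the target gene is first normalised to the housekeeping
    gene (delta Ct), then the control condition is subtracted from the
    treated sample (delta-delta Ct). Assumes ~100 % PCR efficiency.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Placeholder Ct values: the target comes up one cycle earlier in the
    # treated sample while the housekeeping gene is unchanged.
    print(relative_expression(24.0, 18.0, 25.0, 18.0))
```

If the amplification efficiencies of the two primer pairs differ noticeably, an efficiency-corrected formula would be the more appropriate choice.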
Targeted Amplicon Sequencing Analysis
Demultiplexed amplicon deep sequencing data were analysed with the CRISPResso2 web version (K. Clement, et al., Nature Biotechnology 37, 224-226 (2019)). PacBio Sequencing data were uploaded as single end and Illumina Sequencing data as paired end reads. Minimum homology for alignment was set to 80 %. Remaining parameters were kept at default. Nextera PE was chosen for adapter trimming of Illumina-derived deep sequencing data. Heatmaps were generated with. The mean percentage value was calculated for samples with more than one repeat. Subsequently, values of non-transfected controls were subtracted from sample values for normalisation. Graphs were made with GraphPad Prism V7.
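The normalisation described above (mean over repeats, then subtraction of the non-transfected control) can be sketched as follows. The data layout, the function names and the treatment of positions missing from a repeat as 0 % are assumptions made for illustration; the application does not specify them.

```python
from statistics import mean


def mean_by_position(repeats):
    """Average per-position percentages over repeats.

    repeats: list of dicts mapping a position label to a percentage.
    A position missing from a repeat is counted as 0.0 (assumption).
    """
    positions = set().union(*(r.keys() for r in repeats))
    return {p: mean(r.get(p, 0.0) for r in repeats) for p in sorted(positions)}


def background_subtracted(sample_repeats, control_repeats):
    """Mean sample value minus mean non-transfected control value per position."""
    sample = mean_by_position(sample_repeats)
    control = mean_by_position(control_repeats)
    return {p: sample[p] - control.get(p, 0.0) for p in sample}


if __name__ == "__main__":
    # Placeholder percentages, for illustration only.
    sample = [{"C14": 5.0}, {"C14": 7.0}]
    control = [{"C14": 1.0}]
    print(background_subtracted(sample, control))
```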
Statistical Analysis
For statistical analysis, either an unpaired t-test or a one-way ANOVA was used. Standard deviation is always shown for mean values of two to three technical repeats. Calculations and visualisations were done with GraphPad Prism V7.
Sequences
Table 1
Figure imgf000038_0001
Figure imgf000039_0001
Table 2
Figure imgf000039_0002
Figure imgf000040_0001
Figure imgf000041_0001
Table 3
Figure imgf000041_0002
Table 4
Figure imgf000041_0003
Figure imgf000042_0001

Claims

1. A method for ex vivo antibody diversification, comprising the steps:
a. providing a plurality of mammalian cells, wherein each cell expresses an antibody of interest on the cell surface;
b. introducing an expression system encoding a protein-RNA complex into the cell, wherein the protein-RNA complex comprises
i. activation-induced cytidine deaminase (AID);
ii. nuclease dead Cas9 (dCas9) or nickase Cas9 (nCas9);
iii. transcription activator VP64; and
iv. a single-guide RNA (sgRNA) comprising a tracrRNA part and a crRNA part, wherein the crRNA part is complementary to a sub-sequence of a DNA sequence encoding the antibody of interest,
and wherein the protein components i.-iii. are covalently linked;
c. selecting a cell expressing an antibody with increased or decreased affinity and/or altered effector functions, particularly via FACS;
d. expanding the selected cell in cell culture.
2. The method according to claim 1, wherein component ii. is dCas9.
3. The method according to any one of the preceding claims, wherein said dCas9 comprises amino acid substitutions D10A, D839A, H840A and N863A.
4. The method according to any one of the preceding claims 1 to 3, wherein said AID is wildtype full-length human AID.
5. The method according to any one of the preceding claims 1 to 3, wherein said AID is a hyperactive human AID variant (AID*A) lacking a C-terminal Nuclear Export Signal (NES) and comprising amino acid substitutions K10E, T82I, and E156G.
6. The method according to any one of the preceding claims, wherein said dCas9 is an enzymatically-inactive variant of Cas9 from Streptococcus, particularly from Streptococcus pyogenes.
7. The method according to any one of the preceding claims, wherein said AID and said dCas9 are covalently linked via a linker, particularly an XTEN linker of SEQ ID NO 012.
8. The method according to any one of the preceding claims, wherein the protein-RNA complex additionally comprises uracil-DNA glycosylase inhibitor (UGI), particularly UGI non-covalently associated with the protein-RNA complex.
9. The method according to claim 8, wherein the UGI is from the Bacillus subtilis bacteriophage PBS1 or PBS2, particularly the UGI is from PBS1.
10. The method according to any one of the preceding claims, wherein i. wild-type full-length human AID comprises or essentially consists of the sequence SEQ ID NO 001, or ii. AID*A comprises or essentially consists of the sequence SEQ ID NO 003.
11. The method according to any one of the preceding claims, wherein i. dCas9 comprises or essentially consists of the sequence SEQ ID NO 005; or ii. nCas9 comprises or essentially consists of the sequence SEQ ID NO 007.
12. The method according to any one of the preceding claims, wherein said VP64 comprises or essentially consists of the sequence SEQ ID NO 009.
13. The method according to any one of the preceding claims, wherein the tracrRNA part comprises or essentially consists of the sequence SEQ ID NO 011.
14. The method according to any one of the preceding claims, wherein said UGI comprises or essentially consists of the sequence SEQ ID NO 014.
PCT/EP2023/051453 2022-02-16 2023-01-20 Aid-based cytosine base editor system for ex vivo antibody diversification WO2023156139A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22157081 2022-02-16
EP22157081.5 2022-02-16

Publications (1)

Publication Number Publication Date
WO2023156139A1 true WO2023156139A1 (en) 2023-08-24

Family

ID=80682695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/051453 WO2023156139A1 (en) 2022-02-16 2023-01-20 Aid-based cytosine base editor system for ex vivo antibody diversification

Country Status (1)

Country Link
WO (1) WO2023156139A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018152197A1 (en) * 2017-02-15 2018-08-23 Massachusetts Institute Of Technology Dna writers, molecular recorders and uses thereof

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
AUSUBEL ET AL.: "Short Protocols in Molecular Biology", 2002, JOHN WILEY & SONS, INC
BEERLI ET AL., PROC NATL ACAD SCI USA., vol. 95, no. 25, 8 December 1998 (1998-12-08), pages 14628 - 33
D. AGUDELO ET AL., NATURE METHODS, vol. 14, 2017, pages 615 - 620
E. M. PORTO ET AL., NATURE REVIEWS DRUG DISCOVERY, vol. 19, 2020, pages 839 - 859
K. CLEMENT ET AL., NATURE BIOTECHNOLOGY, vol. 37, 2019, pages 224 - 226
LIU DAISY LIU ET AL: "Intrinsic Nucleotide Preference of Diversifying Base Editors Guides Antibody Ex Vivo Affinity Maturation", CELL REPORTS, vol. 25, no. 4, 1 October 2018 (2018-10-01), US, pages 884 - 892.e3, XP055574037, ISSN: 2211-1247, DOI: 10.1016/j.celrep.2018.09.090 *
NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443
P. MALI ET AL., SCIENCE, vol. 339, 2013, pages 823 - 826
PEARSON AND LIPMAN, PROC. NAT. ACAD. SCI., vol. 85, 1988, pages 2444
REN BIN ET AL: "Improved Base Editor for Efficiently Inducing Genetic Variations in Rice with CRISPR/Cas9-Guided Hyperactive hAID Mutant", MOLECULAR PLANT, vol. 11, no. 4, 1 April 2018 (2018-04-01), pages 623 - 626, XP055942983, ISSN: 1674-2052, DOI: 10.1016/j.molp.2018.01.005 *
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 2012, COLD SPRING HARBOR LABORATORY PRESS
SMITH AND WATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482
X. S. LIU ET AL., CELL, vol. 167, 2016, pages 233 - 247
Y. NAITO ET AL., BIOINFORMATICS, vol. 31, 2015, pages 1120 - 1123

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23701882; Country of ref document: EP; Kind code of ref document: A1)